Cleansing and transforming schema-drifted CSV files into relational data in Azure Databricks

This article shows how to use Apache Spark’s parallel analytics capabilities to iteratively cleanse and transform schema-drifted CSV files into queryable relational data that can be stored in a data warehouse. We will work in a Spark environment and write the transformation code in PySpark.
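To make the term "schema drift" concrete before moving to Spark, here is a minimal plain-Python sketch (standard library only, not PySpark). The sample data, the column names, and the helper functions `read_rows` and `align_to_common_schema` are hypothetical and exist only for illustration. It shows two CSV extracts whose columns changed over time being aligned to one common set of columns, which is the kind of cleansing the rest of the article performs at scale.

```python
import csv
import io

# Hypothetical sample data: two CSV "files" whose columns drifted over time.
FILE_V1 = "id,name\n1,Alice\n2,Bob\n"
FILE_V2 = "id,name,email\n3,Carol,carol@example.com\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def align_to_common_schema(batches):
    """Union the columns of all batches (first-seen order); fill gaps with None."""
    columns = []
    for rows in batches:
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    aligned = [
        {col: row.get(col) for col in columns}
        for rows in batches
        for row in rows
    ]
    return columns, aligned


columns, rows = align_to_common_schema([read_rows(FILE_V1), read_rows(FILE_V2)])
print(columns)
for row in rows:
    print(row)
```

Rows from the older file end up with `None` in the `email` column, so every row shares the same shape and can be loaded into a single relational table. In Spark 3.1 and later, `DataFrame.unionByName` with `allowMissingColumns=True` does a comparable alignment on DataFrames.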