Gurdit Singh

Part-1 'Overview' : Delta Lake

30 May 2021

In this first part, we will understand what kind of problem it causes for a typical data lake implementation. More …

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning

22 May 2021

Once a user application is bundled, it can be launched with the spark-submit script or through the Apache Livy REST API. More …

Episode-4 'File Formats' : Spark Performance Tuning

09 May 2021

Apache Spark supports many file formats. Common ones are CSV and JSON; others, used mainly for big data analysis, are Apache ORC, Apache Parquet, and Apache Avro. More …

Episode-3 'Data Skewness' : Spark Performance Tuning

23 Apr 2021

Data skewness is caused by transformations that change data partitioning, such as join, groupBy, and orderBy; for example, grouping or joining on a key whose values are not evenly distributed across the cluster. More …

Episode-2 'Memory Management' : Spark Performance Tuning

13 Apr 2021

Understanding the basics of Spark memory management helps you tune Spark's configuration to make the best of the available resources. More …