30 May 2021
In this first part, we will look at what kind of problems it causes for a typical data lake implementation.
22 May 2021
Once a user application is bundled, it can be launched using the spark-submit script or through the REST API of Apache Livy.
09 May 2021
Apache Spark supports many different file formats. Common general-purpose formats are CSV and JSON, while Apache ORC, Apache Parquet, and Apache Avro are used mainly for big data analysis.
23 Apr 2021
Data skew is caused by transformations that change data partitioning, such as join, groupBy, and orderBy. A typical case is grouping or joining on a key whose values are not evenly distributed across the cluster.
13 Apr 2021
Understanding the basics of Spark memory management helps you tune Spark's configuration to make the most of the available resources.