05 Jun 2021
Let’s create our first Delta table! As in a database, we define the table and its schema, then store the data in Delta format.
More …
30 May 2021
In this first part, we will look at what kinds of problems it causes for a typical data lake implementation.
More …
22 May 2021
Once a user application is bundled, it can be launched with the spark-submit script or through the Apache Livy REST API.
More …
09 May 2021
Apache Spark supports many different file formats. Common general-purpose formats are CSV and JSON; others, used mainly for big data analysis, are Apache ORC, Apache Parquet, and Apache Avro.
More …
23 Apr 2021
Data skew is caused by transformations that change data partitioning, such as join, groupBy, and orderBy. A typical case is grouping or joining on a key whose values are not evenly distributed across the cluster.
More …