05 Jun 2021
  
    
  
    
      Let’s create our first Delta table! As in a database, we define the table and its schema, then store the data in Delta format.
    
    
      
      
      
      
      
    
  
  
  
    
    
  30 May 2021
  
    
  
    
      In this first part, we will understand what kind of problems it causes for a typical data lake implementation.
    
    
      
      
      
      
      
    
  
  
  
    
    
  22 May 2021
  
    
  
    
      Once a user application is bundled, it can be launched using the spark-submit script or through the Apache Livy REST API.
    
    
      
      
      
      
      
    
  
  
  
    
    
  09 May 2021
  
    
  
    
      Apache Spark supports many different file formats. Common ones are CSV and JSON; others, used mainly for big data analysis, are Apache ORC, Apache Parquet, and Apache Avro.
    
    
      
      
      
      
      
    
  
  
  
    
    
  23 Apr 2021
  
    
  
    
      Data skewness is caused by transformations that change data partitioning, such as join, groupBy, and orderBy, for example grouping or joining on a key that is not evenly distributed across the cluster.
    
    
      
      
      
      