abstraction (1)
ACID (5)
Aggregation (2)
aggregation (1)
Amazon (1)
Apache Iceberg (1)
Apache Kafka (4)
Apache Livy (1)
Apache Spark (6)
API (1)
Architecture (1)
Auditing (1)
Avro (1)
AWS Backup (1)
AWS Cloudwatch (1)
AWS RDS (1)
AWS Trusted (2)
AWS (1)
Azure Cloud services (4)
Azure Data Lake (4)
Azure SQL Data Warehouse (4)
Azure Synapse Analytics (4)
Azure (4)
Big Data (9)
Bigdata (5)
Bloom filter (1)
Centralized Data (3)
changelogs (1)
changelog (1)
checkpointing (1)
Checkpoint (1)
Cloud Lakehouse (4)
Cloud (3)
cloud (4)
Cluster Parameters (2)
commit-log (2)
Contravariance (1)
Cost (2)
Covariance (1)
CSV (1)
DAG (1)
Data Architecture (3)
Data Lakes (10)
Data Mesh (1)
Data Platforms (2)
Data Skewness (1)
Data Warehouse (4)
database (2)
Decomposition (1)
Deep Learning (1)
DELETE (1)
Delta Lake (1)
deployment (1)
Derivatives (1)
DML (1)
duality (1)
DuckDB (2)
Embedded (1)
EMR (1)
Engines (1)
ETL (2)
Event time (1)
fault-tolerance (1)
File Formats (1)
Fugue (1)
functional programming (3)
Futures (1)
Generic (2)
Governance (1)
gRPC (1)
hash (1)
Higher order functions (1)
HOF (1)
HTTP/2 (1)
Ingestion Time (1)
inner (1)
Invariance (1)
join (1)
Json (1)
Kafka producer (1)
Kafka Streams (4)
Kafka (1)
KStream (2)
KTables (1)
Lambda architecture (1)
Lambda (2)
late data (1)
late event (1)
Left join (1)
Linear Algebra (1)
LogStore API (1)
Machine Learning (1)
Math (2)
md5 (1)
Memory (3)
MERGE (1)
messaging queue (2)
Monolithic (1)
monotonically_increasing_id (1)
MPG (1)
Multiple Parameter Groups (1)
Neural Network (1)
offsets (1)
ORC (1)
Pandas (1)
Parallelism (1)
Parquet (1)
Partitioning Strategy (1)
Partitions (1)
Performance Tuning (5)
Probabilistic Data structures (1)
Processing time (1)
Processor (1)
Producer Config (1)
Producer Record (1)
producer (1)
programming (3)
Protobuf (1)
Python (4)
repartitioning (1)
REST API (1)
Rest (1)
right join (1)
Rollbacks (1)
RPC (1)
S3 (1)
Scala (5)
scala (3)
Scheduler (1)
Schema Enforcement (1)
Schema Evolution (1)
Schema (3)
segment (1)
Serde (1)
Skewness (1)
Sliding window (1)
Snowflake (1)
Spark Configurations (2)
Spark context (1)
Spark Pool (1)
Spark session (1)
Spark Streaming (1)
spark-submit (1)
Spark (20)
SQL Pool (1)
SQL (3)
State store (1)
Streaming (4)
Structured Streaming (5)
sub-topologies (1)
Surrogate key (2)
task (1)
time travel (1)
Time (1)
topics (2)
Topic (1)
Topology (1)
trait (1)
Transaction Log (5)
Tumbling window (1)
TypeSystem (2)
UPDATE (1)
variance (1)
Watermarking (1)
Window (1)
zipwithindex (1)
ACID
Aggregation
Apache Iceberg
Apache Kafka
Apache Spark
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Working with Joins in Spark Structured Streaming 18 Feb 2021
- Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021
- Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021
- Concept of Time in Spark Structured Streaming 10 Feb 2021
- Checkpointing in Spark Structured Streaming 09 Feb 2021
AWS Trusted
AWS
Azure Cloud services
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Azure Data Lake
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Azure SQL Data Warehouse
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Azure Synapse Analytics
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Azure
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Big Data
- Seamless Data Processing with Fugue: Integrating Pandas, DuckDB, and Spark 21 Jul 2024
- From Spark to DuckDB + Delta Lake: The Next Evolution 30 Jun 2024
- Working with Joins in Spark Structured Streaming 18 Feb 2021
- Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021
- Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021
- Concept of Time in Spark Structured Streaming 10 Feb 2021
- Checkpointing in Spark Structured Streaming 09 Feb 2021
- Probabilistic Data structures in Analysis of Big Data 08 Feb 2021
- Lambda Architecture Design Pattern 18 Apr 2020
Bigdata
- Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021
- Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021
- Episode-3 'Data Skewness' : Spark Performance Tuning 23 Apr 2021
- Episode-2 'Memory Management' : Spark Performance Tuning 13 Apr 2021
- Episode-1 'Overview' : Spark Performance Tuning 06 Apr 2021
Centralized Data
checkpointing
Cloud Lakehouse
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Cloud
cloud
- Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021
- Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021
- Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021
- Episode-1 'Overview' : Azure Synapse Analytics as a Cloud Lakehouse 20 Mar 2021
Cluster Parameters
commit-log
Contravariance
Cost
Covariance
Data Architecture
Data Lakes
- Unlocking Data Lake Potential: Harnessing Apache Iceberg with AWS Glue and Snowflake 21 Apr 2024
- Episode-1 'Data Mesh' : Evolution of data platforms 21 May 2023
- Part-6 'DML Operations' : Delta Lake 11 Jul 2021
- Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021
- Part-4 'Deep dive into Schema Enforcement & Evolution' : Delta Lake 13 Jun 2021
- Part-3 'Transaction Log' : Delta Lake 13 Jun 2021
- Part-2 'First Delta Table' : Delta Lake 05 Jun 2021
- Part-1 'Overview' : Delta Lake 30 May 2021
- Lambda Architecture Design Pattern 18 Apr 2020
- Delta Lake: (Learn Part:1) 12 Apr 2020
Data Platforms
Data Warehouse
database
Deep Learning
DuckDB
ETL
fault-tolerance
functional programming
Generic
Higher order functions
Ingestion Time
Invariance
Kafka Streams
KStream
Lambda architecture
LogStore API
Machine Learning
Math
Memory
messaging queue
monotonically_increasing_id
Multiple Parameter Groups
Neural Network
Partitioning Strategy
Performance Tuning
- Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021
- Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021
- Episode-3 'Data Skewness' : Spark Performance Tuning 23 Apr 2021
- Episode-2 'Memory Management' : Spark Performance Tuning 13 Apr 2021
- Episode-1 'Overview' : Spark Performance Tuning 06 Apr 2021
Probabilistic Data structures
Processing time
programming
Python
Scala
scala
Schema
Snowflake
Spark Configurations
Spark Streaming
Spark
- Seamless Data Processing with Fugue: Integrating Pandas, DuckDB, and Spark 21 Jul 2024
- From Spark to DuckDB + Delta Lake: The Next Evolution 30 Jun 2024
- Part-6 'DML Operations' : Delta Lake 11 Jul 2021
- Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021
- Part-4 'Deep dive into Schema Enforcement & Evolution' : Delta Lake 13 Jun 2021
- Part-3 'Transaction Log' : Delta Lake 13 Jun 2021
- Part-2 'First Delta Table' : Delta Lake 05 Jun 2021
- Part-1 'Overview' : Delta Lake 30 May 2021
- Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021
- Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021
- Episode-3 'Data Skewness' : Spark Performance Tuning 23 Apr 2021
- Episode-2 'Memory Management' : Spark Performance Tuning 13 Apr 2021
- Episode-1 'Overview' : Spark Performance Tuning 06 Apr 2021
- Surrogate key with Apache Spark - Part-2 01 Nov 2020
- Surrogate key with Apache Spark - Part-1 28 Oct 2020
- Parallelizing Apache Spark jobs with scala 25 Oct 2020
- Lambda Architecture Design Pattern 18 Apr 2020
- Delta Lake: (Learn Part:1) 12 Apr 2020
- Apache Spark on Amazon EMR 29 Mar 2020
- Sparks RDD is Invariant, Write generic types to make it Covariant 24 Jul 2019
SQL
Streaming
Structured Streaming
- Working with Joins in Spark Structured Streaming 18 Feb 2021
- Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021
- Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021
- Concept of Time in Spark Structured Streaming 10 Feb 2021
- Checkpointing in Spark Structured Streaming 09 Feb 2021