Technical Blog

© 2026.

Tags

abstraction 1 ACID 5 AGENTS.md 1 Aggregation 2 aggregation 1 AI Engineering 1 Amazon 1 Apache Iceberg 1 Apache Kafka 4 Apache Livy 1 Apache Spark 6 API 1 Architecture 1 Auditing 1 Avro 1 AWS Backup 1 AWS Cloudwatch 1 AWS RDS 1 AWS Trusted 2 AWS 1 Azure Cloud services 4 Azure Data Lake 4 Azure SQL Data Warehouse 4 Azure Synapse Analytics 4 Azure 4 Big Data 10 Bigdata 5 Bloom filter 1 Centralized Data 3 changelogs 1 changelog 1 checkpointing 1 Checkpoint 1 CLAUDE.md 1 CLI 1 Cloud Lakehouse 4 Cloud 3 cloud 4 Cluster Parameters 2 Codex 1 Coding Agents 1 commit-log 2 Contravariance 1 Cost 2 Covariance 1 CSV 1 Cursor 1 DABs 1 DAG 1 Data Architecture 3 Data Lakes 10 Data Mesh 1 Data Platforms 2 Data Skewness 1 Data Warehouse 4 database 2 Databricks Asset Bundles 1 Databricks 1 Decomposition 1 Deep Learning 1 DELETE 1 Delta Lake 1 deployment 1 Derivatives 1 DML 1 duality 1 DuckDB 2 Embedded 1 EMR 1 Engines 1 Enterprise AI 1 ETL 2 Event time 1 fault-tolerance 1 File Formats 1 Fugue 1 functional programming 3 Futures 1 Gemini CLI 1 Generic 2 GitHub Copilot 1 Governance 1 gRPC 1 hash 1 Higher order functions 1 HOF 1 HTTP/2 1 Ingestion Time 1 inner 1 Invariance 1 join 1 Json 1 Kafka producer 1 Kafka Streams 4 Kafka 1 KStream 2 KTables 1 Lambda architecture 1 Lambda 2 late data 1 late event 1 Left join 1 Linear Algebra 1 LogStore API 1 Machine Learning 1 Math 2 md5 1 Memory 3 MERGE 1 messaging queue 2 Monolithic 1 monotonically_increasing_id 1 MPG 1 Multiple Parameter Groups 1 Neural Network 1 offsets 1 ORC 1 Pandas 1 Parallelism 1 Parquet 1 Partitioning Strategy 1 Partitions 1 Performance Tuning 5 Probabilistic Data structures 1 Processing time 1 Processor 1 Producer Config 1 Producer Record 1 producer 1 programming 3 Protobuf 1 Python 5 repartitioning 1 REST API 1 Rest 1 right join 1 Rollbacks 1 RPC 1 Rules 1 S3 1 Scala 5 scala 3 Scheduler 1 Schema Enforcement 1 Schema Evolution 1 Schema 3 segment 1 Serde 1 Skewness 1 Skills 1 Sliding window 1 Snowflake 1 Spark Configurations 2 Spark context 1 Spark Pool 1 Spark session 1 Spark Streaming 1 spark-submit 1 Spark 21 Specs 1 SQL Pool 1 SQL 3 State store 1 Streaming 4 Structured Streaming 5 sub-topologies 1 Surrogate key 2 task 1 time travel 1 Time 1 topics 2 Topic 1 Topology 1 trait 1 Transaction Log 5 Tumbling window 1 TypeSystem 2 UPDATE 1 variance 1 Watermarking 1 Window 1 Windsurf 1 YML 1 zipwithindex 1

abstraction

KStream and KTables Duality in Kafka Streams: Episode 4 15 Mar 2021

ACID

AGENTS.md

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Aggregation

aggregation

Internal topics and State store in Kafka Streams: Episode 2 08 Mar 2021

AI Engineering

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Amazon

Apache Spark on Amazon EMR 29 Mar 2020

Apache Iceberg

Unlocking Data Lake Potential: Harnessing Apache Iceberg with AWS Glue and Snowflake 21 Apr 2024

Apache Kafka

Apache Livy

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021

Apache Spark

API

The return of RPC - because REST is not only the solution for API Design 10 May 2020

Architecture

Episode-2 'Architecture' : Azure Synapse Analytics as a Cloud Lakehouse 25 Mar 2021

Auditing

Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021

Avro

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

AWS Backup

Driving Down Cost: Leveraging Tools for AWS RDS Cost Optimization 10 May 2024

AWS Cloudwatch

Driving Down Cost: Leveraging Tools for AWS RDS Cost Optimization 10 May 2024

AWS RDS

Driving Down Cost: Leveraging Tools for AWS RDS Cost Optimization 10 May 2024

AWS Trusted

AWS

Unlocking Data Lake Potential: Harnessing Apache Iceberg with AWS Glue and Snowflake 21 Apr 2024

Azure Cloud services

Azure Data Lake

Azure SQL Data Warehouse

Azure Synapse Analytics

Azure

Big Data

Bigdata

Bloom filter

Probabilistic Data structures in Analysis of Big Data 08 Feb 2021

Centralized Data

changelogs

KStream and KTables Duality in Kafka Streams: Episode 4 15 Mar 2021

changelog

Internal topics and State store in Kafka Streams: Episode 2 08 Mar 2021

checkpointing

Checkpointing in Spark Structured Streaming 09 Feb 2021

Checkpoint

Part-3 'Transaction Log' : Delta Lake 13 Jun 2021

CLAUDE.md

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

CLI

Ship Databricks Workloads with DABs — Part 1: The Essentials 23 Sep 2025

Cloud Lakehouse

Cloud

cloud

Cluster Parameters

Codex

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Coding Agents

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

commit-log

Contravariance

What is variance in Scala? 25 Jul 2019

Cost

Covariance

What is variance in Scala? 25 Jul 2019

CSV

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

Cursor

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

DABs

Ship Databricks Workloads with DABs — Part 1: The Essentials 23 Sep 2025

DAG

basics of Kafka Streams: Episode 1 25 Feb 2021

Data Architecture

Data Lakes

Data Mesh

Episode-1 'Data Mesh' : Evolution of data platforms 21 May 2023

Data Platforms

Data Skewness

Episode-3 'Data Skewness' : Spark Performance Tuning 23 Apr 2021

Data Warehouse

database

Databricks Asset Bundles

Ship Databricks Workloads with DABs — Part 1: The Essentials 23 Sep 2025

Databricks

Ship Databricks Workloads with DABs — Part 1: The Essentials 23 Sep 2025

Decomposition

Episode-1 'Data Mesh' : Evolution of data platforms 21 May 2023

Deep Learning

Neural Network from Scratch using Python 01 Aug 2019

DELETE

Part-6 'DML Operations' : Delta Lake 11 Jul 2021

Delta Lake

From Spark to DuckDB + Delta Lake: The Next Evolution 30 Jun 2024

deployment

Parallelism and Task allocation in Kafka Streams: Episode 3 09 Mar 2021

Derivatives

Neural Network from Scratch using Python 01 Aug 2019

DML

Part-6 'DML Operations' : Delta Lake 11 Jul 2021

duality

KStream and KTables Duality in Kafka Streams: Episode 4 15 Mar 2021

DuckDB

Embedded

From Spark to DuckDB + Delta Lake: The Next Evolution 30 Jun 2024

EMR

Apache Spark on Amazon EMR 29 Mar 2020

Engines

Seamless Data Processing with Fugue: Integrating Pandas, DuckDB, and Spark 21 Jul 2024

Enterprise AI

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

ETL

Event time

Concept of Time in Spark Structured Streaming 10 Feb 2021

fault-tolerance

Checkpointing in Spark Structured Streaming 09 Feb 2021

File Formats

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

Fugue

Seamless Data Processing with Fugue: Integrating Pandas, DuckDB, and Spark 21 Jul 2024

functional programming

Futures

Parallelizing Apache Spark jobs with scala 25 Oct 2020

Gemini CLI

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Generic

GitHub Copilot

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Governance

Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021

gRPC

The return of RPC - because REST is not only the solution for API Design 10 May 2020

hash

Probabilistic Data structures in Analysis of Big Data 08 Feb 2021

Higher order functions

Scala Days: Higher order functions 17 Jul 2020

HOF

Scala Days: Higher order functions 17 Jul 2020

HTTP/2

The return of RPC - because REST is not only the solution for API Design 10 May 2020

Ingestion Time

Concept of Time in Spark Structured Streaming 10 Feb 2021

inner

Working with Joins in Spark Structured Streaming 18 Feb 2021

Invariance

What is variance in Scala? 25 Jul 2019

join

Working with Joins in Spark Structured Streaming 18 Feb 2021

Json

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

Kafka producer

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

Kafka Streams

Kafka

deep dive into Apache Kafka internals - Part-1 09 Jan 2021

KStream

KTables

KStream and KTables Duality in Kafka Streams: Episode 4 15 Mar 2021

Lambda architecture

Lambda Architecture Design Pattern 18 Apr 2020

Lambda

late data

Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021

late event

Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021

Left join

Working with Joins in Spark Structured Streaming 18 Feb 2021

Linear Algebra

Basic Linear Algebra for Machine Learning: (Learn Day:1) 23 Aug 2019

LogStore API

Part-3 'Transaction Log' : Delta Lake 13 Jun 2021

Machine Learning

Neural Network from Scratch using Python 01 Aug 2019

Math

md5

Probabilistic Data structures in Analysis of Big Data 08 Feb 2021

Memory

MERGE

Part-6 'DML Operations' : Delta Lake 11 Jul 2021

messaging queue

Monolithic

Episode-1 'Data Mesh' : Evolution of data platforms 21 May 2023

monotonically_increasing_id

Surrogate key with Apache Spark - Part-1 28 Oct 2020

MPG

Scala Days: Multiple Parameter Groups 26 Jul 2020

Multiple Parameter Groups

Scala Days: Multiple Parameter Groups 26 Jul 2020

Neural Network

Neural Network from Scratch using Python 01 Aug 2019

offsets

Checkpointing in Spark Structured Streaming 09 Feb 2021

ORC

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

Pandas

Seamless Data Processing with Fugue: Integrating Pandas, DuckDB, and Spark 21 Jul 2024

Parallelism

Parallelism and Task allocation in Kafka Streams: Episode 3 09 Mar 2021

Parquet

Episode-4 'File Formats' : Spark Performance Tuning 09 May 2021

Partitioning Strategy

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

Partitions

deep dive into Apache Kafka internals - Part-1 09 Jan 2021

Performance Tuning

Probabilistic Data structures

Probabilistic Data structures in Analysis of Big Data 08 Feb 2021

Processing time

Concept of Time in Spark Structured Streaming 10 Feb 2021

Processor

basics of Kafka Streams: Episode 1 25 Feb 2021

Producer Config

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

Producer Record

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

producer

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

programming

Protobuf

The return of RPC - because REST is not only the solution for API Design 10 May 2020

Python

repartitioning

Internal topics and State store in Kafka Streams: Episode 2 08 Mar 2021

REST API

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021

Rest

The return of RPC - because REST is not only the solution for API Design 10 May 2020

right join

Working with Joins in Spark Structured Streaming 18 Feb 2021

Rollbacks

Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021

RPC

The return of RPC - because REST is not only the solution for API Design 10 May 2020

Rules

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

S3

Apache Spark on Amazon EMR 29 Mar 2020

Scala

scala

Scheduler

Parallelizing Apache Spark jobs with scala 25 Oct 2020

Schema Enforcement

Part-4 'Deep dive into Schema Enforcement & Evolution' : Delta Lake 13 Jun 2021

Schema Evolution

Part-4 'Deep dive into Schema Enforcement & Evolution' : Delta Lake 13 Jun 2021

Schema

segment

deep dive into Apache Kafka internals - Part-1 09 Jan 2021

Serde

deep dive into Apache Kafka internals (producer) - Part-2 16 Jan 2021

Skewness

Episode-3 'Data Skewness' : Spark Performance Tuning 23 Apr 2021

Skills

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

Sliding window

Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021

Snowflake

Unlocking Data Lake Potential: Harnessing Apache Iceberg with AWS Glue and Snowflake 21 Apr 2024

Spark Configurations

Spark context

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021

Spark Pool

Episode-4 'Spark Pool' : Azure Synapse Analytics as a Cloud Lakehouse 31 Mar 2021

Spark session

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021

Spark Streaming

Lambda Architecture Design Pattern 18 Apr 2020

spark-submit

Episode-5 'Spark-submit vs Apache Livy' : Spark Performance Tuning 22 May 2021

Spark

Specs

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

SQL Pool

Episode-3 'SQL Pool' : Azure Synapse Analytics as a Cloud Lakehouse 27 Mar 2021

SQL

State store

Internal topics and State store in Kafka Streams: Episode 2 08 Mar 2021

Streaming

Structured Streaming

sub-topologies

Parallelism and Task allocation in Kafka Streams: Episode 3 09 Mar 2021

Surrogate key

task

Parallelism and Task allocation in Kafka Streams: Episode 3 09 Mar 2021

time travel

Part-5 'Deep dive into Time Travel' : Delta Lake 26 Jun 2021

Time

Concept of Time in Spark Structured Streaming 10 Feb 2021

topics

Topic

deep dive into Apache Kafka internals - Part-1 09 Jan 2021

Topology

basics of Kafka Streams: Episode 1 25 Feb 2021

trait

Scala Days: Traits 15 Jul 2020

Transaction Log

Tumbling window

Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021

TypeSystem

UPDATE

Part-6 'DML Operations' : Delta Lake 11 Jul 2021

variance

What is variance in Scala? 25 Jul 2019

Watermarking

Handling Late Data Using Watermarking in Spark Structured Streaming 17 Feb 2021

Window

Working with Window Aggregations in Spark Structured Streaming 15 Feb 2021

Windsurf

From Prompt Chaos to Project Discipline: How Enterprises Use AI Markdown Files to Operationalize Coding Agents 12 Apr 2026

YML

Ship Databricks Workloads with DABs — Part 1: The Essentials 23 Sep 2025

zipwithindex

Surrogate key with Apache Spark - Part-2 01 Nov 2020