Gurdit Singh

Technical Blog




Handling Late Data Using Watermarking in Spark Structured Streaming

17 Feb 2021
What happens if an event arrives late to the application while we are applying window-based grouping?

Working with Window Aggregations in Spark Structured Streaming

15 Feb 2021
In a streaming application, rather than running aggregations over the whole stream, you often want aggregations over subsets of the data grouped by time windows (say, every 5 minutes or every hour).

Concept of Time in Spark Structured Streaming

10 Feb 2021
Time in a streaming application is a way to correlate different events in the stream and extract meaningful insights.

Checkpointing in Spark Structured Streaming

09 Feb 2021
The primary goal of checkpointing is to provide fault tolerance by saving the progress and state of a streaming query so that it can resume after a failure.

Probabilistic Data structures in Analysis of Big Data

08 Feb 2021
A Bloom filter is a probabilistic data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set.