Gurdit Singh

Technical Blog

© 2025.

Probabilistic Data structures in Analysis of Big Data

08 Feb 2021
A Bloom filter is a probabilistic data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. More …

deep dive into Apache Kafka internals (producer) - Part-2

16 Jan 2021
A Kafka producer is an application that can act as a source of data in a Kafka cluster. A producer can publish messages to one or more Kafka topics. More …

deep dive into Apache Kafka internals - Part-1

09 Jan 2021
In Kafka, partitions are the units of storage for messages, and a topic can be thought of as a container that holds these partitions. More …

Surrogate key with Apache Spark - Part-2

01 Nov 2020
You can use Apache Spark to generate a SURROGATE_KEY that automatically assigns numerical IDs to rows as you enter data into a table. More …

Surrogate key with Apache Spark - Part-1

28 Oct 2020
You can use Apache Spark to generate a SURROGATE_KEY that automatically assigns numerical IDs to rows as you enter data into a table. More …