Gurdit Singh

Deep dive into Apache Kafka internals (producer) - Part-2

16 Jan 2021

A Kafka producer is an application that can act as a source of data in a Kafka cluster. A producer can publish messages to one or more Kafka topics. More …

Deep dive into Apache Kafka internals - Part-1

09 Jan 2021

In Kafka, partitions are the units of storage for messages, and a topic can be thought of as a container that holds these partitions. More …

Surrogate key with Apache Spark - Part-2

01 Nov 2020

You can use Apache Spark to generate a SURROGATE_KEY, a numerical ID assigned automatically to each row as you enter data into a table. More …

Surrogate key with Apache Spark - Part-1

28 Oct 2020

You can use Apache Spark to generate a SURROGATE_KEY, a numerical ID assigned automatically to each row as you enter data into a table. More …

Parallelizing Apache Spark jobs with Scala

25 Oct 2020

Apache Spark already performs data processing in parallel. Spark runs multiple tasks on each executor to achieve parallelism; however, this does not hold at the job level. More …