This example requires running a multi-node Kafka cluster locally. For instructions on how to set one up, please refer to my previous article.
In this article we will send an insurance file with 36635 rows to a Kafka cluster of 3 brokers and have a group of 2 consumers consume this data. You can download the file by clicking on the hyperlink.
- Open a terminal window and start ZooKeeper
- Open 3 more terminal windows and start the 3 brokers
kafka-server-start ../etc/kafka/server1.properties
kafka-server-start ../etc/kafka/server2.properties
kafka-server-start ../etc/kafka/server3.properties
At this stage you should have ZooKeeper and 3 Kafka brokers running locally.
- Now, let us create a topic with the following configuration
- Topic name: insurance
- Partitions: 3. This lets us parallelize processing by splitting the data across the 3 brokers
- Replication factor: 1. This defines the number of copies of each partition that are kept. A replication factor of 1 means there is a single copy of your data; if the broker holding it fails, you lose that data.
- Execute the command below to create the topic with the above configuration
kafka-topics --create --topic insurance --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
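As an aside, the console producer in this example sends records without keys. Below is a minimal Python sketch (not the actual Kafka client) of how 36635 unkeyed records would spread across 3 partitions under simple round-robin distribution; the real per-partition counts you will see later differ somewhat, because modern producer clients batch unkeyed records with a sticky partitioner rather than strict round-robin.

```python
# Sketch only: round-robin distribution of unkeyed records across partitions.
NUM_PARTITIONS = 3
TOTAL_RECORDS = 36635

counts = [0] * NUM_PARTITIONS
for i in range(TOTAL_RECORDS):
    counts[i % NUM_PARTITIONS] += 1  # record i goes to partition i % 3

print(counts)  # [12212, 12212, 12211] -- a roughly even split
```

The key takeaway is that without keys, Kafka gives no ordering guarantee across partitions, only within each partition.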
- After creating the topic, create 2 consumers and add them to the same group by executing the command below in 2 separate terminal windows. Note that the group name must be a single token, so group1 rather than "group 1".
kafka-console-consumer --bootstrap-server localhost:9092 --topic insurance --from-beginning --group group1
kafka-console-consumer --bootstrap-server localhost:9092 --topic insurance --from-beginning --group group1
- The above step creates 2 consumers, adds them to a consumer group named group1, and has them consume data from the insurance topic
- So, at this stage we have a topic named insurance created and the following items running
- 3 Kafka brokers
- 2 consumers that belong to the same consumer group, group1
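With 3 partitions and 2 consumers, it is worth knowing in advance how Kafka will split the partitions between the group members. The following is a rough Python sketch of the default range assignment strategy (the consumer names are hypothetical; the real logic lives in the Java client's RangeAssignor):

```python
# Sketch of Kafka's default range partition assignment for a single topic:
# partitions are divided into contiguous ranges, and consumers earlier in
# the sorted member list receive one extra partition when the split is uneven.
def range_assign(partitions, consumers):
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        size = base + (1 if i < extra else 0)
        assignment[c] = partitions[start:start + size]
        start += size
    return assignment

print(range_assign([0, 1, 2], ["consumer-1", "consumer-2"]))
# {'consumer-1': [0, 1], 'consumer-2': [2]}
```

So with this setup, one consumer should end up owning 2 partitions and the other 1, which is exactly what we will observe below.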
- With this setup in place, we can now start our producer, which will read the data from a CSV file and push it to the Kafka cluster
- To start the producer, execute the following command in a terminal window; you can arrange the windows so the consumer consoles are visible side by side and watch the data being printed on them. “/data/insurance-sample.csv” is the path of the downloaded data file.
kafka-console-producer --topic insurance --broker-list localhost:9092 < /data/insurance-sample.csv
- With this step we have successfully run a 3-node Kafka cluster, created a producer to push data to a topic with 3 partitions, and had it consumed by a consumer group of 2 consumers.
- If you check the log directory configured for your brokers, you will find 3 folders, one per broker, like below
- Inside each of these folders you will see a partition directory named insurance-<partition-number> (for example insurance-0). With a replication factor of 1, each broker holds exactly one of the 3 partitions.
- If you run the kafka-dump-log command on a partition's log file as below, it will show the offsets stored in that partition.
kafka-dump-log --files /tmp/kafka-logs-3/insurance-1/00000000000000000000.log
- Now in my case, for partition 1 I see 12398 as the last offset, which means this partition holds 12399 records, since offsets are zero-based. Analyzing the other log files the same way gives the per-partition record counts shown below.
- These counts add up to 36635, which matches the total record count of the CSV file.
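The offset arithmetic can be sketched in a few lines of Python (partition labels and last offsets are taken from this run; remember Kafka offsets start at 0):

```python
# A partition whose highest offset is N holds N + 1 records,
# because offsets are zero-based.
last_offsets = {"partition 1": 12398, "partition 2": 12842, "partition 3": 11392}

record_counts = {p: off + 1 for p, off in last_offsets.items()}
print(record_counts)                # {'partition 1': 12399, 'partition 2': 12843, 'partition 3': 11393}
print(sum(record_counts.values())) # 36635, the row count of the CSV file
```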
- If you press Ctrl+C in each consumer window, the consumer prints the number of records it processed.
- This shows that the first consumer in group1 processed 25242 records and the second consumer in group1 processed 11393 records.
- Therefore partitions 1 and 2 were assigned to the first consumer and partition 3 was assigned to the second consumer in group1.
| Partition | Records | Consumer |
| --- | --- | --- |
| Partition 1 | 12399 | Consumer 1 |
| Partition 2 | 12843 | Consumer 1 |
| Partition 3 | 11393 | Consumer 2 |
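A quick Python cross-check that the per-partition counts above are consistent with the totals printed on the consumer consoles:

```python
# Records per partition, as read from the dump-log output above.
per_partition = {1: 12399, 2: 12843, 3: 11393}

# Consumer 1 owned partitions 1 and 2; consumer 2 owned partition 3.
consumer_1 = per_partition[1] + per_partition[2]
consumer_2 = per_partition[3]

print(consumer_1, consumer_2)           # 25242 11393
assert consumer_1 + consumer_2 == 36635  # the full CSV was consumed exactly once
```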
- With this we have also seen how Kafka stores topics and partitions on disk, and how a consumer group consumes data from the partitions of a multi-node Kafka cluster.