Apache Kafka is often described as a distributed, fault-tolerant, high-throughput publish-subscribe messaging system. Think of it as a central nervous system for your applications, enabling them to communicate and share data in real-time. But what does that really mean? Let’s break it down.
Kafka reimagines the traditional publish-subscribe model as a distributed commit log. This design allows for massive scalability and durability, making it ideal for handling real-time data streams.
Before we dive into the installation process, let’s define some key Kafka terminology:
- Topics: Kafka organizes messages into categories called topics. A topic is similar to a table in a database, representing a specific stream of data.
- Producers: Producers are applications that publish (write) messages to Kafka topics. They are the source of data flowing into the system.
- Consumers: Consumers are applications that subscribe to (read) messages from Kafka topics. They process the data published by producers.
- Brokers: Kafka runs as a cluster of one or more servers, each called a broker. Brokers are responsible for storing and managing the data within the Kafka cluster.
- Zookeeper: Kafka relies on Zookeeper for managing the cluster state, configuration, and coordination between brokers.
For more comprehensive information, refer to the official Apache Kafka documentation: http://kafka.apache.org/documentation.html
Prerequisites
Before proceeding with the installation, ensure you have the following:
- Java Development Kit (JDK): Kafka requires a JDK to run. The example below installs JDK 7, which is sufficient for the Kafka 0.8.2 release used in this guide; for newer Kafka versions, a stable Java 8 or later is recommended.
- Linux Environment: This guide assumes you are using a Linux-based operating system.
- Sudo Privileges: You’ll need sudo privileges to install software and configure the system.
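You can check whether a suitable JDK is already installed with java -version; the exact output varies by vendor and version, but it should look something like this:

[kafka-admin@kafka ~]$ java -version
java version "1.7.0_75"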
Download and Extract
First, download the Kafka tgz file from the Apache Kafka website. Then, extract the archive to your desired installation directory. In this example, we’ll use /opt.
[kafka-admin@kafka Downloads]$ ls
jdk-7u75-linux-x64.rpm kafka_2.9.2-0.8.2.0.tgz
[kafka-admin@kafka Downloads]$ sudo rpm -ivh jdk-7u75-linux-x64.rpm
... # Output from the JDK installation
[kafka-admin@kafka Downloads]$ sudo tar -xzf kafka_2.9.2-0.8.2.0.tgz -C /opt
[kafka-admin@kafka Downloads]$ cd /opt
[kafka-admin@kafka opt]$ sudo ln -s kafka_2.9.2-0.8.2.0 kafka
[kafka-admin@kafka opt]$ ls
kafka kafka_2.9.2-0.8.2.0
[kafka-admin@kafka opt]$ sudo chown -R kafka-admin:kafka-admin kafka
Explanation:
- ls: Lists the files in the Downloads directory.
- sudo rpm -ivh jdk-7u75-linux-x64.rpm: Installs the Java JDK from the RPM package. (Note: You might need to adjust the JDK version and installation command based on your system.)
- sudo tar -xzf kafka_2.9.2-0.8.2.0.tgz -C /opt: Extracts the Kafka archive to the /opt directory.
- cd /opt: Changes the current directory to /opt.
- sudo ln -s kafka_2.9.2-0.8.2.0 kafka: Creates a symbolic link named kafka pointing to the Kafka installation directory. This simplifies future access.
- ls: Lists the files in the /opt directory to verify the symbolic link.
- sudo chown -R kafka-admin:kafka-admin kafka: Changes the ownership of the Kafka directory to the kafka-admin user and group. This is important for ensuring the Kafka process has the necessary permissions. Adjust kafka-admin:kafka-admin to the appropriate user and group on your system. The -R option applies the change recursively to all files and directories within the Kafka installation.
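You can confirm the new ownership with a long listing; both entries should now show kafka-admin as the owner and group:

[kafka-admin@kafka opt]$ ls -ld kafka kafka_2.9.2-0.8.2.0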
Start the Services
Now we are ready to start the required services. Navigate to the Kafka installation directory:
[kafka-admin@kafka opt]$ cd kafka
[kafka-admin@kafka kafka]$ ls
bin config libs LICENSE logs NOTICE
Start the Zookeeper service:
[kafka-admin@kafka kafka]$ bin/zookeeper-server-start.sh config/zookeeper.properties
This will start a Zookeeper instance on localhost, listening on port 2181. You can customize this configuration by modifying the config/zookeeper.properties file.
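For reference, the bundled config/zookeeper.properties in this release is only a few lines; its defaults look like this (the dataDir path is just the stock default and can be changed):

# the directory where the snapshot is stored
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0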
Important: If you intend to run Zookeeper on a separate machine, you must update the config/server.properties file to point the Kafka server to the correct Zookeeper address. By default, it is configured to localhost:2181.
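The relevant setting is zookeeper.connect. For example, with Zookeeper running on a hypothetical host named zk1.example.com, the line in config/server.properties would read:

# hypothetical remote Zookeeper instance
zookeeper.connect=zk1.example.com:2181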
Next, start the Kafka broker service:
[kafka-admin@kafka kafka]$ bin/kafka-server-start.sh config/server.properties
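If both services came up cleanly, the two Java processes should be visible with jps: QuorumPeerMain is the Zookeeper process and Kafka is the broker. The PIDs below are just illustrative:

[kafka-admin@kafka kafka]$ jps
2625 QuorumPeerMain
3112 Kafka
3460 Jps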
Understanding Broker Configuration:
For a single-node setup, the default config/server.properties file is usually sufficient. However, if you plan to run multiple brokers on the same machine (for testing purposes), you’ll need to create multiple copies of the server.properties file and modify the following parameters in each copy:
- broker.id: A unique numerical identifier for each broker in the cluster. This ID must be different for each broker.
- port: The port on which the broker will listen for connections. Choose a different port for each broker to avoid conflicts.
- log.dir: The directory where the broker will store its log data. Each broker needs its own dedicated log directory.
Example:
config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
For each additional broker, start the server using the corresponding configuration file:
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
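Note that each start command runs in the foreground and occupies its terminal. If you prefer, you can background each broker with standard shell tools such as nohup (the broker-1.log file name here is just an example):

nohup bin/kafka-server-start.sh config/server-1.properties > broker-1.log 2>&1 &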
Now that the server has started, let’s assume for simplicity that only one broker is running.
Creating Topics
A topic is a category or feed name to which messages are published. To create a topic, use the kafka-topics.sh script:
[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 \
--replication-factor 1 --partitions 1 --topic test
Explanation of parameters:
- --create: Specifies that we want to create a new topic.
- --zookeeper localhost:2181: Specifies the Zookeeper connection string. This tells Kafka where to find Zookeeper.
- --replication-factor 1: Specifies the number of replicas for each partition. The replication factor cannot exceed the number of brokers, so in a single-node setup it must be 1. For production environments, a replication factor of 2 or 3 is recommended for fault tolerance.
- --partitions 1: Specifies the number of partitions for the topic. Partitions allow you to parallelize message processing across multiple consumers.
- --topic test: Specifies the name of the topic to create. In this case, we’re creating a topic named test.
To verify that the topic has been created, use the --list option:
[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
[kafka-admin@kafka kafka]$
This confirms that the test topic has been successfully created. We are now ready to send and receive messages.
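Before moving on, note that the same script also supports a --describe option, which shows per-partition details such as the leader and replica assignments:

[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test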
Send Some Messages
Open a new terminal window. We’ll use the kafka-console-producer.sh script to send messages to the test topic.
[kafka-admin@kafka kafka]$ bin/kafka-console-producer.sh --broker-list localhost:9092 \
--topic test
Explanation:
- --broker-list localhost:9092: Specifies the list of Kafka brokers to connect to. In this case, we’re connecting to the broker running on localhost and port 9092.
- --topic test: Specifies the topic to which we want to send messages.
Now, type your messages and press Enter after each message. Each line you type will be sent as a separate message to the Kafka topic.
This is a message
This is a message2
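Because the console producer reads from standard input, you can also pipe in a file instead of typing interactively. Here messages.txt is a hypothetical file with one message per line:

[kafka-admin@kafka kafka]$ cat messages.txt | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test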
Start a Consumer
Open another new terminal window. We’ll use the kafka-console-consumer.sh
script to consume messages from the test
topic.
[kafka-admin@kafka kafka]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
--topic test --from-beginning
Explanation:
- --zookeeper localhost:2181: Specifies the Zookeeper connection string.
- --topic test: Specifies the topic from which we want to consume messages.
- --from-beginning: This important option tells the consumer to read all messages from the beginning of the topic, even if they were published before the consumer started. Without this option, the consumer would only receive messages published after it started.
You should see the messages you typed in the producer terminal:
This is a message
This is a message2
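The consumer keeps running and prints new messages as they arrive; press Ctrl-C to stop it. For scripted checks, the console consumer also accepts a --max-messages option, which exits after consuming a fixed number of messages:

[kafka-admin@kafka kafka]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
--topic test --from-beginning --max-messages 2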
Congratulations! You have successfully set up a single-node Kafka cluster and produced and consumed messages.
Next Steps
This guide provides a basic introduction to setting up a single-node Kafka cluster. To further explore Kafka, consider the following:
- Multi-broker setup: Learn how to configure and manage a Kafka cluster with multiple brokers for increased scalability and fault tolerance.
- Kafka Connect: Explore Kafka Connect for integrating Kafka with other data sources and sinks.
- Kafka Streams: Dive into Kafka Streams for building real-time stream processing applications.
- Security: Implement security measures such as authentication and authorization to protect your Kafka cluster.
- Monitoring: Set up monitoring tools to track the health and performance of your Kafka cluster.