Apache Kafka is often described as a distributed, fault-tolerant, high-throughput publish-subscribe messaging system. Think of it as a central nervous system for your applications, enabling them to communicate and share data in real-time. But what does that really mean? Let’s break it down.
Kafka reimagines the traditional publish-subscribe model as a distributed commit log. This design allows for massive scalability and durability, making it ideal for handling real-time data streams.
Before we dive into the installation process, let’s define some key Kafka terminology:
- Topics: Kafka organizes messages into categories called topics. A topic is similar to a table in a database, representing a specific stream of data.
- Producers: Producers are applications that publish (write) messages to Kafka topics. They are the source of data flowing into the system.
- Consumers: Consumers are applications that subscribe to (read) messages from Kafka topics. They process the data published by producers.
- Brokers: Kafka runs as a cluster of one or more servers, each called a broker. Brokers are responsible for storing and managing the data within the Kafka cluster.
- Zookeeper: Kafka relies on Zookeeper for managing the cluster state, configuration, and coordination between brokers.
For more comprehensive information, refer to the official Apache Kafka documentation: http://kafka.apache.org/documentation.html
Prerequisites
Before proceeding with the installation, ensure you have the following:
- Java Development Kit (JDK): Kafka requires a JDK to run. The example below installs JDK 7, which is sufficient for the Kafka 0.8.2 release used in this guide; for newer Kafka versions, a stable Java 8 or later is recommended.
- Linux Environment: This guide assumes you are using a Linux-based operating system.
- Sudo Privileges: You’ll need sudo privileges to install software and configure the system.
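You can check whether a suitable JDK is already installed with java -version; the exact output varies by vendor and version, but it should look something like this:

[kafka-admin@kafka ~]$ java -version
java version "1.7.0_75"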
Download and Extract
First, download the Kafka tgz file from the Apache Kafka website. Then, extract the archive to your desired installation directory. In this example, we’ll use /opt.
[kafka-admin@kafka Downloads]$ ls
jdk-7u75-linux-x64.rpm kafka_2.9.2-0.8.2.0.tgz
[kafka-admin@kafka Downloads]$ sudo rpm -ivh jdk-7u75-linux-x64.rpm
... # Output from the JDK installation
[kafka-admin@kafka Downloads]$ sudo tar -xzf kafka_2.9.2-0.8.2.0.tgz -C /opt
[kafka-admin@kafka Downloads]$ cd /opt
[kafka-admin@kafka opt]$ sudo ln -s kafka_2.9.2-0.8.2.0 kafka
[kafka-admin@kafka opt]$ ls
kafka kafka_2.9.2-0.8.2.0
[kafka-admin@kafka opt]$ sudo chown -R kafka-admin:kafka-admin kafka
Explanation:
- ls: Lists the files in the Downloads directory.
- sudo rpm -ivh jdk-7u75-linux-x64.rpm: Installs the Java JDK from the RPM package. (Note: You might need to adjust the JDK version and installation command based on your system.)
- sudo tar -xzf kafka_2.9.2-0.8.2.0.tgz -C /opt: Extracts the Kafka archive to the /opt directory.
- cd /opt: Changes the current directory to /opt.
- sudo ln -s kafka_2.9.2-0.8.2.0 kafka: Creates a symbolic link named kafka pointing to the Kafka installation directory. This simplifies future access.
- ls: Lists the files in the /opt directory to verify the symbolic link.
- sudo chown -R kafka-admin:kafka-admin kafka: Changes the ownership of the Kafka directory to the kafka-admin user and group. This is important for ensuring the Kafka process has the necessary permissions. Adjust kafka-admin:kafka-admin to the appropriate user and group on your system. The -R option applies the change recursively to all files and directories within the Kafka installation.
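You can confirm the new ownership with a long listing; both entries should now show kafka-admin as the owner and group:

[kafka-admin@kafka opt]$ ls -ld kafka kafka_2.9.2-0.8.2.0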
Start the Services
Now we are ready to start the required services. Navigate to the Kafka installation directory:
[kafka-admin@kafka opt]$ cd kafka
[kafka-admin@kafka kafka]$ ls
bin config libs LICENSE logs NOTICE
Start the Zookeeper service:
[kafka-admin@kafka kafka]$ bin/zookeeper-server-start.sh config/zookeeper.properties
This will start a Zookeeper instance on localhost, listening on port 2181. You can customize this configuration by modifying the config/zookeeper.properties file.
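For reference, the bundled config/zookeeper.properties in this release is only a few lines; its defaults look like this (the dataDir path is just the stock default and can be changed):

# the directory where the snapshot is stored
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0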
Important: If you intend to run Zookeeper on a separate machine, you must update the config/server.properties file to point the Kafka server to the correct Zookeeper address. By default, it is configured to localhost:2181.
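The relevant setting is zookeeper.connect. For example, with Zookeeper running on a hypothetical host named zk1.example.com, the line in config/server.properties would read:

# hypothetical remote Zookeeper instance
zookeeper.connect=zk1.example.com:2181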
Next, start the Kafka broker service:
[kafka-admin@kafka kafka]$ bin/kafka-server-start.sh config/server.properties
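If both services came up cleanly, the two Java processes should be visible with jps: QuorumPeerMain is the Zookeeper process and Kafka is the broker. The PIDs below are just illustrative:

[kafka-admin@kafka kafka]$ jps
2625 QuorumPeerMain
3112 Kafka
3460 Jps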
Understanding Broker Configuration:
For a single-node setup, the default config/server.properties file is usually sufficient. However, if you plan to run multiple brokers on the same machine (for testing purposes), you’ll need to create multiple copies of the server.properties file and modify the following parameters in each copy:
- broker.id: A unique numerical identifier for each broker in the cluster. This ID must be different for each broker.
- port: The port on which the broker will listen for connections. Choose a different port for each broker to avoid conflicts.
- log.dir: The directory where the broker will store its log data. Each broker needs its own dedicated log directory.
Example:
config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
For each additional broker, start the server using the corresponding configuration file:
bin/kafka-server-start.sh config/server-1.properties
bin/kafka-server-start.sh config/server-2.properties
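Note that each start command runs in the foreground and occupies its terminal. If you prefer, you can background each broker with standard shell tools such as nohup (the broker-1.log file name here is just an example):

nohup bin/kafka-server-start.sh config/server-1.properties > broker-1.log 2>&1 &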
Now that the server has started, let’s assume for simplicity that only one broker is running.
Creating Topics
A topic is a category or feed name to which messages are published. To create a topic, use the kafka-topics.sh script:
[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 \
--replication-factor 1 --partitions 1 --topic test
Explanation of parameters:
- --create: Specifies that we want to create a new topic.
- --zookeeper localhost:2181: Specifies the Zookeeper connection string. This tells Kafka where to find Zookeeper.
- --replication-factor 1: Specifies the number of replicas for each partition. The replication factor cannot exceed the number of brokers, so in a single-node setup it must be 1. For production environments, a replication factor of 2 or 3 is recommended for fault tolerance.
- --partitions 1: Specifies the number of partitions for the topic. Partitions allow you to parallelize message processing across multiple consumers.
- --topic test: Specifies the name of the topic to create. In this case, we’re creating a topic named test.
To verify that the topic has been created, use the --list option:
[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
[kafka-admin@kafka kafka]$
This confirms that the test topic has been successfully created. We are now ready to send and receive messages.
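Before moving on, note that the same script also supports a --describe option, which shows per-partition details such as the leader and replica assignments:

[kafka-admin@kafka kafka]$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test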
Send Some Messages
Open a new terminal window. We’ll use the kafka-console-producer.sh script to send messages to the test topic.
[kafka-admin@kafka kafka]$ bin/kafka-console-producer.sh --broker-list localhost:9092 \
--topic test
Explanation:
- --broker-list localhost:9092: Specifies the list of Kafka brokers to connect to. In this case, we’re connecting to the broker running on localhost and port 9092.
- --topic test: Specifies the topic to which we want to send messages.
Now, type your messages and press Enter after each message. Each line you type will be sent as a separate message to the Kafka topic.
This is a message
This is a message2
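Because the console producer reads from standard input, you can also pipe in a file instead of typing interactively. Here messages.txt is a hypothetical file with one message per line:

[kafka-admin@kafka kafka]$ cat messages.txt | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test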
Start a Consumer
Open another new terminal window. We’ll use the kafka-console-consumer.sh
script to consume messages from the test
topic.
[kafka-admin@kafka kafka]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
--topic test --from-beginning
Explanation:
- --zookeeper localhost:2181: Specifies the Zookeeper connection string.
- --topic test: Specifies the topic from which we want to consume messages.
- --from-beginning: This important option tells the consumer to read all messages from the beginning of the topic, even if they were published before the consumer started. Without this option, the consumer would only receive messages published after it started.
You should see the messages you typed in the producer terminal:
This is a message
This is a message2
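The consumer keeps running and prints new messages as they arrive; press Ctrl-C to stop it. For scripted checks, the console consumer also accepts a --max-messages option, which exits after consuming a fixed number of messages:

[kafka-admin@kafka kafka]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
--topic test --from-beginning --max-messages 2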
Congratulations! You have successfully set up a single-node Kafka cluster and produced and consumed messages.
Next Steps
This guide provides a basic introduction to setting up a single-node Kafka cluster. To further explore Kafka, consider the following:
- Multi-broker setup: Learn how to configure and manage a Kafka cluster with multiple brokers for increased scalability and fault tolerance.
- Kafka Connect: Explore Kafka Connect for integrating Kafka with other data sources and sinks.
- Kafka Streams: Dive into Kafka Streams for building real-time stream processing applications.
- Security: Implement security measures such as authentication and authorization to protect your Kafka cluster.
- Monitoring: Set up monitoring tools to track the health and performance of your Kafka cluster.