Basic concepts
A messaging system such as Kafka enables you to send messages between processes, applications and servers. Applications connect to Kafka to send or receive data. Strictly speaking, a Kafka ‘topic’ is a unit of storage: all data in Kafka is stored in topics. A topic is split into one or more partitions, so you can think of each topic-partition pair as an append-only log. Simplistically, and without considering partitions, you can think of a Kafka topic as a table in a relational database.
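To make the append-only log idea concrete, here is a minimal in-memory sketch of a single topic partition. The `Partition` class and its methods are illustrative inventions, not real Kafka code: records are only ever appended, and each record is addressed by the offset at which it was written.

```python
class Partition:
    """Toy model of one topic partition as an append-only log.
    (Hypothetical sketch for illustration -- not the Kafka API.)"""

    def __init__(self):
        self._log = []  # records in strict append order

    def append(self, record):
        """Append a record and return the offset it was stored at."""
        self._log.append(record)
        return len(self._log) - 1

    def read(self, offset):
        """Return the record stored at the given offset."""
        return self._log[offset]


p = Partition()
assert p.append("first") == 0   # offsets start at 0
assert p.append("second") == 1  # and grow by one per record
assert p.read(0) == "first"     # records are addressed by offset
```

Note that records are never updated or deleted in place; consumers simply remember the last offset they have read.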
A Kafka ‘broker’ is a Kafka server, and multiple brokers form a Kafka cluster. A Kafka ‘producer’ is a program that writes data to a Kafka topic, whereas a Kafka ‘consumer’ is a program that reads data from Kafka. A ‘partition’ is a subdivision of a Kafka topic: a topic consists of one or more partitions, and the right number mainly depends on the amount of data you have to deal with. Topics with huge amounts of data benefit from multiple partitions because, among other things, multiple consumers can then read from the same topic in parallel. Additionally, a topic can be split across multiple brokers at the partition level, and replication in Kafka also happens at the partition level. Each record in a partition is identified by its offset. As Kafka knows nothing about the internal format of a record, the offset is the only way to address it, which makes it a really important piece of information. Last, Zookeeper (https://zookeeper.apache.org) does the housekeeping for Kafka by holding configuration, naming and synchronisation information. This means that you cannot run Kafka without Zookeeper.
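The interplay between keys, partitions and offsets can be sketched as follows. This is a hypothetical model, not real Kafka code: the `Topic` class is invented, and it hashes keys with CRC32 purely for illustration (Kafka's default partitioner actually uses a murmur2 hash). The point is that records with the same key always land in the same partition, and each partition keeps its own independent offset counter.

```python
import zlib


class Topic:
    """Toy model of a topic split into partitions. Each partition keeps
    its own append-only log with independent offsets. (Illustrative
    sketch only -- real Kafka's default partitioner uses murmur2.)"""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: bytes, value):
        """Pick a partition from the key, append, and return
        the (partition, offset) the record landed at."""
        part = zlib.crc32(key) % len(self.partitions)
        self.partitions[part].append(value)
        return part, len(self.partitions[part]) - 1


t = Topic(3)
p1, o1 = t.produce(b"user-42", "login")
p2, o2 = t.produce(b"user-42", "click")
assert p1 == p2       # same key, same partition: per-key order is kept
assert o2 == o1 + 1   # offsets grow independently within each partition
```

Because ordering is only guaranteed within a partition, choosing a good key (for instance, a user ID) is how you keep related records in order while still spreading the overall load across brokers.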