Kafka, a system developed at LinkedIn, is essentially a messaging system that is designed to support aggregation of high throughput log messages arriving from different applications.
Why would a traditional messaging system not be a good fit for log processing?
- Typical enterprise messaging systems tend to offer a rich set of delivery guarantees. Such extensive guarantees are an overkill for log processing scenarios.
- Secondly enterprise class messaging systems typically do not focus on maximizing throughput as the primary design constraint.
- Third, enterprise systems are weak in terms of support for distribution of messages. You cannot configure it to partition and store messages on different machines.
- Finally, their performance degrades when messages are allowed to remain in the queue for extended periods of time.
Looks like these were some of the reasons why they were motivated to build a different messaging system with a primary design goal of enabling very high throughput of messages.
Internals & Architecture
In Kafka a Topic is the container with which messages are associated. Producers send/publish messages to a topic and consumers consume these messages by pulling from topics they subscribe to. The published messages are stored on servers referred to as brokers.
Since Kafka is designed to support distribution of messages on different machines a typical cluster consists multiple brokers with each broker storing only a portion of all the messages from a topic.
A unique characteristic feature of Kafka is that is support a â€œpullâ€ model for message consumption. Having this feature enables the consuming application to control the rate at which it wants to consume the messages vis-Ã -vis the typical â€œpushâ€ model that could flood the consumer.
Kafka follows a simple storage layout with each partition of a topic corresponding to a logical log. Physically the log is further subdivided into segment files of 1 GB each.
When a producer publishes a message the payload is appended to the current segment. The messages are flushed to the disk either after a specific number of messages have accumulated or a certain time period has elapsed. The consumer sees the message only after the message has been flushed to the disk.
Speeding up Data Transfers, Achieving Higher Throughput
Data transfer is accelerated by –
1) Enabling the producer to send a batch of messages in one go. Once a message is sent the producer does not wait for an acknowledgment from the broker. The idea is to send messages as fast as a broker can handle. This significantly increases the publisher throughput.
2) Enabling the consumer to retrive messages in batches. Alongside this a very efficient storage format coupled with a stateless broker results in very high consumption throughput.
3) No caching of messages in the Kafka layer. They rely on the underlying file system cache. This is a bit surprising. Not sure how they decided on going with this approach.
4) Network access for consumers is optimized because of the fact that the Linux sendfile api is used. The sendfile api operates in the kernel space and hence is quicker. Nice!
The net result of these tricks is that (in an experiment involving messages of 200 bytes each) on an average, Kafka can publish messages at the rate of 50,000 and 400,000 messages per second for batch size of 1 and 50, respectively. Orders of magnitude higher than what RabbitMQ and ActiveMQ demonstrate in the experiments.
Distribution & Coordination
Kafka supports an abstraction called consumer groups. Each message when delivered to a consumer group is processed only by one consumer within that group.
In Kafka the smallest unit of parallelism is the partition of a topic. This implies that all messages that are belong to a particular partition of a topic will be consumed by a consumer in a consumer group.
Coordination is decentralized without a permanent â€œmasterâ€ node. Specifically, Zookeeper is employed to facilitate coordination between consumers and brokers.
Kafka only guarantees at-least-once delivery. In most cases a message is delivered exactly once to its consumers. It also guarantees that messages from a single partition are delivered to a consumer in order. Across partitions no such guarantee is made.
Previewing from http://research.microsoft.com