Apache Kafka provides the following reliability guarantees:
- Kafka does not guarantee ordering of messages across partitions; it guarantees ordering only within a partition.
- Messages within a partition are strictly ordered. If message B is written after message A, using the same producer and to the same partition, then Kafka guarantees that the offset of message B will be higher than the offset of message A, and consumers will read message B after message A.
- Produced messages are considered committed only when they are written to the partition on all its in-sync-replicas. Producers can choose to receive acknowledgments of sent messages when the message was fully committed, when it was written to the leader, or when it was sent over the network.
- In-sync replicas are simply all the replicas of a partition that are “in sync” with the leader. The definition of “in sync” depends on the topic configuration, but by default it means that the replica is, or has been, fully caught up with the leader within the last 10 seconds.
- Messages that are committed will not be lost as long as at least one replica remains alive.
- Consumers can only read messages that are committed.
- Each Kafka topic is broken down into partitions.
- A partition is stored on a single disk; a partition can never span multiple disks.
- Kafka guarantees ordering of events inside a partition.
- A Partition can have multiple replicas.
- Replication is the process of having multiple copies of the data for the sole purpose of availability in case one of the brokers goes down and is unavailable to serve the requests.
- When a partition has multiple replicas, one replica is the designated leader and the rest are followers.
- All read and write requests for a partition are handled by the leader replica.
- Logs on the followers are identical to the leader’s log: once the log has been successfully replicated to the followers, all of them contain the same messages, with the same offsets, in the same order.
- Follower replicas just need to stay in sync with the leader and replicate all the events on time.
- Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught up to the leader.
- This ISR set is persisted to ZooKeeper whenever it changes.
- Only members of ISR are eligible for election as a leader when the leader becomes unavailable.
- A replica is considered to be in-sync if it is the leader for a partition, or if it is a follower that:
  - Has an active session with ZooKeeper
  - Has fetched messages from the leader within the configured interval
- If a replica loses its connection to ZooKeeper, stops fetching new messages, or falls behind and cannot catch up within the configured interval, then the replica is considered out of sync.
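The in-sync test above can be sketched as a small simulation. This is plain Python, not the broker’s actual implementation; the threshold constant mirrors the broker’s replica.lag.time.max.ms setting, and the value chosen here is an assumption for illustration:

```python
from dataclasses import dataclass

# Illustrative threshold mirroring the broker's replica.lag.time.max.ms
# setting (assumed value; real brokers read it from configuration).
REPLICA_LAG_TIME_MAX_MS = 10_000

@dataclass
class Replica:
    broker_id: int
    is_leader: bool
    has_zk_session: bool      # active session with ZooKeeper
    ms_since_last_fetch: int  # time since the last fetch request to the leader

def is_in_sync(replica: Replica) -> bool:
    """A leader is always in-sync; a follower must hold a ZooKeeper
    session and have fetched from the leader within the lag window."""
    if replica.is_leader:
        return True
    return (replica.has_zk_session
            and replica.ms_since_last_fetch <= REPLICA_LAG_TIME_MAX_MS)

leader   = Replica(1, is_leader=True,  has_zk_session=True, ms_since_last_fetch=0)
follower = Replica(2, is_leader=False, has_zk_session=True, ms_since_last_fetch=400)
laggard  = Replica(3, is_leader=False, has_zk_session=True, ms_since_last_fetch=60_000)

isr = [r.broker_id for r in (leader, follower, laggard) if is_in_sync(r)]
print(isr)  # [1, 2] - the laggard has dropped out of the ISR
```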
Catching up to the Leader
Let us consider a scenario where we have one partition with a replication factor of 3 and the following replication configuration:

replica.lag.max.messages=4
replica.lag.time.max.ms=500
replica.lag.max.messages=4 means that as long as a follower is behind the leader by no more than 3 messages, it will not be removed from the ISR.
replica.lag.time.max.ms=500 means that as long as the followers send a fetch request to the leader every 500 ms or sooner, they will not be marked dead and will not be removed from the ISR.
Initially all the followers are in sync with the leader, but when the producer sends the next message “D” to the leader, follower broker 3 happens to be in a GC pause and misses it, so its log falls behind the leader’s.
Since broker 3 is still in the ISR, the latest message is not considered committed until either broker 3 is removed from the ISR or it catches up to the leader’s log end offset.
Note that since broker 3 is fewer than replica.lag.max.messages=4 messages behind the leader, it does not qualify for removal from the ISR. In this case, follower broker 3 needs to catch up to offset 3; once it does, it has fully “caught up” to the leader’s log.
If broker 3 comes out of its GC pause and fetches a message within the lag window, it has caught up to the leader and does not qualify for removal from the ISR.
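The catch-up scenario can be traced numerically. This is a toy model, not broker code; the offsets follow the example above (messages A–D at offsets 0–3), and the message names are illustrative:

```python
# Toy model of the GC-pause example: the leader holds messages A..D
# (offsets 0..3); follower broker 3 paused after replicating offset 2.
REPLICA_LAG_MAX_MESSAGES = 4   # follower may trail by up to 3 messages

leader_log_end_offset = 3      # offset of message "D"
follower_fetch_offset = 3      # next offset broker 3 will fetch (it has 0..2)

# Broker 3 holds 3 messages, the leader holds 4: one message behind.
lag = leader_log_end_offset + 1 - follower_fetch_offset
print(lag)                     # 1

# Still inside the allowed window, so broker 3 stays in the ISR ...
still_in_isr = lag < REPLICA_LAG_MAX_MESSAGES
print(still_in_isr)            # True

# ... which means "D" is not committed until broker 3 catches up.
caught_up = follower_fetch_offset > leader_log_end_offset
print(caught_up)               # False

# After the GC pause ends, broker 3 fetches offset 3 and is fully caught up.
follower_fetch_offset = 4
print(follower_fetch_offset > leader_log_end_offset)  # True
```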
A replica is considered to be out-of-sync or lagging when it falls “sufficiently” behind the leader of the partition.
An in-sync replica that is slightly behind can slow down producers and consumers, since they wait for all the in-sync replicas to get the message before it is committed.
There are 3 configuration parameters in the broker that change Kafka’s behavior regarding reliable message storage.
- Replication Factor
- The topic level configuration is replication.factor
- At the broker level, we control the default.replication.factor for automatically created topics.
- A replication factor of N allows you to lose N-1 brokers while still being able to read and write data to the topic reliably.
- Unclean Leader Election
- This configuration is only available at the broker level.
- The parameter name is unclean.leader.election.enable and by default it is set to true. (Note: in Kafka 0.11.0 and later the default was changed to false.)
- When the leader for a partition is no longer available, one of the in-sync replicas will be chosen as the new leader. This leader election is “clean” in the sense that it guarantees no loss of committed data.
- There could be scenarios where the leader broker goes down and no in-sync replica exists except the leader that just became unavailable.
- In such scenarios, if we don’t allow an out-of-sync replica to become the new leader, the partition remains offline until we bring the old leader (the last in-sync replica) back online.
- If we do allow an out-of-sync replica to become the new leader, we lose all messages that were written to the old leader while that replica was out of sync, and we may also cause inconsistencies for consumers.
- Setting unclean.leader.election.enable to true means we allow out-of-sync replicas to become leaders.
- Minimum in-sync replicas
- Both the topic-level and the broker-level configurations are called min.insync.replicas.
- If you would like to be sure that committed data is written to more than one replica, you need to set the minimum number of in-sync replicas to a higher value.
- If a topic has three replicas and you set min.insync.replicas to 2, then you can only write to a partition in the topic if at least two out of the three replicas are in sync.
- If two out of the three replicas are unavailable, the brokers will no longer accept produce requests. Instead, producers that attempt to send data will receive a NotEnoughReplicasException.
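The min.insync.replicas check can be sketched as follows. This is an illustrative Python model, not client or broker code; the exception class is a stand-in for Kafka’s NotEnoughReplicasException, and the message names are made up:

```python
MIN_INSYNC_REPLICAS = 2  # topic configured with min.insync.replicas=2

class NotEnoughReplicasError(Exception):
    """Stand-in for Kafka's NotEnoughReplicasException."""

def produce(message: str, isr_size: int, log: list) -> None:
    # With acks=all, the broker rejects writes when the ISR has shrunk
    # below min.insync.replicas, trading availability for durability.
    if isr_size < MIN_INSYNC_REPLICAS:
        raise NotEnoughReplicasError(
            f"ISR has {isr_size} replica(s), need {MIN_INSYNC_REPLICAS}")
    log.append(message)

log = []
produce("order-created", isr_size=3, log=log)   # accepted: 3 >= 2
produce("order-shipped", isr_size=2, log=log)   # accepted: 2 >= 2
try:
    produce("order-paid", isr_size=1, log=log)  # rejected: only the leader left
except NotEnoughReplicasError as e:
    print("rejected:", e)
print(log)  # ['order-created', 'order-shipped']
```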
Producers can choose between three different acknowledgment modes:
- acks=0 means that a message is considered written successfully to Kafka once the producer has sent it over the network. You can get amazing throughput and utilize most of your bandwidth, but you are guaranteed to lose some messages if you choose this route.
- acks=1 means that the leader will send either an acknowledgment or an error the moment it gets the message and writes it to the partition data file. You can lose data if the leader crashes and some messages that were successfully written to the leader and acknowledged were not replicated to the followers before the crash.
- acks=all means that the leader will wait until all in-sync replicas get the message before sending back an acknowledgment or an error. In conjunction with the min.insync.replicas configuration on the broker, this lets you control how many replicas get the message before it is acknowledged.
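The three acknowledgment modes can be sketched as a decision over the replication state at send time. This is a simplified model, not producer or broker code; real acknowledgments involve retries and error handling that are omitted here:

```python
def ack_received(acks, sent_over_network, written_to_leader, written_to_all_isr):
    """Return True if the producer would consider the send successful
    under the given acks mode (simplified; ignores retries and errors)."""
    if acks == 0:
        return sent_over_network       # fire-and-forget
    if acks == 1:
        return written_to_leader       # leader-only durability
    if acks == "all":
        return written_to_all_isr      # committed on every in-sync replica
    raise ValueError(f"unknown acks mode: {acks}")

# Message reached the leader but has not been replicated to followers yet:
state = dict(sent_over_network=True, written_to_leader=True,
             written_to_all_isr=False)
print(ack_received(0, **state))      # True  - acknowledged, may still be lost
print(ack_received(1, **state))      # True  - lost if the leader crashes now
print(ack_received("all", **state))  # False - waits for the full ISR
```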
Building a reliable system always involves trade-offs in application complexity, performance, availability, and disk-space usage. This post highlights some of the features and options that help you make informed decisions and understand the trade-offs that suit your use case.
Thanks and please share your comments/feedback.