https://chatgpt.
com/share/6747f489-211c-8009-bfa1-bc6dc51f3bfb
What is Log Compaction in Kafka
Log compaction in Apache Kafka is a mechanism to retain only the most recent value for
each key in a topic, while deleting older values. This feature is particularly useful for
topics where the latest state of a key is sufficient, such as in scenarios involving
databases, caches, or configurations.
How Log Compaction Works:
1. Key-Value Messages: Kafka topics configured for log compaction store
messages as key-value pairs. A key is required for log compaction to work.
2. Retention Policy: Instead of retaining messages based solely on time or size (like
the default retention policy), log compaction ensures that:
o The most recent message for each key is always retained.
o Older messages with the same key can be deleted, except the latest
message.
3. Special Case: Tombstone Records:
A message with a key and a null value (known as a tombstone) marks the
o
deletion of that key.
o Tombstone records are eventually removed during compaction, once all
older records with that key have been purged.
4. Compaction Process:
o Compaction is performed in the background by Kafka’s log cleaner
thread.
o The cleaner scans log segments and removes older messages with
duplicate keys, keeping only the latest version.
o This ensures the compacted topic maintains the latest state of each key.
Benefits of Log Compaction:
1. State Preservation: Provides a compact view of the latest state of each key.
2. Efficient Storage: Reduces storage requirements by deleting redundant or
outdated messages.
3. System Recovery: Allows consumers to rebuild state from compacted topics
efficiently without needing a full historical log.
Use Cases for Log Compaction:
1. Database Change Streams: Storing change events for a database, where only the
latest update for each primary key is relevant.
2. Configuration Management: Topics holding configuration updates for
distributed systems.
3. Cache Invalidation: Topics used to update caches with only the latest value.
Configuring Log Compaction:
To enable log compaction for a topic, set the cleanup.policy configuration to compact.
This can be done either during topic creation or through an update:
bash
Copy code
bin/kafka-topics.sh --create --topic my-topic --partitions 3 --
replication-factor 2 --config cleanup.policy=compact
You can also have a mixed retention policy (both compact and delete) by setting
cleanup.policy=compact,delete.
Log compaction is an advanced feature in Kafka that enables efficient data storage and
ensures that topics reflect the latest state of data keys.