8000 CQ shared store: Delete from index on remove or roll over by lhoguin · Pull Request #13959 · rabbitmq/rabbitmq-server · GitHub

CQ shared store: Delete from index on remove or roll over #13959


Merged
merged 1 commit into main on May 30, 2025

Conversation

lhoguin
Contributor
@lhoguin lhoguin commented May 27, 2025

It is expensive to delete files because we must clean up the index, and to find the messages in a file we have to scan it.

Instead of cleaning up the index on file delete, this commit deletes from the index as soon as possible. There are two scenarios: messages that are removed from the current write file, and messages that are removed from other files. In the latter case, we can simply delete the index entry on remove. For messages in the current write file, we want to keep the entry in case fanout is used, because we don't want to write the fanout message multiple times if we can avoid it. So we keep track of removes in the current write file and clean up these entries on file roll over.

Compared to the previous implementation, we will no longer increase the ref_count of messages that are not in the current write file, meaning we may do more writes in fanout scenarios. But at the same time, the file delete operation is much cheaper.
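The two scenarios above can be sketched as a toy model. This is Python pseudocode with hypothetical names, not the real Erlang `rabbit_msg_store` API; it only illustrates the deferred-cleanup strategy:

```python
class MsgStoreIndex:
    """Toy model of the index-cleanup strategy described above.

    Names are illustrative; the real implementation lives in the
    Erlang module rabbit_msg_store.
    """

    def __init__(self):
        self.index = {}                  # msg_id -> file number
        self.current_write_file = 0
        self.removed_in_current = set()  # removes deferred to roll over

    def write(self, msg_id):
        self.index[msg_id] = self.current_write_file

    def remove(self, msg_id):
        file = self.index.get(msg_id)
        if file is None:
            return
        if file == self.current_write_file:
            # Keep the entry so a fanout publish can reuse the copy
            # already written; clean it up at roll over instead.
            self.removed_in_current.add(msg_id)
        else:
            # Message lives in an older file: delete the index entry
            # immediately, so deleting that file later needs no scan.
            del self.index[msg_id]

    def roll_over(self):
        # The write file is closed: drop the deferred entries now.
        for msg_id in self.removed_in_current:
            self.index.pop(msg_id, None)
        self.removed_in_current.clear()
        self.current_write_file += 1
```

The key point the sketch shows: only removes against the current write file are deferred; everything else is cleaned up eagerly, which is what makes the later file delete cheap.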

@mkuratczyk

@gomoripeti
Contributor

sorry to keep on nitpicking :)
a small note: if I see it correctly, compaction might become slower with this change. During scan_file_for_valid_messages, before this change there were ref_count=0 index entries which resulted in a previously_valid status, while now not_found entries result in an invalid status, triggering the scan_next_byte scanning mode.

during healthy operation, when messages are mostly consumed in the same order as they are written, most of the rdq files can be deleted before any compaction happens on them. So I agree that it's more important to make delete_file efficient than compact_file. Just wanted to mention the trade-off in compaction performance.
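The status distinction in this comment can be illustrated with a toy classifier (hypothetical names; the status atoms mirror the ones mentioned above for scan_file_for_valid_messages):

```python
def classify(msg_id, index):
    """Toy version of the scan-status decision discussed above.

    Before the change, a removed message still had a ref_count=0
    index entry -> 'previously_valid': its size is known, so the
    scan can jump straight past it. After the change, the entry is
    gone -> the lookup misses and the message is treated as
    'invalid', forcing a byte-by-byte search for the next message
    boundary (the scan_next_byte mode).
    """
    entry = index.get(msg_id)
    if entry is None:
        return "invalid"           # after this change: slow path
    if entry["ref_count"] == 0:
        return "previously_valid"  # before this change: fast skip
    return "valid"
```

So the cost moves from delete time (index cleanup) to compaction time (slower scanning of removed regions), which is the trade-off being pointed out.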

@mkuratczyk
Contributor

Thanks a lot, I'll keep an eye on this. I've just fixed (?) a bug that actually crashed the message store during compaction. :)
I've backported these changes to v4.1.x (since main is quite far from 4.1.x at this point, with Khepri on by default, etc).
Here's that temporary PR/branch: #13967

@lhoguin
Contributor Author
lhoguin commented May 29, 2025

Right, it's likely to make compaction itself slower. But it might be better overall when compaction and delete are both happening, because before this change, delete was very slow.

There are other cases where it is likely slower (like mass-deleting messages).

It was expensive to delete files because we had to clean
up the index, and to get the messages in the file we had
to scan it.

Instead of cleaning up the index on file delete this
commit deletes from the index as soon as possible.
There are two scenarios: messages that are removed
from the current write file, and messages that are
removed from other files. In the latter case, we
can just delete the index entry on remove. For messages
in the current write file, we want to keep the entry
in case fanout is used, because we don't want to write
the fanout message multiple times if we can avoid it.
So we keep track of removes in the current write file
and do a cleanup of these entries on file roll over.

Compared to the previous implementation we will no
longer increase the ref_count of messages that are
not in the current write file, meaning we may do more
writes in fanout scenarios. But at the same time the
file delete operation is much cheaper.

Additionally, we prioritise delete calls in rabbit_msg_store_gc.
Without that change, if the compaction was lagging behind,
we could have file deletion requests queued behind many compaction
requests, leading to many unnecessary compactions of files
that could already be deleted.

Co-authored-by: Michal Kuratczyk <michal.kuratczyk@broadcom.com>
@mkuratczyk mkuratczyk force-pushed the loic-index-delete-on-remove-or-rollover branch from cb60eb0 to 78d2e9b Compare May 30, 2025 11:43
@mkuratczyk
Contributor

Test results (with this change applied on top of v4.1.x)
https://grafana.lionhead.rabbitmq.com/goto/rkYSf0BHg?orgId=1

Highlights:

  1. Disk space is reclaimed much faster (the v4.1.x env, green, required 12 additional minutes to reclaim the space)
  2. Much lower memory usage throughout the tests

The throughput is comparable in most cases.

The downside is slower compaction, since we no longer skip over consumed messages. This should be less of a problem now that deletion requests are prioritised, avoiding unnecessary compactions of files we can already delete. Deletions are generally much more common, so it seems worth having faster deletions and slower compactions rather than the other way round. On my machine we can still do a few compactions per second.
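The prioritisation described here (and in the commit message's note about rabbit_msg_store_gc) can be sketched as a toy job queue. Hypothetical names; the real GC process is an Erlang gen_server:

```python
from collections import deque

class GcQueue:
    """Toy model: delete requests are served before compaction
    requests, and a pending compaction for a file that is later
    requested for deletion is dropped as wasted work."""

    def __init__(self):
        self.deletes = deque()
        self.compactions = deque()

    def request_delete(self, file):
        # Compacting a file we are about to delete is pointless,
        # so cancel any queued compaction for it.
        if file in self.compactions:
            self.compactions.remove(file)
        self.deletes.append(file)

    def request_compaction(self, file):
        if file not in self.compactions:
            self.compactions.append(file)

    def next_job(self):
        if self.deletes:
            return ("delete", self.deletes.popleft())
        if self.compactions:
            return ("compact", self.compactions.popleft())
        return None
```

This captures the failure mode the commit message describes: without prioritisation, a delete could sit queued behind many compactions of files that could simply be deleted.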

@mkuratczyk mkuratczyk marked this pull request as ready for review May 30, 2025 12:22
@mkuratczyk
Contributor

@lhoguin This is now ready for review. While we discussed ideas for speeding up compaction, I feel it is too risky a change to include here. With 4.1.1 coming soon, I'd rather include this change without further complications.

For future reference, the idea to speed up compaction was to:

  1. During recovery, zero the file in places that contain invalid messages
  2. Then, on compaction, we could change not_found -> invalid to not_found -> previously_valid in scan_and_vacuum_message_file/2

But any bug in zeroing means data corruption of potentially valid data, so I don't want to rush it.
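Step 1 of the idea above could look roughly like this toy sketch (purely illustrative, not an implementation of the proposal):

```python
def zero_invalid_regions(buf, invalid_spans):
    """Toy sketch of zeroing regions that hold invalid messages,
    so a later compaction scan could skip zeroed runs instead of
    searching them byte by byte.

    invalid_spans is a list of (offset, length) pairs. Note that
    any off-by-one error here would corrupt valid data, which is
    exactly the risk noted above.
    """
    out = bytearray(buf)
    for start, length in invalid_spans:
        out[start:start + length] = b"\x00" * length
    return bytes(out)
```

The hard part is not the zeroing itself but proving the spans are exactly right, hence the decision not to rush it into 4.1.1.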

@lhoguin
Contributor Author
lhoguin commented May 30, 2025

GH doesn't want me to approve my own PR, but I approve it nevertheless. I didn't double check the CI failures though.

@mkuratczyk mkuratczyk self-requested a review May 30, 2025 15:42
@mkuratczyk mkuratczyk merged commit 0278980 into main May 30, 2025
549 of 560 checks passed
@mkuratczyk mkuratczyk deleted the loic-index-delete-on-remove-or-rollover branch May 30, 2025 15:51
mergify bot pushed a commit that referenced this pull request May 30, 2025
(cherry picked from commit 0278980)
michaelklishin added a commit that referenced this pull request May 30, 2025
CQ shared store: Delete from index on remove or roll over (backport #13959)
@michaelklishin michaelklishin added this to the 4.2.0 milestone Jun 30, 2025
LoisSotoLopez pushed a commit to cloudamqp/rabbitmq-server that referenced this pull request Jul 16, 2025
4 participants