8000 CQ shared store: Delete from index on remove or roll over by lhoguin · Pull Request #13959 · rabbitmq/rabbitmq-server · GitHub

CQ shared store: Delete from index on remove or roll over #13959


Merged
merged 1 commit into main on May 30, 2025

Conversation

lhoguin
Contributor
@lhoguin lhoguin commented May 27, 2025

It is expensive to delete files because we must clean up the index, and to find the messages in a file we have to scan it.

Instead of cleaning up the index on file delete, this commit deletes from the index as soon as possible. There are two scenarios: messages that are removed from the current write file, and messages that are removed from other files. In the latter case, we can simply delete the index entry on remove. For messages in the current write file, we want to keep the entry in case fanout is used, because we don't want to write the fanout message multiple times if we can avoid it. So we keep track of removes in the current write file and clean up these entries on file roll over.

Compared to the previous implementation, we will no longer increase the ref_count of messages that are not in the current write file, meaning we may do more writes in fanout scenarios. But at the same time, the file delete operation is much cheaper.
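The two scenarios above can be sketched as a toy model. This is Python pseudocode with hypothetical names, not the real Erlang `rabbit_msg_store` API; it only illustrates the deferred-cleanup strategy:

```python
class MsgStoreIndex:
    """Toy model of the index-cleanup strategy described above.

    Names are illustrative; the real implementation lives in the
    Erlang module rabbit_msg_store.
    """

    def __init__(self):
        self.index = {}                  # msg_id -> file number
        self.current_write_file = 0
        self.removed_in_current = set()  # removes deferred to roll over

    def write(self, msg_id):
        self.index[msg_id] = self.current_write_file

    def remove(self, msg_id):
        file = self.index.get(msg_id)
        if file is None:
            return
        if file == self.current_write_file:
            # Keep the entry so a fanout publish can reuse the copy
            # already written; clean it up at roll over instead.
            self.removed_in_current.add(msg_id)
        else:
            # Message lives in an older file: delete the index entry
            # immediately, so deleting that file later needs no scan.
            del self.index[msg_id]

    def roll_over(self):
        # The write file is closed: drop the deferred entries now.
        for msg_id in self.removed_in_current:
            self.index.pop(msg_id, None)
        self.removed_in_current.clear()
        self.current_write_file += 1
```

The key point the sketch shows: only removes against the current write file are deferred; everything else is cleaned up eagerly, which is what makes the later file delete cheap.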

@mkuratczyk

@gomoripeti
Contributor

sorry to keep on nitpicking :)
a small note: if I see it correctly, compaction might become slower with this change. During scan_file_for_valid_messages, before this change there were ref_count=0 index entries which resulted in a previously_valid status, while now not_found entries result in an invalid status, triggering the scan_next_byte scanning mode.

during healthy operation, when messages are mostly consumed in the same order as they are written, most of the rdq files can be deleted before any compaction happens on them. So I agree that it's more important to make delete_file efficient than compact_file. Just wanted to mention the trade-off in compaction performance.
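The status distinction in this comment can be illustrated with a toy classifier (hypothetical names; the status atoms mirror the ones mentioned above for scan_file_for_valid_messages):

```python
def classify(msg_id, index):
    """Toy version of the scan-status decision discussed above.

    Before the change, a removed message still had a ref_count=0
    index entry -> 'previously_valid': its size is known, so the
    scan can jump straight past it. After the change, the entry is
    gone -> the lookup misses and the message is treated as
    'invalid', forcing a byte-by-byte search for the next message
    boundary (the scan_next_byte mode).
    """
    entry = index.get(msg_id)
    if entry is None:
        return "invalid"           # after this change: slow path
    if entry["ref_count"] == 0:
        return "previously_valid"  # before this change: fast skip
    return "valid"
```

So the cost moves from delete time (index cleanup) to compaction time (slower scanning of removed regions), which is the trade-off being pointed out.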

@mkuratczyk
Contributor

Thanks a lot, I'll keep an eye on this. I've just fixed (?) a bug that actually crashed the message store during compaction. :)
I've backported these changes to v4.1.x (since main is quite far from 4.1.x at this point, with Khepri on by default, etc).
Here's that temporary PR/branch: #13967

@lhoguin
Contributor Author
lhoguin commented May 29, 2025

Right, it's likely to make compaction itself slower. But it might be better overall when compaction and delete are both happening, because before this change, delete was very slow.

There are other cases where it is likely slower (like mass-deleting messages).

It was expensive to delete files because we had to clean
up the index, and to get the messages in the file we had
to scan it.

Instead of cleaning up the index on file delete this
commit deletes from the index as soon as possible.
There are two scenarios: messages that are removed
from the current write file, and messages that are
removed from other files. In the latter case, we
can just delete the index entry on remove. For messages
in the current write file, we want to keep the entry
in case fanout is used, because we don't want to write
the fanout message multiple times if we can avoid it.
So we keep track of removes in the current write file
and do a cleanup of these entries on file roll over.

Compared to the previous implementation we will no
longer increase the ref_count of messages that are
not in the current write file, meaning we may do more
writes in fanout scenarios. But at the same time the
file delete operation is much cheaper.

Additionally, we prioritise delete calls in rabbit_msg_store_gc.
Without that change, if the compaction was lagging behind,
we could have file deletion requests queued behind many compaction
requests, leading to many unnecessary compactions of files
that could already be deleted.

Co-authored-by: Michal Kuratczyk <michal.kuratczyk@broadcom.com>
@mkuratczyk mkuratczyk force-pushed the loic-index-delete-on-remove-or-rollover branch from cb60eb0 to 78d2e9b Compare May 30, 2025 11:43
@mkuratczyk
Contributor

Test results (with this change applied on top of v4.1.x)
https://grafana.lionhead.rabbitmq.com/goto/rkYSf0BHg?orgId=1

Highlights:

  1. Disk space is reclaimed much faster (the v4.1.x env, green, required 12 additional minutes to reclaim the space)
  2. Much lower memory usage throughout the tests

The throughput is comparable in most cases.

The downside is slower compaction, since we no longer skip over consumed messages. This should be less of a problem now that deletion requests are prioritised, avoiding unnecessary compactions of files we can already delete. Deletions are generally much more common, so it seems worth having faster deletions and slower compactions rather than the other way round. On my machine we can still do a few compactions per second.
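The prioritisation described here (and in the commit message's note about rabbit_msg_store_gc) can be sketched as a toy job queue. Hypothetical names; the real GC process is an Erlang gen_server:

```python
from collections import deque

class GcQueue:
    """Toy model: delete requests are served before compaction
    requests, and a pending compaction for a file that is later
    requested for deletion is dropped as wasted work."""

    def __init__(self):
        self.deletes = deque()
        self.compactions = deque()

    def request_delete(self, file):
        # Compacting a file we are about to delete is pointless,
        # so cancel any queued compaction for it.
        if file in self.compactions:
            self.compactions.remove(file)
        self.deletes.append(file)

    def request_compaction(self, file):
        if file not in self.compactions:
            self.compactions.append(file)

    def next_job(self):
        if self.deletes:
            return ("delete", self.deletes.popleft())
        if self.compactions:
            return ("compact", self.compactions.popleft())
        return None
```

This captures the failure mode the commit message describes: without prioritisation, a delete could sit queued behind many compactions of files that could simply be deleted.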

@mkuratczyk mkuratczyk marked this pull request as ready for review May 30, 2025 12:22
@mkuratczyk
Contributor

@lhoguin This is now ready for review. While we discussed ideas for speeding up compaction, I feel it is too risky a change to include here. With 4.1.1 coming soon, I'd rather include this change without further complications.

For future reference, the idea to speed up compaction was to:

  1. During recovery, zero the file in places that contain invalid messages
  2. Then, on compaction, we could change not_found -> invalid to not_found -> previously_valid in scan_and_vacuum_message_file/2

But any bug in zeroing means data corruption of potentially valid data, so I don't want to rush it.
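Step 1 of the idea above could look roughly like this toy sketch (purely illustrative, not an implementation of the proposal):

```python
def zero_invalid_regions(buf, invalid_spans):
    """Toy sketch of zeroing regions that hold invalid messages,
    so a later compaction scan could skip zeroed runs instead of
    searching them byte by byte.

    invalid_spans is a list of (offset, length) pairs. Note that
    any off-by-one error here would corrupt valid data, which is
    exactly the risk noted above.
    """
    out = bytearray(buf)
    for start, length in invalid_spans:
        out[start:start + length] = b"\x00" * length
    return bytes(out)
```

The hard part is not the zeroing itself but proving the spans are exactly right, hence the decision not to rush it into 4.1.1.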

@lhoguin
Contributor Author
lhoguin commented May 30, 2025

GH doesn't want me to approve my own PR, but I approve it nevertheless. I didn't double check the CI failures though.

@mkuratczyk mkuratczyk self-requested a review May 30, 2025 15:42
@mkuratczyk mkuratczyk merged commit 0278980 into main May 30, 2025
549 of 560 checks passed
@mkuratczyk mkuratczyk deleted the loic-index-delete-on-remove-or-rollover branch May 30, 2025 15:51
mergify bot pushed a commit that referenced this pull request May 30, 2025
(cherry picked from commit 0278980)
michaelklishin added a commit that referenced this pull request May 30, 2025
CQ shared store: Delete from index on remove or roll over (backport #13959)
@michaelklishin michaelklishin added this to the 4.2.0 milestone Jun 30, 2025
LoisSotoLopez pushed a commit to cloudamqp/rabbitmq-server that referenced this pull request Jul 16, 2025
4 participants