Introduce a delta-encoded bitmap by Kerollmops · Pull Request #5985 · meilisearch/meilisearch · GitHub

@Kerollmops (Member) commented on Nov 10, 2025

This PR introduces delta-encoded bitmaps as a new way to reduce the database size. Our first measurements show that the word-docids database shrank by half (yes, 50%) and the word-prefix-docids database by 30%.
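The PR text does not spell out the DeRoaringBitmap byte layout. To illustrate why delta encoding helps, here is a minimal sketch of plain gap encoding with LEB128 varints over a sorted list of ids. It is not the PR's format, and `delta_encode`/`delta_decode` are hypothetical names. Consecutive document ids in a posting list are often close together, so most gaps fit in one byte instead of the four bytes of a raw u32.

```rust
/// Hypothetical helper: encodes ids sorted in ascending order as the gaps
/// between consecutive ids, each gap written as a LEB128 varint
/// (7 payload bits per byte, high bit set when more bytes follow).
fn delta_encode(sorted_ids: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut previous = 0u32;
    for &id in sorted_ids {
        // The first id is stored as its gap from 0; later ids as the gap
        // from the previous id. Unsorted input is rejected.
        let mut gap = id.checked_sub(previous).expect("ids must be sorted");
        previous = id;
        loop {
            let byte = (gap & 0x7F) as u8;
            gap >>= 7;
            if gap == 0 {
                out.push(byte); // last byte of this varint
                break;
            }
            out.push(byte | 0x80); // continuation bit: more bytes follow
        }
    }
    out
}

/// Hypothetical helper: reverses `delta_encode`, assuming well-formed input.
fn delta_decode(bytes: &[u8]) -> Vec<u32> {
    let mut ids = Vec::new();
    let mut previous = 0u32;
    let mut gap = 0u32;
    let mut shift = 0u32;
    for &byte in bytes {
        gap |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            // Varint complete: add the gap to rebuild the absolute id.
            previous += gap;
            ids.push(previous);
            gap = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    ids
}
```

For `[1000, 1001, 1003, 1010]` the gaps are `1000, 1, 2, 7`, which take 5 bytes instead of the 16 bytes needed for four raw u32s.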

We need to introduce a new DeCboRoaringBitmap that can deserialize CboRoaringBitmaps (including those stored as raw numbers) as well as DeRoaringBitmaps, and that serializes new bitmaps as DeRoaringBitmaps. To tell these formats apart, we add a header to the serialization format and also check the size of the serialized data: if it is below a threshold, the data consists of raw u32s.
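A minimal sketch of the decoding dispatch described above. All concrete values are assumptions: a raw threshold of 7 u32s (28 bytes), little-endian raw ids, and a one-byte magic `DELTA_HEADER` (0xDE) that is assumed never to be the first byte of a legacy RoaringBitmap serialization. `detect_format` and `Format` are hypothetical names, not milli's API, and the real header layout may differ. Decoding the delta and roaring payloads themselves is left out.

```rust
/// Hypothetical threshold: payloads of at most 7 u32s (28 bytes) are raw ids.
const MAX_RAW_LEN_BYTES: usize = 7 * std::mem::size_of::<u32>();
/// Hypothetical magic byte marking a delta-encoded (DeRoaringBitmap) payload.
const DELTA_HEADER: u8 = 0xDE;

#[derive(Debug, PartialEq)]
enum Format<'a> {
    /// Small sets stored as the ids themselves, 4 bytes each.
    RawU32s(Vec<u32>),
    /// New delta-encoded payload (the bytes after the header).
    DeltaEncoded(&'a [u8]),
    /// Legacy serialization without a header, passed on unchanged.
    LegacyRoaring(&'a [u8]),
}

fn detect_format(bytes: &[u8]) -> Result<Format<'_>, &'static str> {
    // Size check first: short payloads can only be raw u32s.
    if bytes.len() <= MAX_RAW_LEN_BYTES {
        if bytes.len() % 4 != 0 {
            return Err("raw payload length is not a multiple of 4");
        }
        let ids = bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        return Ok(Format::RawU32s(ids));
    }
    // Longer payloads: the (hypothetical) header tells new from legacy data.
    match bytes.split_first() {
        Some((&DELTA_HEADER, payload)) => Ok(Format::DeltaEncoded(payload)),
        _ => Ok(Format::LegacyRoaring(bytes)),
    }
}
```

Checking the size first follows the description: short payloads are always raw u32s, so the header only has to distinguish the new delta-encoded format from legacy roaring bytes.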

To Do

  • Make sure the delta-encoding codec is named DeRoaringBitmap.
  • Create a DeCboRoaringBitmap that manages all encodings.
  • Introduce an alias to simplify the usage in milli's index.

@Kerollmops force-pushed the delta-encoding-bitmaps branch 2 times, most recently from 1a6b774 to faf430a, on November 10, 2025 at 17:37
@Kerollmops force-pushed the delta-encoding-bitmaps branch from faf430a to a558dbd on November 24, 2025 at 12:41