8000 feat: persist sst meta size by jiacai2050 · Pull Request #1440 · apache/horaedb · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@jiacai2050
Copy link
Contributor
@jiacai2050 jiacai2050 commented Jan 15, 2024

Rationale

When read sst's metadata, it will use get method to read whole file, but there is no cache in get, so perf may degrade.

Detailed Changes

  • Add meta_size in SST's custom meta.
  • When read SST's metadata, use get_range when we have size.

The parquet writer is kind of massy now, will do some refactor after this PR.

Test Plan

CI

Copy link
Member
@ShiKaiWi ShiKaiWi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the review, I find it's weird to fix the object store cache problem by avoiding using get because the interface of the object store doesn't tell the caller get is worse than get_range.

Maybe the right way is to ensure get method to work as expected so that any caller won't be required to remember the file size to use get_range.

Comment on lines 80 to 82
meta_size = Some(size.parse::<usize>().with_context(|| InvalidSize {
size: size.to_string(),
})?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess snafu will auto clone for the string reference:

Suggested change
meta_size = Some(size.parse::<usize>().with_context(|| InvalidSize {
size: size.to_string(),
})?);
meta_size = Some(size.parse::<usize>().context(InvalidSize {
size,
}?);

@ShiKaiWi
Copy link
Member

@jiacai2050 Would you please choose the main branch as the base for merging?

@jiacai2050
Copy link
Contributor Author

The best way to fix is indeed to add cache for get method, but it's require more work, such as:

  • Request duplicate
  • Deal with IO failure

So here I choose to bypass get. In future We may use get_range with 0.. to replace get, but
this require object_store's support, so we may need to contribute there first.

@jiacai2050 jiacai2050 changed the base branch from dev to main January 23, 2024 12:15
@apache apache deleted a comment from CLAassistant Jan 23, 2024
Comment on lines +86 to +106
< 8000 /tr>
Some(size) => {
let all_range = 0..size;
self.store
.get_range(meta_path, all_range)
.await
.with_context(|| FetchFromStore {
file_path: meta_path.to_string(),
})?
}
None => self
.store
.get(meta_path)
.await
.with_context(|| FetchFromStore {
file_path: meta_path.to_string(),
})?
.bytes()
.await
.with_context(|| FetchAndDecodeSstMeta {
file_path: meta_path.to_string(),
})?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to explain why we are in favor of get_range?

@jiacai2050 jiacai2050 merged commit 6f9d426 into apache:main Jan 31, 2024
@jiacai2050 jiacai2050 deleted the feat-meta-size branch January 31, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0