10000 [APM-84] Support for external result sets by cpjulia · Pull Request #16421 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

[APM-84] Support for external result sets #16421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 104 commits into from
Jul 17, 2022
Merged

Conversation

cpjulia
Copy link
Contributor
@cpjulia cpjulia commented Jun 20, 2022

Scope & Purpose

Enterprise companion PR: https://github.com/arangodb/enterprise/pull/974

Whenever the amount of entries for SORT or COLLECT AGGREGATE in queries reach a specific threshold in memory usage, the entries are stored on disk temporarily using rocksdb as a means to it, because then also the entries can be stored when inserted on disk using a customized comparator.

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    No backport required.

Related Information

cpjulia added 8 commits June 17, 2022 17:21
…dified executor to handle inputs incrementally intead of matrix
@cpjulia cpjulia added this to the devel milestone Jun 23, 2022
cpjulia and others added 21 commits June 23, 2022 17:30
… for optimizing, removed already inserted include
This is WIP, as the feature does not nothing so far except creating a
temp directory and cleaning it. Next step is to wire the storage engine.
@jsteemann jsteemann marked this pull request as ready for review July 14, 2022 14:53
@jsteemann jsteemann requested a review from a team as a code owner July 14, 2022 14:53
Copy link
Member
@neunhoef neunhoef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now looked at most of the new code and left a few comments. But I am now stopping for the day. I will read on tomorrow morning, then looking at the AQL code, the JS tests and the enterprise part. Nothing serious found, but a few comments to potentially address.

D7AF
Comment on lines 106 to 109
size_t newCapacity = std::max(_rowIndexes.capacity() * 2, numDataRows);
while (newCapacity < _rowIndexes.size() + numDataRows) {
newCapacity *= 2;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why like this, why not directly take std::max(_rowIndexes.capacity() * 2, _rowIndexes.size() + numDataRos) and get rid of the loop and the subsequent reserve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the latest commit to use this suggestion

p1 += sizeof(uint64_t);
p2 += sizeof(uint64_t);
} else if (hasPrefixId1) {
diffInId = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not advance p1 by sizeof(uint64_t) in this case?

Copy link
Contributor Author
@cpjulia cpjulia Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we arrive here, the lhs and rhs differ in size, so we wouldn't reach the while below to catch the actual key value and the order in which to compare

 while (static_cast<size_t>(p1 - lhs.data()) < lhs.size() &&
           static_cast<size_t>(p2 - rhs.data()) < rhs.size())

meaning the string with lower byte size will always score lower, so we don't have to advance the pointer for the rest of the data and just return the result, as we already have the response

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, understood, but on a second look: If lhs or rhs have a size larger than 8 but smaller than 16, then we reach with the memcmp over the end of the buffer. Should the computation of hasPrefixId1 and hasPrefixId2 not ask if there are 8 more bytes from p1 (or p2 resp.) on?

} else if (hasPrefixId1) {
diffInId = 1;
} else if (hasPrefixId2) {
diffInId = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not advance p2 by sizeof(uint64_t) in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we arrive here, the lhs and rhs differ in size, so we wouldn't reach the while below to catch the actual key value and the order in which to compare

 while (static_cast<size_t>(p1 - lhs.data()) < lhs.size() &&
           static_cast<size_t>(p2 - rhs.data()) < rhs.size())

meaning the string with lower byte size will always score lower, so we don't have to advance the pointer for the rest of the data and just return the result

Copy link
Member
@neunhoef neunhoef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no further comments and have gone through all files now.

10000

Copy link
Member
@neunhoef neunhoef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neunhoef neunhoef merged commit b43fbec into devel Jul 17, 2022
@neunhoef neunhoef deleted the feature/APM-84-rocksdb-store branch July 17, 2022 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0