[APM-84] Support for external result sets #16421
Conversation
…dified executor to handle inputs incrementally instead of matrix
…-84-rocksdb-store
… for optimizing, removed already inserted include
This is WIP, as the feature does nothing so far except creating a temp directory and cleaning it up. The next step is to wire in the storage engine.
…angodb into feature/APM-84-rocksdb-store
I have now looked at most of the new code and left a few comments. But I am now stopping for the day. I will read on tomorrow morning, then looking at the AQL code, the JS tests and the enterprise part. Nothing serious found, but a few comments to potentially address.
size_t newCapacity = std::max(_rowIndexes.capacity() * 2, numDataRows);
while (newCapacity < _rowIndexes.size() + numDataRows) {
  newCapacity *= 2;
}
Why like this? Why not directly take std::max(_rowIndexes.capacity() * 2, _rowIndexes.size() + numDataRows) and get rid of the loop and the subsequent reserve?
Changed in the latest commit to use this suggestion.
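The suggested simplification can be sketched as follows (the helper name and free-function form are illustrative, not the actual patch; in the real code `_rowIndexes.capacity()` and `_rowIndexes.size()` supply the first two arguments): instead of doubling in a loop, compute the target capacity in one expression.

```cpp
#include <algorithm>
#include <cstddef>

// Grow-by-doubling without the loop: take at least twice the current
// capacity, but always enough to hold the existing entries plus the
// incoming rows, then pass the result to a single reserve() call.
std::size_t grownCapacity(std::size_t currentCapacity,
                          std::size_t currentSize,
                          std::size_t numDataRows) {
  return std::max(currentCapacity * 2, currentSize + numDataRows);
}
```

This yields the same amortized-growth behavior for the common case (doubling) while covering the large-batch case directly, with no loop and no risk of over-reserving by repeated doubling.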
  p1 += sizeof(uint64_t);
  p2 += sizeof(uint64_t);
} else if (hasPrefixId1) {
  diffInId = 1;
Should we not advance p1 by sizeof(uint64_t) in this case?
When we arrive here, lhs and rhs differ in size, so we would not reach the while loop below that compares the actual key values and determines the order:
while (static_cast<size_t>(p1 - lhs.data()) < lhs.size() &&
       static_cast<size_t>(p2 - rhs.data()) < rhs.size())
This means the string with the lower byte size always scores lower, so we don't have to advance the pointer over the rest of the data and can just return the result, as we already have the answer.
OK, understood. But on second look: if lhs or rhs has a size larger than 8 but smaller than 16, then the memcmp reads past the end of the buffer. Should the computation of hasPrefixId1 and hasPrefixId2 not also check whether there are 8 more bytes available from p1 (or p2, respectively)?
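The bounds check the reviewer asks for can be sketched with a hypothetical helper (this is not the actual comparator code; `p` stands for `p1` or `p2` and `buf` for the corresponding `lhs`/`rhs` string view):

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Before treating the next 8 bytes at p as a uint64 prefix id, verify
// that the buffer actually contains 8 more bytes from p onward, so the
// subsequent memcmp of the two prefixes cannot read past the end.
bool hasReadablePrefixId(char const* p, std::string_view buf) {
  return static_cast<std::size_t>(p - buf.data()) + sizeof(uint64_t) <=
         buf.size();
}
```

Folding such a check into the computation of hasPrefixId1/hasPrefixId2 would make buffers of size 9 through 15 fall through to the byte-wise comparison instead of the out-of-bounds memcmp.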
} else if (hasPrefixId1) {
  diffInId = 1;
} else if (hasPrefixId2) {
  diffInId = -1;
Should we not advance p2 by sizeof(uint64_t) in this case?
When we arrive here, lhs and rhs differ in size, so we would not reach the while loop below that compares the actual key values and determines the order:
while (static_cast<size_t>(p1 - lhs.data()) < lhs.size() &&
       static_cast<size_t>(p2 - rhs.data()) < rhs.size())
This means the string with the lower byte size always scores lower, so we don't have to advance the pointer over the rest of the data and can just return the result.
I have no further comments and have gone through all files now.
Co-authored-by: Max Neunhöffer <max@arangodb.com>
…ect size so don't read out of scope in the comparator
LGTM
Scope & Purpose
Enterprise companion PR: https://github.com/arangodb/enterprise/pull/974
Whenever the entries for SORT or COLLECT AGGREGATE in a query reach a specific memory-usage threshold, the entries are temporarily stored on disk using RocksDB. This has the additional benefit that the entries can be stored in sorted order already on insertion, by means of a customized comparator.
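The spill decision described above can be sketched minimally as follows (class and member names are hypothetical, not ArangoDB's actual API): rows accumulate in memory until their total size crosses the threshold, after which subsequent rows go to the RocksDB-backed on-disk store.

```cpp
#include <cstddef>

// Tracks accumulated memory usage of buffered rows and signals when
// the configured threshold is exceeded, i.e. when rows should start
// being written to the temporary on-disk store instead.
class SpillPolicy {
 public:
  explicit SpillPolicy(std::size_t thresholdBytes)
      : _threshold(thresholdBytes) {}

  // account for one more row; returns true once usage exceeds the threshold
  bool shouldSpill(std::size_t rowBytes) {
    _used += rowBytes;
    return _used > _threshold;
  }

 private:
  std::size_t _threshold;
  std::size_t _used = 0;
};
```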
Checklist
No backport required.
Related Information