Bug fix/agency restart after compaction and holes in log #3413
… entries to followers
arangod/Agency/State.cpp
Outdated
@@ -865,6 +865,7 @@ bool State::loadRemaining() {

TRI_ASSERT(_log.empty()); // was cleared in loadCompacted
std::string clientId;
index_t lastIndex = 0;
lastIndex should probably be initialized with _cur, which has been set to the index of the latest snapshot in loadCompacted just before loadRemaining was called. I think 0 is wrong if there is a persisted snapshot.
arangod/Agency/State.cpp
Outdated
index_t index(basics::StringUtils::uint64(
    ii.get(StaticStrings::KeyString).copyString()));
term_t term(ii.get("term").getNumber<uint64_t>());
if (lastIndex > 0 && index-lastIndex > 1) {
This is not very robust. Why lastIndex > 0? How do we know that index >= lastIndex and there is no overflow? If index < lastIndex, we must not append at all. If index == lastIndex, we must append if and only if _log is still empty. If index == lastIndex+1, we must append normally and move lastIndex one up. If index > lastIndex+1, then we must insert empty entries until index == lastIndex+1.
Note that none of the formulae I typed here can ever overflow.
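The case analysis in this comment can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code that was merged: `Entry`, `appendFillingHoles`, and the use of a plain `std::vector` as the log are hypothetical stand-ins for the real `State` members, and `lastIndex` is assumed to start at the snapshot index `_cur`, as suggested in the earlier comment.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using index_t = std::uint64_t;

// Hypothetical, simplified log entry (the real agency entry has more fields).
struct Entry {
  index_t index;
  std::string payload;
};

// Appends `payload` at `index`, padding any hole with empty entries.
// `lastIndex` is assumed to start at the snapshot index (_cur).
// Returns false if the entry was skipped.
bool appendFillingHoles(std::vector<Entry>& log, index_t& lastIndex,
                        index_t index, std::string const& payload) {
  if (index < lastIndex) {
    return false;  // older than the snapshot or already seen: never append
  }
  if (index == lastIndex) {
    if (!log.empty()) {
      return false;  // duplicate of the entry we already hold
    }
    log.push_back({index, payload});
    return true;
  }
  // Here index > lastIndex, so index - lastIndex cannot wrap around.
  while (index - lastIndex > 1) {
    ++lastIndex;
    log.push_back({lastIndex, std::string()});  // empty entry fills the hole
  }
  log.push_back({index, payload});
  lastIndex = index;
  return true;
}
```

Because every comparison is made before any subtraction, the unsigned difference is only ever computed when index is strictly greater than lastIndex, which is the overflow guard the comment asks for.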
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
* Revert "Return the result of the inquiry (#3472)"
  This reverts commit 1dc1a98.
* Revert "cherry picking of bug-fix/agency-restart-after-compaction-and-holes-i… (#3423)"
  This reverts commit 324184d.
* State has to keep log for removeConflicts and according log all the way (#3249)
* Bug fix/sort out agency locks (#3306)
  New locking concept in Agency. Ensure empty heartbeats can be sent, answered and processed without long locks. Adjust logging. Fix compaction bugs.
* Bug fix/agency compactor deadlock (#3335)
  * Fix a deadlock between Agent thread and compactor thread.
  * Improve comments in header.
  * Organise clean shutdown of agency threads.
* Bug fix/agency leader timeouts (#3373)
  * Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
  * Also improve logging: Note if a log in the empty heartbeat sending takes > 0.01 s. Clearly mark places where a leader resigns in logging. Log if no empty heartbeat is sent out.
  * Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
  * Add debug logging for _lastAcked and challengeLeadership.
  * Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
* Bug fix/agency restart after compaction and holes in log (#3413)
  * State fixes holes in RAFT index range
  * Avoid application of entries older than compaction index _cur and guard for unsigned overflow
* Return the result of the inquiry (#3465)
* Add a hidden AGENCY_DUMP for agency emergency recovery. (#3474)
* Port more changes from devel to 3.2. This could not be cherry-picked, since the changes concerning the agency were in squash commits which touch a lot of different things.
* Make members private in AgentConfiguration
* Log update of agency configuration.
* Do not deal with active in gossip phase.
* Take out some debugging output.
Fixing holes in State.
When an agent takes over leadership, we need to make sure that sendAppendEntries does not start at index 0 but at the first uncompacted index.
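One way to picture this rule is as a clamp on the per-follower start index. The sketch below is an assumption for illustration only: `startIndexForFollower`, `lastConfirmed`, and `firstInLog` are hypothetical names, and the actual change in the agency code may be structured differently.

```cpp
#include <algorithm>
#include <cstdint>

using index_t = std::uint64_t;

// Hypothetical helper: picks the index from which a new leader starts
// sending entries to a follower. `lastConfirmed` is what the leader believes
// the follower has (0 right after taking over); `firstInLog` is the first
// index still present in the leader's log after compaction.
index_t startIndexForFollower(index_t lastConfirmed, index_t firstInLog) {
  // Never start below the first uncompacted index: older entries
  // no longer exist in the log and cannot be sent.
  return std::max(lastConfirmed, firstInLog);
}
```

With a freshly elected leader whose log begins at index 100, a follower recorded at 0 would be served from 100 rather than from 0.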