Bug fix/agency restart after compaction and holes in log by kvahed · Pull Request #3413 · arangodb/arangodb · GitHub

Bug fix/agency restart after compaction and holes in log #3413

Merged

@kvahed kvahed commented Oct 13, 2017

Fixes holes in the State log.
When an Agent takes over leadership, sendAppendEntries must not start at index 0 but at the first uncompacted index.

@kvahed kvahed requested a review from neunhoef October 13, 2017 11:38
@@ -865,6 +865,7 @@ bool State::loadRemaining() {

TRI_ASSERT(_log.empty()); // was cleared in loadCompacted
std::string clientId;
index_t lastIndex = 0;
lastIndex should probably be initialized with _cur, which has been set to the index of the latest snapshot in loadCompacted just before loadRemaining was called. I think 0 is wrong if there is a persisted snapshot.

index_t index(basics::StringUtils::uint64(
ii.get(StaticStrings::KeyString).copyString()));
term_t term(ii.get("term").getNumber<uint64_t>());
if (lastIndex > 0 && index-lastIndex > 1) {
This is not very robust. Why lastIndex > 0? How do we know that index >= lastIndex and that there is no overflow? If index < lastIndex, we must not append at all. If index == lastIndex, we must append if and only if _log is still empty. If index == lastIndex+1, we must append normally and move lastIndex one up. If index > lastIndex+1, then we must insert empty entries until index == lastIndex+1.
Note that in all the formulae I typed here there can never be an overflow.
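
For illustration, the case analysis in this comment could be sketched as follows. This is a minimal standalone sketch, not the code that was merged: `LogEntry` and `appendPersisted` are hypothetical names, the payload is a plain string rather than a VelocyPack buffer, and giving filler entries the term of the entry that revealed the gap is an assumption. `lastIndex` is expected to start at the snapshot index (`_cur`), as suggested in the earlier comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

using index_t = std::uint64_t;
using term_t = std::uint64_t;

// Hypothetical stand-in for a persisted log entry.
struct LogEntry {
  index_t index;
  term_t term;
  std::string payload;  // an empty payload marks a filler entry
};

// Appends one persisted entry to `log`, filling holes with empty entries.
// `lastIndex` must be initialized to the snapshot index before the first call.
// Returns the number of entries pushed, fillers included.
std::size_t appendPersisted(std::deque<LogEntry>& log, index_t& lastIndex,
                            LogEntry const& e) {
  if (e.index < lastIndex) {
    return 0;  // older than the snapshot or already covered: never append
  }
  if (e.index == lastIndex) {
    if (!log.empty()) {
      return 0;  // duplicate of the last appended index
    }
    log.push_back(e);  // first entry sits exactly at the snapshot index
    return 1;
  }
  // Here e.index > lastIndex, so the unsigned subtraction cannot wrap.
  std::size_t appended = 0;
  while (e.index - lastIndex > 1) {
    ++lastIndex;
    // Assumption: fillers reuse the term of the entry that exposed the gap.
    log.push_back(LogEntry{lastIndex, e.term, std::string()});
    ++appended;
  }
  assert(e.index == lastIndex + 1);
  log.push_back(e);
  lastIndex = e.index;
  return appended + 1;
}
```

Every comparison is made before any subtraction, so no expression depends on `index - lastIndex` when `index < lastIndex`, which is the overflow concern raised above.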

@neunhoef neunhoef merged commit 46333a7 into devel Oct 13, 2017
neunhoef pushed a commit that referenced this pull request Oct 22, 2017
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
neunhoef pushed a commit that referenced this pull request Oct 24, 2017
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
jsteemann pushed a commit that referenced this pull request Oct 26, 2017
* Revert "Return the result of the inquiry (#3472)"

This reverts commit 1dc1a98.

* Revert "cherry picking of bug-fix/agency-restart-after-compaction-and-holes-i… (#3423)"

This reverts commit 324184d.

* State has to keep log for removeConflicts and according log all the way (#3249)

* Bug fix/sort out agency locks (#3306)

New locking concept in Agency. Ensure empty heartbeats can be sent, answered and processed without long locks. Adjust logging. Fix compaction bugs.

* Bug fix/agency compactor deadlock (#3335)

* Fix a deadlock between Agent thread and compactor thread.
* Improve comments in header.
* Organise clean shutdown of agency threads.

* Bug fix/agency leader timeouts (#3373)

* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.

* Bug fix/agency restart after compaction and holes in log (#3413)

* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow

* Return the result of the inquiry (#3465)

* Add a hidden AGENCY_DUMP for agency emergency recovery. (#3474)

* Port more changes from devel to 3.2.

This could not be cherry-picked, since the changes concerning the agency
were in squash commits which touch a lot of different things.

* Make members private in AgentConfiguration

* Log update of agency configuration.

* Do not deal with active in gossip phase.

* Take out some debugging output.
@fceller fceller deleted the bug-fix/agency-restart-after-compaction-and-holes-in-log branch October 30, 2017 11:54