8000 Improve sync reliability by holgerd77 · Pull Request #133 · ethereumjs/ethereumjs-client · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Dec 10, 2020. It is now read-only.

Conversation

@holgerd77
Copy link
Member

This PR improves sync reliability by increasing the timeout for jobs in the Fetcher class from 5000 to 8000.

With the existing timeout peers where directly banned again - especially during startup and first initialization period, see:

INFO [06-11|17:27:36] Listener up transport=libp2p url=/p2p-circuit/ip4/127.0.0.1/tcp/50580/ws/ipfs/QmSjHzTiYBcBwjuYnWEuT4vTvWznsTKTQPjAC4tPBEXF1T
INFO [06-11|17:27:36] Listener up transport=libp2p url=/ip4/127.0.0.1/tcp/50580/ws/ipfs/QmSjHzTiYBcBwjuYnWEuT4vTvWznsTKTQPjAC4tPBEXF1T
INFO [06-11|17:27:36] Started eth service.
INFO [06-11|17:27:36] 0 peers connected (max: 25)
DEBUG [06-11|17:27:37] Peer connected: id=a24ac7c5 address=52.169.42.101:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:37] Found fast peer: id=a24ac7c5 address=52.169.42.101:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:37] Peer added: id=a24ac7c5 address=52.169.42.101:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:37] Peer connected: id=b6b28890 address=159.89.28.211:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:37] Found fast peer: id=b6b28890 address=159.89.28.211:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:37] Peer added: id=b6b28890 address=159.89.28.211:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:38] Peer connected: id=343149e4 address=52.3.158.184:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:38] Found fast peer: id=343149e4 address=52.3.158.184:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:38] Peer added: id=343149e4 address=52.3.158.184:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:38] New height: number=6648919 hash=f49c550d...
DEBUG [06-11|17:27:38] New height: number=6648919 hash=f49c550d...
Connections refilled.
INFO [06-11|17:27:41] 3 peers connected (max: 25)
ERROR [06-11|17:27:42] Error: Request timed out after 5000ms
    at Timeout._onTimeout (/ethereumjs-client/lib/net/protocol/boundprotocol.js:129:16)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7)
DEBUG [06-11|17:27:46] Syncing with peer: id=343149e4feefa15d882d9fe4ac7d88f885bd05ebb735e547f12e12080a9fa07c8014ca6fd7f373123488102fe5e34111f8509cf0b7de3f5b44339c9f25e87cb8 address=52.3.158.184:30303 transport=rlpx protocols=eth height=6648919
INFO [06-11|17:27:46] 3 peers connected (max: 25)
INFO [06-11|17:27:49] Imported blocks count=128 number=7937 hash=30665f5e... peers=3
Connections refilled.
DEBUG [06-11|17:27:51] Peer removed: id=343149e4 address=52.3.158.184:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:51] Peer banned: id=343149e4 address=52.3.158.184:30303 transport=rlpx protocols=eth
ERROR [06-11|17:27:51] Error: Request timed out after 5000ms
    at Timeout._onTimeout (/ethereumjs-client/lib/net/protocol/boundprotocol.js:129:16)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7)
DEBUG [06-11|17:27:51] Peer removed: id=a24ac7c5 address=52.169.42.101:30303 transport=rlpx protocols=eth
DEBUG [06-11|17:27:51] Peer banned: id=a24ac7c5 address=52.169.42.101:30303 transport=rlpx protocols=eth
ERROR [06-11|17:27:51] Error: Request timed out after 5000ms
    at Timeout._onTimeout (/ethereumjs-client/lib/net/protocol/boundprotocol.js:129:16)
    at listOnTimeout (internal/timers.js:531:17)
    at processTimers (internal/timers.js:475:7)

This led to banning of especially valuable bootstrap peers which made initial connection start often difficult respectively unsuccessful.

This effect is in particular noticeable on a rinkeby connection (--network rinkeby) which generally is the most suited network atm for a reliable connection.

If generally no suited peers are found on a network, this change won't help on the situation either.

I also added a simple repeated logger with a "Looking for suited peers..." message. This is psychologically helpful as some feedback loop when getting an initial connection is difficult and takes some time (often occuring on mainnet e.g.). This gives therefore a "Something is happening..." notion and gives a clear feedback (every 20 seconds) that the client has not crashed or stopped working.

This helps a lot to "bridge" the first 1-2 minutes, if it takes that long until a connection succeeds.

@holgerd77 holgerd77 requested review from evertonfraga and ryanio June 11, 2020 16:10
@holgerd77
Copy link
Member Author
8000 holgerd77 commented Jun 11, 2020

Also increased the protocol timeouts from 5000 to 8000. This is somewhat too often triggered and should be no problem to increase on such a level without side effects.

@holgerd77
Copy link
Member Author

Here is a screenshot with the "Looking for suited peers..." message:

Bildschirmfoto 2020-06-11 um 18 10 23

@holgerd77
Copy link
Member Author

I think with this PR we should be back to the situation on having a satisfactory and repeatedly reliable syncing experience on at least one network. 😄

@holgerd77
Copy link
Member Author
holgerd77 commented 8000 Jun 11, 2020

Ok, added one last thing which is a bootstrap retrigger after two periods with no peers. This to some extend solves the all-or-nothing problem being that there is often the situation that chain sync either starts relatively quickly or not at all.

Here is a screenshot on its behavior:

Bildschirmfoto 2020-06-11 um 23 53 36

@holgerd77
Copy link
Member Author

Hmm, the tests won't terminate here anymore. @ryanio could you have a short look here if I use async/await correctly?

@holgerd77
Copy link
Member Author

(or could it be by this setInterval loop?)

@ryanio
Copy link
Contributor
ryanio commented Jun 12, 2020

taking a look, I will add some logic to clearInterval on close.

The integration tests still aren't terminating. I found a few open()s without close() in the tests, but the ones I've found haven't fixed it so far.

@coveralls
Copy link
coveralls commented Jun 18, 2020

Coverage Status

Coverage decreased (-0.9%) to 91.781% when pulling 3672296 on improve-sync-reliability into a9eb167 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.8%) to 91.886% when pulling 3672296 on improve-sync-reliability into a9eb167 on master.

@ryanio
Copy link
Contributor
ryanio commented Jun 18, 2020

i got it! i noticed the peerpool wasn't properly being closed in the integration tests so i was able to fix it with 3672296.

@ryanio ryanio requested a review from tomonari-t June 18, 2020 00:44
if (this.size === 0) {
this.noPeerPeriods += 1
if (this.noPeerPeriods >= 3) {
const promises = this.servers.map(async server => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanio Thank you
I don not know this context in detail yet.
Which do you think is better, Promise.allSettled or Promise.all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay if it looks good as it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanio let me know if I need to update anything here or you can alternatively also directly push yourself

@tomonari-t thanks for reviewing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomonari-t ah thanks. I think Promise.all looks good since Promise.allSettled seems like it's useful if we need to use the results of the outcomes. In these two cases I think we're just interested in waiting for them to finish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanio I got it! so LGTM 👍

@holgerd77 holgerd77 merged commit 9b32bd3 into master Jun 19, 2020
@holgerd77 holgerd77 deleted the improve-sync-reliability branch June 19, 2020 07:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

0