Timed out attempting to find data in the correct node #1385

jianqiangsong · 2018-08-08T07:58:32Z

Expected behaviour

Actual behaviour

I'm seeing this behaviour on

OS: Ubuntu 16.04 LTS
Redis: 3.2
PHP: 7.0
phpredis: 4.0

Steps to reproduce, backtrace or example script

We use RedisCluster and run into this error very often:
Timed out attempting to find data in the correct node!

I've checked

There is no similar issue from other users
Issue isn't fixed in develop branch

yatsukhnenko · 2018-08-08T08:11:25Z

@jianqiangsong could you add more details about the issue?

jianqiangsong · 2018-08-08T09:43:39Z

We use cluster mode, but when the client connects to Redis, it reports the following error:
Timed out attempting to find data in the correct node!
How can we solve this error?

yatsukhnenko · 2018-08-08T11:54:32Z

I'm not sure I understand the problem :(
Could you provide a test script?

jianqiangsong · 2018-08-09T02:47:50Z

[2018-08-08 14:36:26 *176089.7] ERROR zm_deactivate_swoole (ERROR 503): Fatal error: Uncaught RedisClusterException: Timed out attempting to find data in the correct node! in /var/www/pt_target/Newtrade/SwooleCommand/Controller/SwooleServerController.class.php:221
Stack trace:
#0 /var/www/pt_target/Newtrade/SwooleCommand/Controller/SwooleServerController.class.php(221): RedisCluster->hgetall('ws_client_servi...')

yatsukhnenko · 2018-08-13T09:34:09Z

@jianqiangsong see #888 (comment).
Try changing the timeouts.

jianqiangsong · 2018-08-20T06:33:44Z

Is there another solution?

yatsukhnenko · 2018-08-20T10:55:03Z

@jianqiangsong could you show what the key looks like?

hhp12360 · 2018-10-23T16:13:43Z

We also have this problem.

env:

OS: centos 7.0
Redis: 3.2
PHP: 5.6
phpredis: 4.1

error info:

exception 'RedisClusterException' with message 'Timed out attempting to find data in the correct node!'

It happens at random, with each occurrence lasting one second.

michael-grunder · 2018-10-23T18:17:41Z

This isn't necessarily a bug. Have you tried monitoring your Redis servers to see if you're having latency spikes?

pagottoo · 2018-11-26T22:20:55Z

Same problem here...

The Redis cluster entered recovery mode, and when it came back the error message began to appear in New Relic.

However, other apps connect to the same Redis cluster and retrieve messages from it without any error message: one is a Node.js app and the other is a PHP 7 app.

The app that catches the exception runs PHP 5.4.

We are running the apps (PHP, Node.js) and the Redis cluster in Docker (inside k8s).

kjoe · 2018-12-14T23:38:07Z

We also have this issue on PHP 5.5. In cluster_library.c, is there any reason that waitms (used in the function cluster_send_command) is calculated from the connect timeout rather than the read timeout?

Another possible cause is that msstart is not reset on every loop iteration, as described in the documentation:

"The way RedisCluster handles user specified timeout values is that every time a command is sent to the cluster, we record the time at the start of the request and then again every time we have to re-issue the command to a different node (either because Redis cluster responded with MOVED/ASK or because we failed to communicate with a given node). Once we detect having been in the command loop for longer than our specified timeout, an error is raised."
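To make the quoted behaviour easier to reason about, here is a minimal Python model of it. It is only a sketch of what the documentation describes, not the code in cluster_library.c: the names `send_command` and `ClusterTimeout`, the list of simulated attempts, and the millisecond costs are all made up for illustration.

```python
class ClusterTimeout(Exception):
    """Stands in for RedisClusterException in this model (hypothetical name)."""


def send_command(attempts, timeout_ms):
    """Replay simulated attempts under one overall time budget.

    attempts: list of (outcome, cost_ms) pairs, where outcome is "OK",
    "MOVED", "ASK" or "FAIL" and cost_ms is how long that attempt took.
    Returns the total elapsed time in milliseconds on success.
    """
    elapsed_ms = 0  # time spent since the command was first sent
    for outcome, cost_ms in attempts:
        elapsed_ms += cost_ms
        if outcome == "OK":
            return elapsed_ms
        # MOVED/ASK or a failed node: the command has to be re-issued,
        # so check how long we have been in the loop before trying again.
        if elapsed_ms > timeout_ms:
            raise ClusterTimeout(
                "Timed out attempting to find data in the correct node!"
            )
    raise ClusterTimeout("ran out of simulated attempts")
```

In this model, a quick redirect is harmless (`send_command([("MOVED", 5), ("OK", 5)], 1000)` succeeds), but a single node that eats the whole budget leaves nothing for the redirect that follows: `send_command([("FAIL", 1000), ("MOVED", 5), ("OK", 5)], 1000)` raises even though the correct node would have answered within a few milliseconds.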

sofire · 2019-11-13T08:09:12Z

I have the same error on PHP 7.1, and PHP-FPM exited on signal 11 (SIGSEGV)

PHP Redis Version => 4.3.0
Redis cluster Version 4.0.11

yatsukhnenko · 2019-11-13T08:40:35Z

@sofire it is recommended to use phpredis 5 with PHP 7

astar10239 · 2021-07-20T06:55:41Z

Is there any solution for this? I am also getting the same issue.

saturnonfire · 2023-08-30T07:35:10Z

🔼 UP
Any updates on this issue?

ivan-nezhura · 2023-09-11T14:07:23Z

Have the same issue

tuomasva · 2024-01-02T08:58:15Z

We are also seeing this in one of our testing environments but don't really know where to look for more information on how to fix it. We haven't really been able to trace this to Redis itself, but it very likely seems to be some kind of problem with the Redis instance at least (network, HDD, Redis, memory).

While the problem is occurring, around half of the incoming requests are still served normally.

EDIT: The only hint we are seeing is that the memory of the "broken" Redis instance degrades over time. We have been seeing this output:

127.0.0.1:6379> memory doctor
Sam, I detected a few issues in this Redis instance memory implants:

 * High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is "jemalloc-5.2.1".

I'm here to keep you safe, Sam. I want to help you.

and

127.0.0.1:6379> memory doctor
Sam, I detected a few issues in this Redis instance memory implants:

 * Peak memory: In the past this instance used more than 150% the memory that is currently using. The allocator is normally not able to release memory after a peak, so you can expect to see a big fragmentation ratio, however this is actually harmless and is only due to the memory peak, and if the Redis instance Resident Set Size (RSS) is currently bigger than expected, the memory will be used as soon as you fill the Redis instance with more data. If the memory peak was only occasional and you want to try to reclaim memory, please try the MEMORY PURGE command, otherwise the only other option is to shutdown and restart the instance.

 * High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is "jemalloc-5.2.1".

 * High process RSS overhead: This instance has non-allocator RSS memory overhead is greater than 1.1 (this means that the Resident Set Size of the Redis process is much larger than the RSS the allocator holds). This problem may be due to Lua scripts or Modules.

I'm here to keep you safe, Sam. I want to help you.

When a node timeout occurs, then phpredis will try to connect to another node, whose answer probably will be MOVED redirect. After this we need more time to accomplish the redirection, otherwise we get "Timed out attempting to find data in the correct node" error message. Fixes phpredis#795 phpredis#888 phpredis#1142 phpredis#1385 phpredis#1633 phpredis#1707 phpredis#1811 phpredis#2407
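The scenario in that description can be illustrated with a small Python model. This is not the patch from #2459; it is a hypothetical sketch in which the flag `restart_clock_on_redirect`, the function name `find_node`, and the simulated timings are all invented to show why a redirect that follows a node timeout needs extra time.

```python
def find_node(attempts, timeout_ms, restart_clock_on_redirect):
    """Replay simulated attempts and report "OK" or "TIMEOUT".

    attempts: list of (outcome, cost_ms) pairs, where outcome is "OK",
    "MOVED", "ASK" or "FAIL".
    restart_clock_on_redirect: hypothetical switch; when True, a MOVED/ASK
    reply gives the redirected request a fresh time budget.
    """
    start_ms = 0  # when the current time budget started
    now_ms = 0    # simulated clock
    for outcome, cost_ms in attempts:
        now_ms += cost_ms
        if outcome == "OK":
            return "OK"
        if outcome in ("MOVED", "ASK") and restart_clock_on_redirect:
            # Assumption for illustration: the redirect gets more time.
            start_ms = now_ms
        elif now_ms - start_ms > timeout_ms:
            return "TIMEOUT"
    return "TIMEOUT"


# One node times out, the next node answers MOVED, the target node is fast.
scenario = [("FAIL", 1000), ("MOVED", 5), ("OK", 5)]
```

With a 1000 ms budget, `find_node(scenario, 1000, False)` reports a timeout because the failed node already used up the budget, while `find_node(scenario, 1000, True)` reaches the correct node.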

kjoe mentioned this issue Mar 17, 2024

Fix random connection timeouts with Redis Cluster #2459

Merged

michael-grunder closed this as completed in #2459 Mar 25, 2024
