Scenario:
- The Redis server becomes unreachable because of a network failure (link down; no response, no reply packets of any kind come back).
- The client tries to write data larger than the TCP send buffer size.
Symptom:
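The Jedis client thread blocks in SocketOutputStream.socketWrite0() (the native write method) for about 900 seconds, i.e. the full TCP retransmission timeout, instead of failing fast or timing out as expected. That duration is consistent with the Linux default of net.ipv4.tcp_retries2 = 15, which works out to roughly 924 seconds.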
Root cause I’ve identified:
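When the network is dead but the socket has not been closed:
- The TCP send buffer fills up, because nothing already sent is ever acknowledged.
- The blocking Java Socket write() then stays blocked until the kernel gives up retransmitting and reports an error on the socket.
- Jedis enforces its socket timeout only on reads (SO_TIMEOUT, set via Socket.setSoTimeout(), does not apply to writes), so nothing bounds the write.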
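The read-only nature of SO_TIMEOUT can be checked locally without Redis or Jedis. The sketch below is a hypothetical repro (the class and method names are made up for illustration). It assumes a loopback peer that never reads is a reasonable stand-in for a dead link: the buffers fill and write() blocks in the same way, although in this setup the write would block forever rather than for about 900 seconds.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical repro, not Jedis code: shows that SO_TIMEOUT does not bound a blocking write. */
public final class BlockedWriteRepro {
    private BlockedWriteRepro() {}

    /**
     * Returns true if the writer thread is still stuck in write() after waitMillis,
     * even though SO_TIMEOUT was set to a much smaller value.
     */
    public static boolean writeOutlivesSoTimeout(int soTimeoutMillis, long waitMillis)
            throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The peer never calls accept() or read(); the kernel still completes the handshake.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentPeer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis); // affects reads only

            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[1 << 20];
                try {
                    OutputStream out = client.getOutputStream();
                    while (true) {
                        out.write(chunk); // blocks once the socket buffers are full
                    }
                } catch (IOException expectedOnClose) {
                    // The socket being closed at the end of the try block releases the writer.
                }
            }, "blocked-writer");
            writer.setDaemon(true);
            writer.start();

            writer.join(waitMillis);  // wait far longer than SO_TIMEOUT
            return writer.isAlive();  // still alive means write() is still blocked
        }
    }
}
```

Calling BlockedWriteRepro.writeOutlivesSoTimeout(100, 1500) is expected to return true: the writer is still blocked long after the 100 ms SO_TIMEOUT has passed.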
Question:
Has anyone faced this issue? How did you solve or mitigate this?
- Is there any configuration in Jedis to set a timeout on write?
- Is there a recommended way to prevent the thread from hanging for 15 minutes in this scenario?
- Is switching to a non-blocking/NIO client (like Netty-based clients) the only reliable solution?
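For context on the second question, one workaround that does not require changing clients is a watchdog that closes the socket if a write overruns its deadline. Closing a java.net.Socket from another thread makes a thread blocked in I/O on it fail with a SocketException. The sketch below is only an illustration under assumptions: WriteWatchdog and writeWithDeadline are hypothetical names, not Jedis API, and how to reach the underlying Socket from Jedis is not shown.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Jedis: bounds a blocking socket write with a watchdog. */
public final class WriteWatchdog {
    // One daemon timer thread shared by all guarded writes.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-watchdog");
                t.setDaemon(true);
                return t;
            });

    private WriteWatchdog() {}

    /**
     * Writes payload to the socket. If the write has not finished within
     * timeoutMillis, the socket is closed from the timer thread, which makes
     * the blocked write() fail with an IOException instead of hanging.
     */
    public static void writeWithDeadline(Socket socket, byte[] payload, long timeoutMillis)
            throws IOException {
        ScheduledFuture<?> killer = TIMER.schedule(() -> {
            try {
                socket.close(); // unblocks the writer with a SocketException
            } catch (IOException ignored) {
                // Nothing useful to do if close itself fails.
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(payload);
            out.flush();
        } finally {
            // No effect if the timer already fired; otherwise the socket stays open.
            killer.cancel(false);
        }
    }
}
```

After a timeout the socket is closed, so the connection has to be discarded rather than reused. There is also a small race: if the write completes just as the timer fires, the socket may still end up closed even though the data was handed to the kernel.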
Thanks a lot for any advice or experience sharing.