8000 Client Hangs during subscribe or unsubscribe · Issue #110 · adafruit/Adafruit_CircuitPython_MiniMQTT · GitHub
[go: up one dir, main page]

Skip to content

Client Hangs during subscribe or unsubscribe #110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
calcut opened this issue Jun 28, 2022 · 11 comments
Closed

Client Hangs during subscribe or unsubscribe #110

calcut opened this issue Jun 28, 2022 · 11 comments

Comments

@calcut
Copy link
Contributor
calcut commented Jun 28, 2022

I've been seeing intermittent failures with MQTT, where function calls can hang and never return.
I've been working around it with a watchdog timer to detect the condition and hard reset the device, which is pretty ugly.

I think various things can cause it, but seems to be reproducable with subscribe or unsubscribe
If I run the code below, it typically loops for 10-30s before getting stuck. Sometimes longer.

Potentially another symptom of an underlying problem, see other issues e.g.
#107
#101

I have also tried this with a completely different MQTT broker / service, and had similar results. Quicker to fail if anything.
So I don't think this is down to the Adafruit IO server.

I wondered about throttling, but the time to failure seems too inconsistent

Adafruit CircuitPython 7.3.0 on 2022-05-23; Adafruit Feather ESP32S2 with ESP32S2

Minimal example

import time
import wifi, ssl, socketpool
import adafruit_minimqtt.adafruit_minimqtt as MQTT

from secrets import secrets


def connected_callback(client, userdata, flags, rc):
    print("Connected to AIO")

def subscribe_callback(client, userdata, topic, granted_qos):
    print(f"Subscribed to {topic}")

def unsubscribe_callback(client, userdata, topic, granted_qos):
    print(f"Unsubscribed from {topic}")

def message_callback(client, topic, payload):
    print(f"{topic} = {payload}")

ssid = secrets["ssid"]
password = secrets["password"]
wifi.radio.connect(ssid, password)

pool = socketpool.SocketPool(wifi.radio)

client = MQTT.MQTT(
    broker="io.adafruit.com",
    username=secrets["aio_username"],
    password=secrets["aio_key"],
    socket_pool=pool,
    ssl_context=ssl.create_default_context(),
)

client.on_subscribe = subscribe_callback
client.on_unsubscribe = unsubscribe_callback
client.on_connect = connected_callback
client.on_message = message_callback
client.connect()

start = time.monotonic()
while True:
    print(f'time = {round(time.monotonic() - start, 1)}s')
    client.subscribe('time/hours')
    time.sleep(0.1)
    client.unsubscribe('time/hours')
    time.sleep(0.1)
@calcut
Copy link
Contributor Author
calcut commented Jun 28, 2022

Running it a few more times, I've seen it take 195s to hang.

And it also got to 411s once then has an OS error.

time = 411.3s
Subscribed to time/hours
Traceback (most recent call last):
File "/circuitpy_septic_tank/septic_tank_aio_debug.py", line 45, in
File "adafruit_minimqtt/adafruit_minimqtt.py", line 750, in unsubscribe
File "adafruit_minimqtt/adafruit_minimqtt.py", line 841, in _wait_for_msg
File "adafruit_minimqtt/adafruit_minimqtt.py", line 875, in _recv_len
File "adafruit_minimqtt/adafruit_minimqtt.py", line 907, in _sock_exact_recv
OSError: [Errno 116] ETIMEDOUT

@ladyada
Copy link
Member
ladyada commented Jun 28, 2022

can you wrap in try/except? wifi is just not a reliable transport

@calcut
Copy link
Contributor Author
calcut commented Jun 28, 2022

I could try something like that, although most of the time it does not produce an exception. So I think I'd still be relying on some kind of watchdog timer.

I've actually tried switching to http rather than mqtt, (which is much slower due to overheads) but hasn't produced any errors like this so far.

@ladyada
Copy link
Member
ladyada commented Jun 28, 2022

it should definitely not 'hang' - can you determine where the hang occurs?

@calcut
Copy link
Contributor Author
calcut commented Jun 28, 2022

Just ran my example again, it hung after 8.7s. i.e stopped printing anything.
After ctrl-c I get the following:

Subscribed to time/hours
Unsubscribed from time/hours
time = 8.1s
Subscribed to time/hours
Unsubscribed from time/hours
time = 8.7s
Subscribed to time/hours
Traceback (most recent call last):
  File "/circuitpy_septic_tank/septic_tank_aio_debug.py", line 45, in <module>
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 750, in unsubscribe
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 818, in _wait_for_msg
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 907, in _sock_exact_recv
KeyboardInterrupt: 

Code done running.

@calcut
Copy link
Contributor Author
calcut commented Jun 28, 2022

This is with that latest adafruit_minimqt release 5.3.2
I've also tried with CP 7.3.0, 7.3.1 and 8.0.0-alpha1, just in case. no obvious differences

@ladyada
Copy link
Member
ladyada commented Jun 28, 2022

kk what if the delay is shorter or longer between sub & unsub? just trying to get as much reproducability deets

@calcut
Copy link
Contributor Author
calcut commented Jun 29, 2022

Tried no delay and 2s delays. Same behaviour.

Further digging with print statements...

When it 'hangs', I can see
_sock_exact_recv() is being repeatedly called,

It is stuck in the while True: loop of unsubscribe() waiting for
op == 176 which never happens.

@calcut
Copy link
Contributor Author
calcut commented Jun 29, 2022

Maybe getting into the boundary between python and C now...
Behaviour I don't understand, e.g.

def _sock_exact_recv(self, bufsize):
      print('_sock_exact_recv')
      if not self._backwards_compatible_sock:
          # CPython/Socketpool Impl.
          rc = bytearray(bufsize)
          print(f'{bufsize=}')
          self._sock.recv_into(rc, bufsize)
          print(f'{rc=}')

gives me output like this, where I don't understand why rc= would only be printed some of the time.

_sock_exact_recv
bufsize=1
_sock_exact_recv
bufsize=1
_sock_exact_recv
bufsize=1
rc=bytearray(b'\x90')
_sock_exact_recv
bufsize=4
rc=bytearray(b'\x03\x01\x0f\x00')

@vladak
Copy link
Contributor
vladak commented Aug 16, 2022

Just ran my example again, it hung after 8.7s. i.e stopped printing anything. After ctrl-c I get the following:

Subscribed to time/hours
Unsubscribed from time/hours
time = 8.1s
Subscribed to time/hours
Unsubscribed from time/hours
time = 8.7s
Subscribed to time/hours
Traceback (most recent call last):
  File "/circuitpy_septic_tank/septic_tank_aio_debug.py", line 45, in <module>
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 750, in unsubscribe
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 818, in _wait_for_msg
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 907, in _sock_exact_recv
KeyboardInterrupt: 

Code done running.

I am having the same problem on Adafruit CircuitPython 7.3.2 on 2022-07-20; Adafruit Feather ESP32S2 with ESP32S2. Even though the socket timeout is set in

(to 1 second - cannot be changed #112), it does not seem to have any effect. In my case it is the connect() that is hanging, however the underlying issue is the same I believe.

For details see https://forums.adafruit.com/viewtopic.php?p=937190#p937190

@calcut
Copy link
Contributor Author
calcut commented Sep 2, 2022

I believe this issue has been resolved. My minimal example doesn't "hang" with the new version of this library. Although I did get a timeout exception, which I think is desirable behaviour.

Subscribed to time/hours
Unsubscribed from time/hours
time = 143.0s
Traceback (most recent call last):
  File "/circuitpy_septic_tank/septic_tank_aio_debug.py", line 43, in <module>
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 757, in subscribe
MMQTTException: No data received from broker for 10 seconds.

@calcut calcut closed this as completed Sep 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0