8000 Getting "Repeated socket failures" on PicoW · Issue #126 · adafruit/Adafruit_CircuitPython_Requests · GitHub
[go: up one dir, main page]

Skip to content

Getting "Repeated socket failures" on PicoW #126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matoushybl opened this issue Jan 24, 2023 · 18 comments
Closed

Getting "Repeated socket failures" on PicoW #126

matoushybl opened this issue Jan 24, 2023 · 18 comments

Comments

@matoushybl
Copy link
matoushybl commented Jan 24, 2023

Hi,
I am trying to use this library on a Pico W with CircuitPython 8.0.0 beta 6 and I also tried nightly. The problem is that most of the time, I am getting "OutOfRetries: Repeated socket failures" when trying to GET basic URLs. The code is as follows:

import os
import time
import ssl
import wifi
import socketpool
import adafruit_requests

quotes_url = "http://wifitest.adafruit.com/testwifi/index.html"

wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))

pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())

print("Fetching text from %s" % quotes_url)
response = requests.get(quotes_url)
print("-" * 40)
print("Text Response: ", response.text)
print("-" * 40)
response.close()

The output is:

Fetching text from http://wifitest.adafruit.com/testwifi/index.html
Traceback (most recent call last):
  File "code.py", line 80, in <module>
  File "adafruit_requests.py", line 721, in get
  File "adafruit_requests.py", line 686, in request
OutOfRetries: Repeated socket failures

Randomly, the request is ok, but after rerunning the code it fails. Power cycling the board doesn't work and I've tried multiple APs.

Can you please give me any pointers on how to debug this problem?
Thanks!

@matoushybl
Copy link
Author

From some further debugging, it seems that it works before first power cycle is performed. I also tried MQTT and from the logs it seems like the Pico is capable of transmitting the data to the broker, it is not able to receive data and fails with the following message.

Traceback (most recent call last):
  File "code.py", line 103, in <module>
  File "adafruit_minimqtt/adafruit_minimqtt.py", line 530, in connect
MMQTTException: No data received from broker for 10 seconds.
device-cluster-mosquitto-1  | 1674650912: New client connected from 172.18.0.1:56462 as cpy35454 (p2, c1, k60).
device-cluster-mosquitto-1  | 1674650912: Client cpy35454 closed its connection.

I am starting to think, that this problem is connected to something in the wifi driver and I should probably move this issue to the CircuitPython repo.

@anecdata
Copy link
Member

@matoushybl can you try this code, it bypasses requests (we can debug requests in detail depending how this comes out):

import os
import wifi
import socketpool
import ssl
import ipaddress
from secrets import secrets

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")

@matoushybl
Copy link
Author

I'll do that. I made some progress yesterday when I came across a similar issue here: adafruit/circuitpython#7333 . It seems like when I disable the web workflow by not specifying the SSID and password with the CIRCUITPY_ prefix, it all starts to work.

@matoushybl
Copy link
Author

First running with CIRCUITPY_ prefix:

Connecting to wifi
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 15:04:29 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

After powerclycling:

Connecting to wifi
Create TCP Client Socket
Connecting
Sent 69 bytes
Traceback (most recent call last):
  File "code.py", line 31, in <module>
OSError: [Errno 116] ETIMEDOUT

Works flawlessy after removing the SSID and password prefix and power cycling.

@anecdata
Copy link
Member

It sounds like you are not connected to wifi. The settings.toml file uses CIRCUITPY_WIFI_SSID and CIRCUITPY_WIFI_PASSWORD for web workflow. Leaving those out of the file, or commenting them. out, will disable web workflow.

You can use those names, or anything you like, for basic connection outside of web workflow. But in any case, the wifi.radio.connect() call must use credentials that are defined correctly somewhere.

You can verify that you are connected by printing or checking wifi.radio.ipv4_address in the code - it should be a valid IPv4 and not None or 0.0.0.0

@matoushybl
Copy link
Author

I am pretty sure, I am connected to the wifi. I am sorry for misleading communication.
Here is what is working for me, even across multiple reboots:

import os
import wifi
import socketpool
import ssl
import ipaddress

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('WIFI_SSID'), os.getenv('WIFI_PASSWORD'))
print("My IP address is", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")

settings.toml:

WIFI_SSID = "SSID"
WIFI_PASSWORD = "PASSWORD"

output:

code.py output:
Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 21:29:07 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

Here is what doesn't work for me:

import os
import wifi
import socketpool
import ssl
import ipaddress

HOST = "wifitest.adafruit.com"
PATH = "/testwifi/index.html"
PORT = 80
TIMEOUT = 60
MAXBUF = 1024

print("Connecting to wifi")
wifi.radio.connect(os.getenv('CIRCUITPY_WIFI_SSID'), os.getenv('CIRCUITPY_WIFI_PASSWORD'))
print("My IP address is", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)

print("Create TCP Client Socket")
s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

s.settimeout(TIMEOUT)

print("Connecting")
s.connect((HOST, PORT))

size = s.send(f"GET {PATH} HTTP/1.1\r\nHost: {HOST}:{PORT}\r\n\r\n".encode())
print("Sent", size, "bytes")

# just get the first hunk and call it a day
buf = bytearray(MAXBUF)
size = s.recv_into(buf)
print('Received', size, "bytes", buf[:size])

s.close()
print("Done-ish.")

settings.toml:

CIRCUITPY_WIFI_SSID = "SSID"
CIRCUITPY_WIFI_PASSWORD = "PASSWORD"

First output is as follows - everything works:

Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Received 314 bytes bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.18.0 (Ubuntu)\r\nDate: Thu, 26 Jan 2023 21:31:03 GMT\r\nContent-Type: text/html\r\nContent-Length: 69\r\nLast-Modified: Thu, 09 Dec 2021 17:26:22 GMT\r\nConnection: keep-alive\r\nETag: "61b23c3e-45"\r\nAccept-Ranges: bytes\r\n\r\nThis is a test of Adafruit WiFi!\nIf you can read this, its working :)')
Done-ish.

After power cycling the board:

Connecting to wifi
My IP address is 10.15.0.27
Create TCP Client Socket
Connecting
Sent 69 bytes
Traceback (most recent call last):
  File "code.py", line 32, in <module>
OSError: [Errno 116] ETIMEDOUT

This behavior is consistent over multiple reboots and runs of the code.

@anecdata
Copy link
Member
anecdata commented Jan 27, 2023

When web workflow is on and it works the first time... what are the conditions preceding that (what had it been doing until that time? random?) vs. after a fresh power cycle? Is it the same if the board is reset using microcontroller.reset() or supervisor.reload() when web workflow is on? Does adding a delay early in the code do anything helpful? Reducing the timeout?

I haven't seen this behavior since a later beta-5, but I don't know what core change improved it at that time. But presumably some condition(s) in your environment might be bringing it out.

@anecdata
Copy link
Member

BTW, I added some debugging to requests around line 650 or so in the current version:

        # We may fail to send the request if the socket we got is closed already. So, try a second
        # time in that case.
        retry_count = 0
        last_exc = None
        while retry_count < 2:
            retry_count += 1
            socket = self._get_socket(host, port, proto, timeout=timeout)
            print(f"#### DEBUG {time.monotonic():.3f} _get_socket {host} {port} {proto} {timeout}")
            ok = True
            try:
                self._send_request(socket, host, method, path, headers, data, json)
                print(f"#### DEBUG {time.monotonic():.3f} _send_request SENT {socket} {host} {method} {path} {headers}")  # {data} {json}
            except OSError as exc:
                print(f"#### DEBUG {time.monotonic():.3f} _send_request")
                traceback.print_exception(exc, exc, exc.__traceback__)
                ok = False
            if ok:
                # Read the H of "HTTP/1.1" to make sure the socket is alive. send can appear to work
                # even when the socket is closed.
                if hasattr(socket, "recv"):
                    result = socket.recv(1)
                else:
                    result = bytearray(1)
                    try:
                        socket.recv_into(result)
                    except OSError as exc:
                        print(f"#### DEBUG {time.monotonic():.3f} recv_into")
                        traceback.print_exception(exc, exc, exc.__traceback__)
                        pass
                if result == b"H":
                    # Things seem to be ok so break with socket set.
                    break
                else:
                    print(f"#### DEBUG {time.monotonic():.3f} result NOT == {b'H'}")
            else:
                print(f"#### DEBUG {time.monotonic():.3f} NOT OK")
            self._close_socket(socket)
            socket = None

        if not socket:
            raise OutOfRetries("Repeated socket failures") from last_exc

that gives a little more detail on the underlying errors (need to add imports for time and traceback)

@anecdata
Copy link
Member
anecdata commented Feb 5, 2023

@matoushybl Can you try adding a delay like time.sleep(5) to the top of your code, before attempting any network operations?

@matoushybl
Copy link
Author

I tried adding the delay, but it didn't change anything. I have also updated to CircuitPython 8.0.0.

@anecdata
Copy link
Member
anecdata commented Feb 7, 2023

with the 5-second delay in, it doesn't work even after a power cycle or microcontroller.reset()?

(for example, web workflow enabled with settings.toml, this example code:
adafruit/circuitpython#7333 (comment)
then power cycle or reset)

@matoushybl
Copy link
Author

That code works, interesting. I wasn't able to make it fail during a short testing.

@anecdata
Copy link
Member
anecdata commented Feb 9, 2023

Good news. The key is that after a reset (reset button, power cycle, or microcontroller.reset()), there has to be enough of a delay before network operations to allow the Pico W's CYW43 wifi module to start up correctly. That means code with the delay inserted, or reset into the REPL and then by the time any code can be manually loaded or typed into the REPL the wifi module has had enough time to start up fully. If it doesn't start up correctly, no amount of code or time or soft reloads (like control-D) will fix it. The module only works correctly after a reset when the wifi module is allowed to start up fully.

Hopefully, this can be addressed in the core so code doesn't need these delays. There is an open issue for that (7333). But closing this library issue for now since we believe it's the same as the core issue, and the workaround is in place can be used for now. If you continue to experience this issue, please re-open.

@anecdata anecdata closed this as completed Feb 9, 2023
@dhalbert
Copy link
Contributor
dhalbert commented Feb 9, 2023

@anecdata I'm confused. Are you saying that the delay solves the problem? But I would consider that a workaround, since we'd have to document that. We need some test that says the CYW43 is really ready, and then we can do that invisibly.

I wonder if there is some transaction going on with the access point that needs to finish, but the CYW43 is not revealing that it's not in a good state before that.

@anecdata
Copy link
Member
anecdata commented Feb 9, 2023

I just thought this library issue was redundant with the core issue. Any real solution isn't in the library. We can re-open if that's more appropriate.

I don't know what aspect of the CYW43 startup is going bad (and staying bad) when not enough time is allowed.

@dhalbert
Copy link
Contributor
dhalbert commented Feb 9, 2023

ah sorry, I missed that this was the library issue

@anecdata
Copy link
Member

@matoushybl This has been fixed with adafruit/circuitpython#7589 and should be in the S3 Absolute Newest soon.

@matoushybl
Copy link
Author

Thanks! I am looking forward to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0