Espressif boards wifi scanning instability upon reload (soft reset) #6866
Comments
Can you test with:
import time
import supervisor
import wifi

def scan_wifi():
    for network in wifi.radio.start_scanning_networks():
        print("{0:02X}:{1:02X}:{2:02X}:{3:02X}:{4:02X}:{5:02X}".format(*network.bssid), end=" ")
        print("{:>2d}".format(network.channel), end=" ")
        print(network.rssi, network.authmode, network.country, network.ssid)
    wifi.radio.stop_scanning_networks()

scan_wifi()
time.sleep(5)
supervisor.reload()
I perhaps should mention:
|
Lack of |
I can confirm similar behavior on an ESP32-S3 sitting on my desk right next to the router. I'm basically just doing wifi.radio.connect(secrets["ssid"], secrets["password"]) and sometimes after resetting it just errors with
and indeed scanning at this point returns no networks. A hard reset will bring it back to working normally. |
Cannot reproduce it on my Beetle ESP32-C3. So the S2 and S3 have the issue, but not the C3. |
Cannot reproduce using an Adafruit Feather S2 on 8.0.0 beta; ran anecdata's snippet for an hour with no problems. Tested during Adafruit's Community Help Desk on Saturday morning, 9/10/22. |
Running @anecdata's demo code on an On a hard reset, the first attempt to use the wifi works: it can scan nearby networks and connect to one with its ssid/password. It can then do standard networking, get and post data, etc. Soft resets thereafter fail to find any wifi network on the scan.
As an extra test I erased the flash, reloaded the binary and tried the demo code again, and the issue persists.
I've now noticed that while most scans fail to find any network, every once in a while it will manage to see one or two networks on a scan. After moving the device around and watching the few networks it does see, it never notices any network with an rssi below -59 after the soft reload, but on a hard reset it noticed networks all the way down to -88. (These are probably soft signal strength boundaries.) There are also enough nearby networks that router distance shouldn't be an issue, but it apparently does contribute. Still, I can set the Metro next to the router and it will frequently fail to connect, so the issue isn't exclusively that. In |
That brings up a question about whether the RSSI data is actually flawed. In my test I have 2 APs sitting right next to each other, which usually show near-identical signal according to the wifi manager app on my tablet and phone... but scanning with that code on the ESP32-S2 showed an unusual amount of deviation between my APs, to the point that it gives me reason to question the accuracy. |
The networks, however, should still be visible and connectable, regardless of RSSI. |
Oh right, that's an inconsequential issue if it doesn't even work for you, my apologies. |
I grabbed the nightly from I left the code running while I went about my day and came back to soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 7, in <module>
MemoryError: Failed to allocate Wifi memory
hitting Entering the REPL I ran the following:
And then exiting with |
Have you tried manually adding gc.collect and mem_free in different spots? Could help narrow down where the leak is coming from. |
No, I believe gc.collect impacts the memory allotted to the program space that code.py uses, whereas the error, MemoryError: Failed to allocate Wifi memory, appears to occur during the core's startup and initialization, in how it allocates memory for the code managing the wifi components, so gc.collect shouldn't have access to that block. But just in case, I tried it as well (because it was a great suggestion, and even if it didn't solve it, it helps confirm that it's associated with the core's allocation): I put that call at the end, just before the sleep + soft reboot calls, and the program still ran out of memory after a period of soft rebooting. So the useful info I have is that a Metro ESP32-S2 seems to experience the original bug @bill88t had reported, that another likely attribute of the bug is a memory error after enough soft reboots take place, and now that gc.collect() does not impact the |
@KeithTheEE yes, the memory error is in wifi init, and arises from an esp-idf call. Using the |
Running the same sample code, but adding the line,
ESPIDF Heap caps free: 58808
Code done running.
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free: 56688
Code done running.
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free: 58636
On the first couple of loops. After running for a long while it fails with the memory error. Right before it fails due to memory error it has far less memory:
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free: 4420
Code done running.
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free: 4384
Code done running.
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 3, in <module>
MemoryError: Failed to allocate Wifi memory
Code done running.
Press any key to enter the REPL. Use CTRL-D to reload.
The two edge cases (start and end of the program) show changes in memory that are not representative of the typical memory change. Most loops show 32 (bytes? free heap caps?) less than the previous call. It does vary sometimes, usually on the scale of one loop dropping 28 and then, a loop or two later, another dropping 36. Occasionally (and I noticed this took place around the same times it saw wifi networks) it would jump up by 1000 or so. Sometimes it'd drop back after two or three loops, sometimes it'd slowly drop back down at roughly 32 unit intervals. No
|
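As a rough cross-check of the numbers in the log above, the steady loss per reload can be turned into an estimate of how many soft reloads fit between the early readings and the failure. This is only a back-of-the-envelope sketch: `reloads_until_exhausted` is a hypothetical helper, the unit is assumed to be bytes, and the occasional jumps of 1000 or so described above are ignored.

```python
def reloads_until_exhausted(free_now, leak_per_reload, floor=0):
    """Estimate how many more soft reloads fit before free heap drops to `floor`.

    All three arguments use the unit printed in the log (assumed to be bytes).
    """
    if leak_per_reload <= 0:
        raise ValueError("leak_per_reload must be positive")
    return max(0, (free_now - floor) // leak_per_reload)


# Figures taken from the log above: 58808 free on an early loop,
# 4384 on the last loop that still ran, and a typical drop of 32 per reload.
print(reloads_until_exhausted(58808, 32, floor=4384))
```

With those figures the estimate comes to 1700 reloads, which is at least consistent with the failure only showing up after the script had been left running for a long while.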
It appears (to me) that: (1) there is a small memory leak in the CP core or (more likely) in the esp-idf; (2) |
Curiously, on my
print(f"{gc.mem_free()} {espidf.heap_caps_get_total_size()} {espidf.heap_caps_get_free_size()} {espidf.heap_caps_get_largest_free_block()}")
Which brings up additional complicating factors:
• differences in network environment;
• possibly differences in behavior between different ESP32-S2 chips / modules / antennas.
Addendum: realized I do have web workflow enabled on this board. If web workflow is not enabled (no |
I did not have web workflow enabled either, so that might be a key bit to narrow down the issue |
Doing a bisect, I believe I managed to track the origin of the problem down to #6531. Tested without wifi-workflow enabled, using a script that scans and reloads, and stops if it doesn't find any SSID. |
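The bisect script itself is not shown in the thread. A minimal sketch of a "scan, reload, stop when the scan comes back empty" loop might look like the following. The function names are hypothetical, and the radio, reload and sleep callables are passed in as parameters so the logic can be exercised off-device; on a board they would be `wifi.radio`, `supervisor.reload` and `time.sleep`.

```python
def scan_ssids(radio):
    """Run one scan and return the SSIDs seen; always stop the scan afterwards."""
    try:
        return [network.ssid for network in radio.start_scanning_networks()]
    finally:
        radio.stop_scanning_networks()


def scan_then_reload(radio, reload, sleep, delay_s=5):
    """Scan once. Reload if anything was found, otherwise stop and return False."""
    ssids = scan_ssids(radio)
    print(len(ssids), "networks:", ssids)
    if not ssids:
        print("Scan came back empty; stopping so the failing state can be inspected.")
        return False
    sleep(delay_s)
    reload()  # on a real board this call does not return
    return True


# On a board (assumption: placed in code.py):
# import time, supervisor, wifi
# scan_then_reload(wifi.radio, supervisor.reload, time.sleep)
```

Stopping instead of reloading on an empty scan leaves the board in the failing state, which is what makes a script like this usable as a pass/fail check while bisecting.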
Since it works with web workflow enabled (device is auto-connected to an AP), I tried connecting before the scan w/o web workflow, and it gets results the first time then |
I will test again on s2 and c3 tonight. This time w/ and wo/ web workflow on both, and plot |
I am using a Magtag (with an esp32-s2) and the code that connects to my router with 7.3.3 will not connect with 8.0.0-beta. No network with this ssid. When I look at some recent commits, I see some tinkering with the resetting (or not resetting) of wifi. Maybe this is the cause. |
@bill88t Would it be helpful if I ran your WiFiTester script for the extra data? I tried running it on my Feather ESP32-S3 4MB Flash / 2MB PSRAM but I didn't know how to handle the .env part (forgive my ignorance on this). This bug is totally stalling my project on the Feather ESP32-S3. :/ I don't seem to have the option for this platform to downgrade to CircuitPython 7.3.3 either. I've ordered a Feather ESP32-S2 to see if I can reproduce this issue with my router using both 7.3.3 and 8.0.0 on the S2 platform. |
@saketvora It would be useful to get a reading from an ESP32-S3 to confirm its behaviour.
|
This issue most certainly affects all ESP boards, though. Someone should update the tags too. |
A temporary workaround would be to simply force the board to always .reset() upon code completion as this bug occurs only on soft reloads.
You can try to build it if you'd like. Not gonna claim it will work, but you can try. Building |
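In user code, the "always hard reset" workaround could be approximated without rebuilding the firmware. The sketch below is only an illustration, not the firmware change discussed above: `run_then_hard_reset` is a hypothetical wrapper, and the reset callable is passed in so the wrapper can be tried off-device. On a board it would be `microcontroller.reset`.

```python
def run_then_hard_reset(main, hard_reset):
    """Run main(), then hard-reset the board even if main() raised."""
    try:
        main()
    except Exception as err:
        # Report first: the reset below clears whatever was on the console.
        print("main() failed:", repr(err))
    finally:
        hard_reset()


# On a board (assumption: main() holds the wifi code):
# import microcontroller
# run_then_hard_reset(main, microcontroller.reset)
```

One caveat: because the reset sits in a `finally`, interrupting the program to reach the REPL will likely also trigger a hard reset, so this is best kept out of code that is still being debugged interactively.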
Thank you @bill88t Sorry, I just downloaded the latest WifiTester code from the GitHub link today but the results.json doesn't seem as well formatted for me. Here's the raw data:
|
@saketvora Wait.
From these results, it can see at least some wifi networks most of the time. So slightly reduced range? And since these results are vastly different from mine, could you please run it again with:
set at the top of the script? This will make it do many more runs. Note: To set the test to run again:
|
I have been playing with an ESP32 ProS3, and wifi has been almost completely unusable, BUT only when I have the board connected to my desktop via the USB-C cable. At first I thought it was because I was screened into the board's terminal, but that didn't matter in the end. Using the same code, without change, it connects to WiFi almost 100% of the time after reboots when the board isn't connected to the PC. I have an ssd1306 connected which displays the wifi status, which is how I confirm its connection (IP address); in addition, I keep a ping active on another computer to the board.
Configuration: Adafruit CircuitPython 8.0.0-beta.0 on 2022-08-18; ProS3 with ESP32S3
Memory Info - gc.mem_free(): 8153840 Bytes
Flash - os.statvfs('/') Size: 12224512 Bytes |
@charlien What channel is the ProS3 connecting to? Please try it again having it connect on a channel 9 or above. |
I figured it out. At first I did attempt channel 9, but that didn't work so I had it iterate through all channels until it could find an address but that failed as well. The problem comes back to the same reason why I have to remote my logitech 2.4GHz mouse receiver. USB interference (ref: https://www.usb.org/sites/default/files/327216.pdf). I should've known since the WiFi module only supports 2.4GHz. Thanks for checking in though @UnexpectedMaker, enjoying the small footprint board so far! This also solved the ability for it to scan other networks as well, for the same reason. |
Oh fantastic that you found the cause - bummer about what it is though! |
This explains a lot of weird WIFI stuff on my workbench where I have a USB3.0 hub that I plug my boards into... 🤦♂️ |
Testing on: Turning web workflow on makes the scanning (and connection) more reliable. The following code, executed before a wifi scan or wifi connect, also fixes the exceptions for me:
wifi.radio.enabled = False
wifi.radio.enabled = True
It seems that something in the non-web-workflow path in CP 8.x isn't fully init-ing or de-init-ing wifi. And it seems, based on Neradoc's bisect, that it's related to either the esp-idf 4.4 update itself or to one of the small handful of ancillary changes made at the same time.
Update: Not quite that simple... the list of networks returned in a scan is shorter after a reload than it is on a fresh reset (and they're all on channel 1 and have very strong signals). Connections still look fine if wifi is disabled and re-enabled, and no scan is done. But if the code disables and re-enables wifi, then connects, and then scans wifi... subsequent connection attempts won't work after reload. I.e., the presence of a scan affects post-reload behavior for scan and connection. |
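The disable/re-enable step from this comment can be wrapped in a small helper that connects straight afterwards and deliberately does not scan, since the update above suggests a scan changes post-reload behavior. This is a sketch of that one report, not a confirmed fix (the original issue text says toggling `wifi.radio.enabled` did not help there). The helper names are hypothetical, and the radio is passed in as a parameter; on a board it would be `wifi.radio`.

```python
def power_cycle_radio(radio):
    """Disable and re-enable the radio, as described in the comment above."""
    radio.enabled = False
    radio.enabled = True


def connect_after_power_cycle(radio, ssid, password):
    """Power-cycle the radio, then connect without scanning first."""
    power_cycle_radio(radio)
    radio.connect(ssid, password)


# On a board (assumption: credentials come from a secrets dict as earlier in the thread):
# import wifi
# connect_after_power_cycle(wifi.radio, secrets["ssid"], secrets["password"])
```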
From my experience I agree with your findings. The scan only detected strong signals, and only ones that were on the same channel. I had my APs set on autoscan, and the time they both showed up they were both on channel 6. All the other 10 or so weak SSIDs from my neighbors that normally populate were absent. I manually set one AP to channel 10 and the other to channel 11, rescanned, and only the AP on channel 10 showed up. It's all very random. I'm on the Adafruit Feather S2 2M PSRAM for testing this particular issue; I have 3 of them, btw, and all exhibit identical symptoms. Adafruit CircuitPython 8.0.0-beta.0-72-ga7b10d41b on 2022-09-26; Adafruit Feather ESP32S2 with ESP32S2 |
Same caveat as my comment on issue 6791: the |
Similar observations here with an ESP32-S3 (4MB Flash, 2MB PSRAM) on 8.0.0 beta 1. I switched my WiFi to channel 1 for the time being, and that has helped: it was on 11 and I couldn't even always get a connection after a hard reset (only a 70% success rate, give or take), and even if it did connect, it never reconnected after a soft reset. With the lower channel, it seems consistently able to get a connection after a hard reset.
With web workflow disabled, the initial connection is consistent, but WiFi scans get almost no results after a soft reset: only strong signals, and only those on lower channels (I mostly only see channel 1). If my WiFi is on channel 11, it's completely unable to connect after a soft reset, even with the AP being 6 feet away. On channel 1, it'll usually reconnect after a soft reset.
With web workflow enabled, it consistently connects and WiFi scans consistently return results even after a soft reset (including some APs on channels 6 and 11). |
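To make observations like "only strong signals, and only on low channels" easier to compare between a hard reset and a soft reload, the scan results can be tallied per channel. The helper below is a hypothetical sketch. It relies only on the `channel` and `rssi` attributes that the scan scripts in this thread already print, and takes any iterable of network objects.

```python
def summarize_by_channel(networks):
    """Return {channel: (count, weakest_rssi)} for an iterable of scan results."""
    summary = {}
    for network in networks:
        count, weakest = summary.get(network.channel, (0, None))
        if weakest is None or network.rssi < weakest:
            weakest = network.rssi
        summary[network.channel] = (count + 1, weakest)
    return summary


# On a board (assumption: run once after a hard reset and once after a reload):
# import wifi
# found = list(wifi.radio.start_scanning_networks())
# wifi.radio.stop_scanning_networks()
# for channel, (count, weakest) in sorted(summarize_by_channel(found).items()):
#     print("channel", channel, "networks", count, "weakest rssi", weakest)
```

If the pattern described here holds, the post-reload summary should show fewer channels and a noticeably higher "weakest rssi" than the post-reset one.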
|
(This'll cover most of the details I outline in the linked discord message that way the information is here in one place, and will have some added information that I hadn't finished putting together until now.) Implementing While I had the
|
I have now had time to test with a Metro ESP32-S2, using variants of @anecdata's test program. Web workflow is off.
import time
import microcontroller
import supervisor
import wifi

def scan_wifi():
    for network in wifi.radio.start_scanning_networks():
        print("{0:02X}:{1:02X}:{2:02X}:{3:02X}:{4:02X}:{5:02X}".format(*network.bssid), end=" ")
        print("{:>2d}".format(network.channel), end=" ")
        print(network.rssi, network.authmode, network.country, network.ssid)
    wifi.radio.stop_scanning_networks()

while True:  # change to an if to do a reload or reset
    time.sleep(5)
    #wifi.radio.enabled = False
    #wifi.radio.enabled = True
    scan_wifi()
time.sleep(5)
supervisor.reload()
#microcontroller.reset()
As others have seen, doing a full reset each time around shows all the available APs. Or just repeatedly scanning every few seconds does the same, without a reload or reset. Changing to I tried lengthening the sleeps, but it made no difference. I am not going to look at the ESP-IDF internal heap consumption until this is debugged, since I think fixing this may fix the storage leak. |
I have made this problem go away by using the stock |
CircuitPython version
Code/REPL
Behavior
On first boot:
After exiting to REPL and reloading:
Meanwhile if I do it in real REPL it works (somewhat, it only sees the network 10cm away from it),
even after many reloads:
To further test, I made code.py the Code/REPL provided code, and pressed Ctrl-D a few times:
Description
Even when the Pi400 network, which is 10cm from it, is seen, it can't connect to it.
Additional information
On first boot, and on CircuitPython 7.x, it works fine.
Does not go away if I toggle wifi.radio.enabled.