Espressif boards wifi scanning instability upon reload (soft reset) · Issue #6866 · adafruit/circuitpython · GitHub

Closed
bill88t opened this issue Sep 5, 2022 · 54 comments · Fixed by #7023

Comments

@bill88t
bill88t commented Sep 5, 2022

CircuitPython version

Adafruit CircuitPython 8.0.0-beta.0-30-g14498f793 on 2022-09-04; Waveshare ESP32-S2-Pico with ESP32S2

It's a build from adafruit-circuit-python.s3.amazonaws.com

Code/REPL

import wifi
a = wifi.radio.start_scanning_networks()
for i in a: print(str(i.ssid))
wifi.radio.stop_scanning_networks()
del wifi

Behavior

On first boot:

image

After exiting to REPL and reloading:

image

Meanwhile if I do it in real REPL it works (somewhat, it only sees the network 10cm away from it),
even after many reloads:

image

To further test, I made code.py the Code/REPL provided code, and pressed Ctrl-D a few times:

image

Description

Even if the Pi400 network which is 10cm from it, is seen, it can't connect to it.

Additional information

On first boot and circuitpy 7.x it works fine.
Does not go away if I toggle wifi.radio.enabled.

@bill88t bill88t added the bug label Sep 5, 2022
@anecdata
Member
anecdata commented Sep 5, 2022

Can you test with 8.0.0-beta.0 to rule out some quirk in that daily build?

With Adafruit CircuitPython 8.0.0-beta.0 on 2022-08-18; Adafruit QT Py ESP32S2 with ESP32S2, this code.py runs continuously with good, full AP results:

import time
import supervisor
import wifi

def scan_wifi():
    for network in wifi.radio.start_scanning_networks():
        print("{0:02X}:{1:02X}:{2:02X}:{3:02X}:{4:02X}:{5:02X}".format(*network.bssid), end=" ")
        print("{:>2d}".format(network.channel), end=" ")
        print(network.rssi, network.authmode, network.country, network.ssid)
    wifi.radio.stop_scanning_networks()

scan_wifi()
time.sleep(5)
supervisor.reload()

@bill88t
Author
bill88t commented Sep 5, 2022

Repeated the exact same code.py I did above, on beta-0:

image

This time it once more worked.. but only once..

Tried with your code snippet too, pretty much same results..
It only once registered:
E4:5F:01:05:51:4E 1 -29 [wifi.AuthMode.WPA, wifi.AuthMode.PSK] XY Pi400
(I do not mind sharing my pi's mac)

@bill88t
Author
bill88t commented Sep 5, 2022

I perhaps should mention:

  • There is no .env file for web workflow provided.
  • This board is in a diy airtight box, safe from any tampering.
  • The Pi400 wifi is 10cm away from it.
  • The power is supplied via the Pi400, it is more than adequate.

@anecdata
Member
anecdata commented Sep 5, 2022

Lack of .env, or of any valid Station or AP, should not be a problem. Scanning is independent of Station and AP. I wonder if the box material is attenuating wifi signals, but it's still hard to explain why it would only receive on the first try. Can you test it outside of the box, and also closer to some other APs? Do you have another (perhaps different) device you can try inside the box?

@bill88t
Author
bill88t commented Sep 5, 2022

Chief, da box plastic.
PXL_20220905_210210099 NIGHT

Tested it outside the box in the same setup, and it's the same.

Tested on fone, near more ap's:
Screenshot_20220906-001830

Also, those char's are new and annoying.
And im back to the daily build because of different board pins on older builds.
Can't keep swapping ljinux pintabs.

@dhalbert dhalbert added this to the 8.0.0 milestone Sep 5, 2022
@pepijndevos

I can confirm similar behavior on an ESP32-S3 sitting on my desk right next to the router.

I'm basically just doing

wifi.radio.connect(secrets["ssid"], secrets["password"])

and sometimes after resetting it just errors with

ConnectionError: No network with that ssid

and indeed scanning at this point returns no networks. A hard reset will bring it back to working normally.

@bill88t
Author
bill88t commented Sep 8, 2022

Cannot reproduce it on my beetle ESP32-C3..
Did 20 reloads and tried some resets, all fine, no signal reduction.

So, S2 and S3 have the issue but not C3.
I will update the title respectively.

@bill88t bill88t changed the title from "ESP32-S2 wifi scanning instability upon soft reset" to "ESP32-S2/S3 wifi scanning instability upon reload (soft reset)" Sep 8, 2022
@DJDevon3
DJDevon3 commented Sep 10, 2022

Cannot reproduce using adafruit feather s2 on 8.0.0 beta, ran anecdata's snippet for an hour np. Tested during adafruit's Community Help Desk on Saturday morning 9/10/22

@KeithTheEE

Running @anecdata's demo code on an Adafruit CircuitPython 8.0.0-beta.0 on 2022-08-18; Adafruit Metro ESP32S2 with ESP32S2 and I can replicate the bug.

On a hard reset, the first attempt to use the wifi works, it can scan nearby networks and connect to one with its ssid/password. It can then do standard networking, get and post data, etc.

Soft resets thereafter fail to find any wifi network on the scan.

To do an extra test I erased the flash, reloaded the binary and tried the demo code again and the issue persists.

Now I've noticed that while most scans fail to find any network, every once in a while it will manage to see one or two networks on a scan.

After moving the device around, and just watching the few networks it does see, it never notices any network with an rssi below -59 after the soft reload, but on a hard reset it noticed networks all the way down to -88. (These are probably soft signal strength boundaries). There's also enough nearby networks that router distance shouldn't be an issue. But it does apparently contribute. Still, I can set the metro next to the router and frequently will fail to connect so the issue isn't exclusively that.

In Adafruit CircuitPython 7.3.3 on 2022-08-29; Adafruit Metro ESP32S2 with ESP32S2 this metro runs the demo without issue, noticing the networks consistently on each soft reboot.

@bill88t
Author
bill88t commented Sep 10, 2022

Tried again with 8.0.0-beta.0-41-g33a100611 on ESP32-S2:
With some slightly different results (1st reload and 2nd reload are the same):
image
This is with 2 soft reloads, same hardware setup, on ljinux since the results don't differ from the scripts.
It can see the networks that are not too far or too close..

@bill88t
Author
bill88t commented Sep 10, 2022

Actually, on first boot, those networks are considered to have, bad reception (< -80)..
image
Something very weird is going on..

@DJDevon3
DJDevon3 commented Sep 10, 2022

That brings up a question about the RSSI data actually being flawed. In my test I have 2 AP's sitting right next to each other which usually provide near identical signal according to my wifi manager app on my tablet and phone... but scanning with that code within esp32-s2 showed an unusual amount of deviation between my AP's to the point it gives me reason to question the accuracy.

@bill88t
Author
bill88t commented Sep 10, 2022

The networks however should still be visible and connectable, regardless of RSSI..

@DJDevon3

Oh right, that's an inconsequential issue if it doesn't even work for you, my apologies.

@KeithTheEE

I grabbed the nightly from Adafruit CircuitPython 8.0.0-beta.0-41-g33a100611 on 2022-09-10; Adafruit Metro ESP32S2 with ESP32S2 this morning to check if the bug had been fixed and it hadn't.

I left the code running while I went about my day and came back to

soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 7, in <module>
MemoryError: Failed to allocate Wifi memory

hitting CTRL+D to reload generates the same error.

Entering the REPL I ran the following:

Adafruit CircuitPython 8.0.0-beta.0-41-g33a100611 on 2022-09-10; Adafruit Metro ESP32S2 with ESP32S2
>>> import gc
>>> gc.mem_free()
2047984

And then exiting with CTRL+D to reload code.py the program runs into the Wifi Memory Error again.

@DJDevon3
DJDevon3 commented Sep 11, 2022

Have you tried manually adding gc.collect and mem_free in different spots? Could help narrow down where the leak is coming from.

@KeithTheEE

No, I believe gc.collect impacts the memory allotted to the program space that code.py uses, whereas given the error,

MemoryError: Failed to allocate Wifi memory

it appears to be during the core's startup and initialization and how it allocates memory for the code managing the wifi components so gc.collect shouldn't have access to that block. But just in case I tried it as well (because it was a great suggestion and even if it didn't solve it, it does help confirm that it's associated with the core's allocation) and put that call at the end just before the sleep+soft reboot calls and the program still ran out of memory after a period of soft rebooting.

So the useful info I have is that a Metro ESP32-S2 seems to experience the original bug @bill88t had reported, that another likely attribute of the bug is a memory error after enough soft reboots take place, and that gc.collect() does not impact the Failed to allocate Wifi memory aspect.

@anecdata
Member

@KeithTheEE yes, the memory error is in wifi init, and arises from an esp-idf call. Using the espidf module, could track usage of espidf memory https://docs.circuitpython.org/en/latest/shared-bindings/espidf/index.html. Could also allocate some PSRAM to esp-idf memory using CIRCUITPY_RESERVED_PSRAM= in the .env file, but I really don't think that should be necessary if things are working right... just for a scan.

@KeithTheEE

Running the same sample code, but adding the line, print("ESPIDF Heap caps free: ", espidf.heap_caps_get_free_size()) once per program run, I end up with the output:

ESPIDF Heap caps free:  58808

Code done running.
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free:  56688

Code done running.
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free:  58636

On the first couple of loops.

After running for a long while it fails with the memory error. Right before it fails due to memory error it has far less memory:

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free:  4420

Code done running.
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
ESPIDF Heap caps free:  4384

Code done running.
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 3, in <module>
MemoryError: Failed to allocate Wifi memory

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

The two edge cases--start and end of the program--show changes in memory that are not representative of the typical memory change.

Most loops show 32 (bytes? free heap caps?) less than the previous call.

It does vary, sometimes. Usually on the scale of one loop will drop 28, and then a loop or two later it'll drop 36.

Occasionally (and I noticed this took place around the same times it saw wifi networks) it would jump up by 1000 or so. Sometimes it'd drop back after two or three loops, sometimes it'd slowly drop back down at roughly 32 unit intervals.

No import wifi

Completely removing the scan_wifi() call, and removing the import wifi statement, the code loops on a soft reboot and holds steady at

ESPIDF Heap caps free: 100076 independent of the number of soft reboots prior.

importing wifi but not calling scan_wifi()

This drops the initial output of espidf.heap_caps_get_free_size() from the above 100076 to 54844, and then every loop thereafter drops that 54844 by around 32.

(roughly estimating) Setting the sleep to 1 second from the 5 seconds in the example code, it should take around half an hour to hit a memory error when import wifi is present in the code.
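The half-hour estimate can be sanity-checked with a quick calculation. This is only a sketch using numbers reported in this thread: about 54844 bytes free right after import wifi, about 32 bytes lost per soft reload, and 4384 bytes free on the last run before the MemoryError. The function name, and the assumption that each loop takes about as long as its one-second sleep, are illustrative and not measured.

```python
def reloads_until_failure(start_free, fail_below, leak_per_reload):
    """Number of soft reloads before free heap reaches the failure level."""
    return (start_free - fail_below) // leak_per_reload


# Figures taken from the output posted above.
loops = reloads_until_failure(start_free=54844, fail_below=4384, leak_per_reload=32)

# Assumed: time.sleep(1) per loop, ignoring scan and reload overhead.
seconds_per_loop = 1
minutes = round(loops * seconds_per_loop / 60)
print(loops, "reloads, about", minutes, "minutes")
```

That works out to 1576 reloads, or about 26 minutes at one second per loop. Any scan or reload overhead on top of the sleep pushes this toward the half hour estimated above.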

@anecdata
Member

It appears (to me) that: (1) there is a small memory leak in the CP core or (more likely) in the esp-idf; (2) supervisor.reload() does not clear esp-idf memory usage since the exception happens in wifi init (from importing). So microcontroller.reset(), hard reset (button), or power cycle is necessary to get back to a clean state. I don't recall this being an issue in older versions of esp-idf, perhaps it was introduced with an update.

@anecdata
Member
anecdata commented Sep 15, 2022

Curiously, on my Adafruit CircuitPython 8.0.0-beta.0 on 2022-08-18; Adafruit QT Py ESP32S2 with ESP32S2, where repeated scans and reloads show a full complement of APs, the esp-idf free memory stays relatively constant at around 50-51kB. Using:

print(f"{gc.mem_free()} {espidf.heap_caps_get_total_size()} {espidf.heap_caps_get_free_size()} {espidf.heap_caps_get_largest_free_block()}")

Which brings up additional complicating factors: • differences in network environment; • possibly differences in behavior between different ESP32-S2 chips / modules / antennas.

Addendum: realized I do have web workflow enabled on this board. If web workflow is not enabled (no .env file), the issue is replicated (loss of ~20-36 bytes per loop, and no APs found).

@KeithTheEE

I did not have web workflow enabled either, so that might be a key bit to narrow down the issue

@Neradoc
Neradoc commented Sep 16, 2022

Doing a bisect I believe I managed to track down the origin of the problem with #6531
Which is an IDF update, quite appropriately.

Tested without wifi-workflow enabled, using a script that scans and reloads, and stops if it doesn't find any ssid.
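The script used for the bisect was not posted, so the following is only a guess at what such a scan-and-reload test could look like. The helper names scan_ssids and scan_once are hypothetical. The wifi.radio and supervisor.reload() calls are the same ones used in the snippets earlier in this thread. The import guard is there only so the helpers can be tried on desktop Python with a stand-in radio.

```python
import time

try:
    import wifi  # CircuitPython only
    import supervisor
except ImportError:  # desktop Python: helpers can still be tested with fakes
    wifi = supervisor = None


def scan_ssids(radio):
    """Return the SSIDs seen in one scan, always stopping the scan afterwards."""
    try:
        return [network.ssid for network in radio.start_scanning_networks()]
    finally:
        radio.stop_scanning_networks()


def scan_once(radio, reload, delay=5):
    """Scan; reload if anything was found, otherwise stop so the failure stays visible."""
    ssids = scan_ssids(radio)
    print(len(ssids), "networks:", ssids)
    if not ssids:
        print("No networks found; stopping instead of reloading.")
        return False
    time.sleep(delay)
    reload()  # on a real board this does not return
    return True


if wifi is not None:
    scan_once(wifi.radio, supervisor.reload)
```

Saved as code.py, a script along these lines would keep reloading on a good build and halt at the first empty scan on an affected one, which gives a quick pass/fail signal for a bisect.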

@anecdata
Member

Since it works with web workflow enabled (device is auto-connected to an AP), I tried connecting before the scan w/o web workflow, and it gets results the first time then ConnectionError: No network with that ssid on subsequent reloads. So it's something more involved than the connect() that lets scanning work with web workflow enabled.

@bill88t
Author
bill88t commented Sep 16, 2022

I will test again on s2 and c3 tonight. This time w/ and wo/ web workflow on both, and plot espidf.heap_caps_get_free_size() over reloads

@bablokb
bablokb commented Sep 20, 2022

I am using a Magtag (with an esp32-s2) and the code that connects to my router with 7.3.3 will not connect with 8.0.0-beta. No network with this ssid.

When I look at some recent commits, I see some tinkering with the resetting (or not resetting) of wifi. Maybe this is the cause.

@saketvora

@bill88t Would it be helpful if I ran your WiFiTester script for the extra data? I tried running it on my Feather ESP32-S3 4MB Flash / 2MB PSRAM but I didn't know how to handle the .env part (forgive my ignorance on this).
Happy to try and help, just need a pointer or two.

This bug is totally stalling my project on the Feather ESP32-S3. :/ I don't seem to have the option for this platform to downgrade to CircuitPython 7.3.3 either. I've ordered a Feather ESP32-S2 to see if I can reproduce this issue with my router using both 7.3.3 and 8.0.0 on the S2 platform.

@bill88t
Author
bill88t commented Sep 21, 2022

Would it be helpful if I ran your WiFiTester script for the extra data? I tried running it on my Feather ESP32-S3 4MB

@saketvora It would be useful to get a reading from ESP32-S3 as to confirm its behaviour.
For the .env please take a look here
If the board has no CIRCUITPY usb access, you can from within the repl do something like:

>>> from storage import remount
>>> remount("/", False)
>>> a = open("/.env", "w")
>>> a.write("CIRCUITPY_WIFI_SSID='amogus'\n")
>>> a.write("CIRCUITPY_WIFI_PASSWORD='sussy'\n")
>>> a.write("CIRCUITPY_WEB_API_PASSWORD='baka'\n")
>>> a.write("CIRCUITPY_WEB_API_PORT=80")
>>> a.close()
>>> remount("/", True)
>>> import microcontroller
>>> microcontroller.reset()

@bill88t bill88t changed the title from "ESP32-S2/S3 wifi scanning instability upon reload (soft reset)" to "Espressif boards wifi scanning instability upon reload (soft reset)" Sep 21, 2022
@bill88t
Author
bill88t commented Sep 21, 2022

This issue most certainly though affects all esp boards. Someone should update the tags too.

@bill88t
Author
bill88t commented Sep 21, 2022

@saketvora

This bug is totally stalling my project on the Feather ESP32-S3.

A temporary workaround would be to simply force the board to always .reset() upon code completion as this bug occurs only on soft reloads.
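A minimal sketch of that workaround, assuming your program is wrapped in a function. The wrapper name run_then_hard_reset and its reset parameter are made up for illustration; only microcontroller.reset() is the real CircuitPython call. The import guard just lets the wrapper be tried off-board.

```python
try:
    import microcontroller  # CircuitPython only
except ImportError:
    microcontroller = None  # desktop Python: pass reset= explicitly


def run_then_hard_reset(main, reset=None):
    """Run main(), then hard reset instead of ending in a soft reload."""
    if reset is None:
        reset = microcontroller.reset
    try:
        main()
    finally:
        # Runs on normal completion and on an exception, so the next
        # start always begins from a fresh wifi state.
        reset()
```

On the board, calling run_then_hard_reset(my_main) at the bottom of code.py would be the whole change. Note that resetting from the finally clause also restarts after a crash, so print or log the exception inside my_main if you need to see it.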

I don't seem to have the option for this platform to downgrade to CircuitPython 7.3.3 either.

You can try to build it if you'd like. Not gonna claim it will work, but you can try. Building
(make sure you clone the 7.3.x branch)

@saketvora

Thank you @bill88t
Board: Adafruit ESP32-S3 4MB Flash 2MB PSRAM
Adafruit CircuitPython 8.0.0-beta.0-57-g32d8dd425 on 2022-09-16
WiFi network: 2.4GHz only, WPA2, Asus ZenWifi XT8 (mesh).
Location: 40ft from router through walls (i.e., not close by or with line of sight)

Sorry, I just downloaded the latest WifiTester code from the GitHub link today but the results.json doesn't seem as well formatted for me. Here's the raw data:

{"version": "8.0.0-beta.0-62-gb6f67be3e", "espmem_after_env": [216700, 215508, 215508, 215508, 215508, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612], "espmem_before": [224164, 223832, 223796, 223776, 223736, 223688, 223668, 223632, 223588, 223576, 223524, 223488, 223452, 223444, 223416, 223380, 223352, 223308, 223280, 223260, 223196, 223160, 223148, 223112, 223092, 223060, 223020, 222976, 222932, 222916, 222884, 222852, 222824, 222768, 222744, 222716, 222652, 222616, 222624, 222596, 222568, 222544, 222480, 222468, 222432, 222408, 222372, 222336, 222300, 222272], "PartialFail_env": 1, "CompleteFail": 0, "want_env": false, "CompleteFail_env": 0, "iterations": 101, "board": "Adafruit Feather ESP32S3 4MB Flash 2MB PSRAM", "og_networks": 15, "cpu": "ESP32S3", "espmem_total": 324940, "Pass": 39, "espmem_after": [224116, 223796, 223760, 223740, 223692, 223652, 223632, 223596, 223552, 223544, 223492, 223452, 223416, 223408, 223380, 223344, 223316, 223280, 223244, 223216, 223160, 223128, 223112, 223076, 223056, 223024, 222988, 222932, 222896, 222884, 222848, 222808, 222792, 222736, 222708, 222680, 222616, 222580, 222588, 222560, 222536, 222508, 222444, 222436, 222400, 222372, 222336, 222292, 222272, 222244], "Pass_env": 48, "espmem_before_env": [216280, 216512, 215508, 215504, 215508, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216796, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612, 216612], "PartialFail": 11}
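For anyone wanting to read the leak rate out of this raw data, here is a rough sketch. It assumes the espmem_before values are free-heap readings taken once per reload, in order; the helper name per_reload_drops is hypothetical. Only the first ten readings are copied in to keep it short.

```python
from statistics import median


def per_reload_drops(samples):
    """Differences between consecutive readings (positive = memory lost)."""
    return [earlier - later for earlier, later in zip(samples, samples[1:])]


# First ten "espmem_before" readings copied from the results above.
espmem_before = [224164, 223832, 223796, 223776, 223736,
                 223688, 223668, 223632, 223588, 223576]

drops = per_reload_drops(espmem_before)
print("drops:", drops)
print("median drop per reload:", median(drops), "bytes")
```

The first drop (332 bytes) is an outlier. The median of this excerpt is 36 bytes per reload, in the same range as the roughly 32 bytes per loop reported earlier for the Metro ESP32-S2.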


@bill88t
Author
bill88t commented Sep 22, 2022

@saketvora
The data is exported as a simple json by the wifi tester, I manually formatted them to be more readable.

Wait. "CompleteFail": 0 and "PartialFail": 11???
So ESP32-S3 behaves differently..
The fact that CompleteFail is 0 means that at none of the 50 .env-less runs did it go blind.
So the bug's different there?

This bug is totally stalling my project on the Feather ESP32-S3.

From these results it means it can see at least some wifi's most of the times.
(PartialFails's on my runs were it seeing only wifi's VERY near it)
And in most cases it scanned just fine.

So slightly reduced range?
Can you not connect to them or something?

And since these results are vastly different from mine, could you please run it again with:

# configure how many runs
env_runs = 200
non_env_runs = 200
# each must be >0

set at the top of the script?

This will make it do many more runs.
(200 with env, 200 without env)

Note: To set the test to run again:

  1. Open the REPL and stay at the >>> prompt
    (this is so the board does not autoreload during the file swaps)
    (If you do not, it may bootloop trying to remount rw, you can recover by entering safe mode)
  2. rename boot_disabled.py back to boot.py
  3. remove the current results file
  4. eject the board
  5. Ctrl+D

@charlien
charlien commented Sep 24, 2022

I have been playing with an esp32 proS3 and wifi has been almost completely unusable BUT only when I have the board connected to my desktop via the usb-c cable. At first I thought it was because I was screened into the board's terminal but it didn't matter in the end.

Using the same code, without change, it connects to WiFi almost 100% of the time after reboots when the board isn't connected to the PC. I have an ssd1306 connected which displays the wifi status, which is how I confirm its connection (ip address); in addition I keep a ping active on another computer to the board.

Configuration:

Adafruit CircuitPython 8.0.0-beta.0 on 2022-08-18; ProS3 with ESP32S3
Board ID:unexpectedmaker_pros3
UID:4F21AF24404E

Memory Info - gc.mem_free()

8153840 Bytes

Flash - os.statvfs('/')

Size: 12224512 Bytes
Free: 9877504 Bytes

@UnexpectedMaker
UnexpectedMaker commented Sep 30, 2022

@charlien What channel is the ProS3 connecting to? Please try it again having it connect on a channel 9 or above.

@charlien
charlien commented Sep 30, 2022

@charlien What channel is the ProS3 connecting to? Please try it again having it connect on a channel 9 or above.

I figured it out. At first I did attempt channel 9, but that didn't work so I had it iterate through all channels until it could find an address but that failed as well.

The problem comes back to the same reason why I have to remote my logitech 2.4GHz mouse receiver. USB interference (ref: https://www.usb.org/sites/default/files/327216.pdf). I should've known since the WiFi module only supports 2.4GHz. Thanks for checking in though @UnexpectedMaker, enjoying the small footprint board so far!

This also solved the ability for it to scan other networks as well, for the same reason.

@UnexpectedMaker

@charlien What channel is the ProS3 connecting to? Please try it again having it connect on a channel 9 or above.

I figured it out. At first I did attempt channel 9, but that didn't work so I had it iterate through all channels until it could find an address but that failed as well.

The problem comes back to the same reason why I have to remote my logitech 2.4GHz mouse receiver. USB interference (ref: https://www.usb.org/sites/default/files/327216.pdf). I should've known since the WiFi module only supports 2.4GHz. Thanks for checking in though @UnexpectedMaker, enjoying the small footprint board so far!

This also solved the ability for it to scan other networks as well, for the same reason.

Oh fantastic that you found the cause - bummer about what it is though!

@askpatrickw

This explains a lot of weird WIFI stuff on my workbench where I have a USB3.0 hub that I plug my boards into... 🤦‍♂️

@anecdata
Member
anecdata commented Oct 4, 2022

Testing on:
Adafruit CircuitPython 8.0.0-beta.1 on 2022-10-01; Adafruit QT Py ESP32S2 with ESP32S2

Turning web workflow on makes the scanning (and connection) more reliable. The following code, executed before a wifi scan or wifi connect, also fixes the exceptions for me:

wifi.radio.enabled = False
wifi.radio.enabled = True

It seems that something in the non-web-workflow path in CP 8.x isn't fully init-ing or de-init-ing wifi. And it seems, based on Neradoc's bisect, that it's related to either the esp-idf 4.4 update itself or to one of the small handful of ancillary changes made at the same time.

Update: Not quite that simple... the list of networks returned in a scan is shorter after a reload than it is on a fresh reset (and they're all on channel 1 and have very strong signals). Connections still look fine if wifi is disabled and re-enabled, and no scan is done. But, if the code disables and re-enables wifi, then connects, and then scans wifi... subsequent connection attempts won't work after reload. i.e., the presence of a scan affects post-reload behavior for scan and connection.
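To compare a fresh-reset scan with a post-reload scan more systematically, something like the helper below could tally networks per channel and record the weakest RSSI seen. This is a sketch only: summarize_scan is a hypothetical name, the Net tuple is a stand-in for the objects yielded by wifi.radio.start_scanning_networks() (only their channel and rssi attributes are used), and the sample values are invented for illustration.

```python
from collections import namedtuple


def summarize_scan(networks):
    """Return (networks per channel, weakest RSSI seen) for one scan."""
    per_channel = {}
    weakest = None
    for n in networks:
        per_channel[n.channel] = per_channel.get(n.channel, 0) + 1
        if weakest is None or n.rssi < weakest:
            weakest = n.rssi
    return per_channel, weakest


# Stand-in for scan results, with made-up values for illustration only.
Net = namedtuple("Net", "ssid channel rssi")
after_reset = [Net("A", 1, -29), Net("B", 6, -71), Net("C", 11, -88)]
after_reload = [Net("A", 1, -29)]

print("after reset: ", summarize_scan(after_reset))
print("after reload:", summarize_scan(after_reload))
```

On a board, passing wifi.radio.start_scanning_networks() to summarize_scan (followed by stop_scanning_networks()) on each run would make it easier to see whether reload scans are limited to channel 1 and strong signals.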

@DJDevon3
DJDevon3 commented Oct 5, 2022

From my experience I agree with your findings. Scan only detected strong signals and only ones that were on the same channel. I had my AP's set on autoscan and the time they both showed up they were both on channel 6. All the other 10 or so weak SSID's that normally populate from my neighbors were absent. I manually set one AP to channel 10 and the other to channel 11. Rescanned and only the AP on channel 10 showed up. It's all very random.

I'm on the Adafruit Feather S2 2M PSRAM for testing in this particular issue, I have 3 of them btw and all exhibit identical symptoms. Adafruit CircuitPython 8.0.0-beta.0-72-ga7b10d41b on 2022-09-26; Adafruit Feather ESP32S2 with ESP32S2

@anecdata
Member
anecdata commented Oct 5, 2022

Same caveat as my comment on issue 6791: the wifi.radio.enabled sequence isn't repeatable. I tested it somewhat extensively yesterday afternoon, but wasn't able to replicate it later. @KeithTheEE reported a similar inconsistency. Some odd fluke that made it work for a while, but not again, so quite puzzling.

@ghkites
ghkites commented Oct 5, 2022

Similar observations here with an ESP32-S3 (4MB Flash, 2MB PSRAM) on 8.0.0 beta 1. I switched my WiFi to channel 1 for the time being, and that has helped -- it was on 11 and I couldn't even always get a connection after a hard-reset (only 70% success rate, give or take), and even if it did connect, it never reconnected after a soft-reset. With the lower channel, it seems consistently able to get a connection after a hard-reset.

With web workflow disabled, initial connection is consistent, but WiFi scans get almost no results after a soft-reset - only strong signals, and only those on lower channels (I mostly only see channel 1). If my WiFi is on channel 11, it's completely unable to connect after a soft-reset, even with the AP being 6 feet away. On channel 1, it'll usually reconnect after a soft-reset.

With web workflow enabled, it consistently connects and WiFi scans consistently return results even after a soft-reset (including some APs on channels 6 and 11).

@bill88t
Author
bill88t commented Oct 5, 2022

wifi.radio.enabled is not a viable workaround even with sleep involved.
Feels like a good luck charm at best.

@KeithTheEE

(This'll cover most of the details I outline in the linked discord message that way the information is here in one place, and will have some added information that I hadn't finished putting together until now.)

Implementing wifi.radio.enabled toggling worked for me consistently through multiple soft and hard resets, and multiple changes to the code, until I changed the version of circuit python on the board. After I did that I could not get back to making the toggle work, even after returning to the original version of circuit python I had been using when it worked.

While I had the wifi.radio.enabled toggle working,

Using an Adafruit Metro ESP32-S2 I was on the nightly from Adafruit CircuitPython 8.0.0-beta.0-41-g33a100611 on 2022-09-10, tested the wifi radio enable toggle and it worked. Web workflow is not enabled on my device.

I reliably got the same number of wifi access points as I did from a hard reset. There were some ap's that varied but they were the ones with a weak signal and they were inconsistently in the hard reset scan so it just seemed like it was consistently difficult to see signal. I tried to see if it saw different networks after a hard or soft reset, and just started unplugging and plugging the board in over and over, and looking at and comparing the networks it saw with what it saw after letting it run for a while with soft resets. It reliably saw the same networks regardless of hard or soft resets for me. There were a few networks it wouldn't reliably see, but this was true both on hard and soft resets. They were weak access points so this isn't much of a surprise. The hard resets would see those networks with the same reliability/variability.

The wifi channels I'd see on soft resets after successfully toggling were not only channel 1. I reliably got channels 4, 5, and 11 (they are near enough and had a rssi that was usually > -70 but did have some variability). They made up around half of the access points the ESP32S2 saw.

While the toggle did make the ESP32S2 see the correct number of access points, after running the script through enough soft resets the board still crashed due to the same memory error as before: the espidf use of memory and allocation fails. It still lost around 32 bytes of space after each soft reload. So while the toggle worked it didn't address the memory leak.

Breaking the wifi.radio.enabled toggle and trying to get it back

But then I wanted to make sure the 8 beta1 behaved the same, so I upgraded to that, and couldn't get the scan working off of soft resets even with wifi.radio.enabled toggled between false and true. So then I erased flash and tried the beta, and toggling the wifi and it didn't work. Then I erased flash and went back to the sept10 nightly and the toggle was not working any more.
So I can't duplicate what I was doing before with the same code on the same version of circuit python.

Then on the off chance that there was some 7.3 variable that made the toggle work, (but really I was grasping at straws) I erased flash, installed that, then without erasing flash I tried the sept10 nightly. That didn't work. Trying the same approach of erasing flash, installing 7.3, I installed the 8 beta 1 without erasing flash and tried that. It didn't work either.

So while the toggle bodge worked briefly, I can't make it work again and I don't know how to coax the platform into a state where it works again. Even while it was working the memory leak still persisted.

@dhalbert
Collaborator
dhalbert commented Oct 6, 2022

I have now had time to test with a Metro ESP32-S2, using variants of @anecdata's test program. web workflow is off.

import time
import microcontroller
import supervisor
import wifi

def scan_wifi():
    for network in wifi.radio.start_scanning_networks():
        print("{0:02X}:{1:02X}:{2:02X}:{3:02X}:{4:02X}:{5:02X}".format(*network.bssid), end=" ")
        print("{:>2d}".format(network.channel), end=" ")
        print(network.rssi, network.authmode, network.country, network.ssid)
    wifi.radio.stop_scanning_networks()

while True:  # change to an if to do a reload or reset
    time.sleep(5)
    #wifi.radio.enabled = False
    #wifi.radio.enabled = True
    scan_wifi()
    time.sleep(5)
supervisor.reload()
#microcontroller.reset()

As others have seen, doing a full reset each time around shows all the available AP's. Or just repeatedly scanning every few seconds does the same, without a reload or reset. Changing to supervisor.reload() causes it to show the full complement the first time, and then none or occasionally one on subsequent passes, so there's some odd state thing going on. The .enabled toggle makes no difference.

I tried lengthening the sleeps, but it made no difference.

I am not going to look at the ESP-IDF internal heap consumption until this is debugged, since I think fixing this may fix the storage leak.

@dhalbert dhalbert self-assigned this Oct 6, 2022
@DJDevon3

This comment was marked as resolved.

@dhalbert

This comment was marked as resolved.

@DJDevon3

This comment was marked as resolved.

@dhalbert
Collaborator
dhalbert commented Oct 7, 2022

I have made this problem go away by using the stock espressif/esp-idf, v4.4.1 or v4.4.2. No problems even with the tip of circuitpython main: full scans on every reload. Next step is to figure out what is different between the stock version and the rather simple adafruit fork of esp-idf. Thanks @Neradoc for the bisect mentioned earlier, which was a big hint.
