8000 Modified bootWiFi2 method to call m_wifi.connectAP forever or until it successfully connects by Brisdo · Pull Request #475 · nkolban/esp32-snippets · GitHub
[go: up one dir, main page]

Skip to content

Modified bootWiFi2 method to call m_wifi.connectAP forever or until it successfully connects #475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 24, 2018

Conversation

Brisdo
Copy link
@Brisdo Brisdo commented Apr 13, 2018

Prior to this change, bootWiFi2 would sometimes return before the ESP32 had successfully connected to WiFi - specifically when a staDisconnected event was received.

This causes BootWiFi::boot() to hang, waiting indefinitely on m_completeSemaphore.wait("boot");. This happens because the only place the semaphore is resolved is in the staGotIp event handler. Unfortunately, the staGotIpevent never occurs because bootWiFi2 has returned and so there are no retry attempts.

This change ensures that ESP32 has successfully connected before BootWiFi returns (or keeps trying forever).

Copy link
Owner
@nkolban nkolban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user changes their password on their access point? This would lock up the application forever as it attempts to connect forever. Wouldn't we want to force a failure that the programmer can then handle in a distinct manner. If we just loop, we will get an issue that says "When I call Bootwifi, it just hangs".

@Brisdo
Copy link
Author
Brisdo commented Apr 13, 2018

Apologies. I didn't do an optimal job of explaining the reason in my original pull request. I've updated it to better explain the situation. BTW, the original pull request read: "Prior to this change, BootWiFi would sometime return before the ESP32 had successfully connected (specifically when a staDisconnected event was received)."

Let me talk through my thinking. What got me on to this is that I had a small project running for a couple weeks. I found I needed to reboot it relatively frequently and sometimes had to reboot it multiple times before it finally worked again. It got me thinking that from an end-users viewpoint, they would see this behavior as being quite unreliable. After some light debugging, I've identified 2 causes (so far):

  1. This problem (where BootWiFi::boot() hangs).
  2. Libcurl sometimes gets into a "mode" where it returns "Couldn't resolve host." every time a call is made. I haven't yet tried to debug this yet other than a) verifying the host is resolvable and working. b) rebooting the ESP32 which eventually resolves the problem.

In my mind, trying to connect once, getting a staDisconnected event and then hanging doing nothing isn't really useful. On the other hand, trying to connect over and over is good - most of the time it will eventually connect and get an IP address. So, in a fashion, this pull request does trade one hang for another. But, I thinks it's a better option.

Can we do better still - to your points about:

What if the user changes their password on their access point? ... Wouldn't we want to force a failure that the programmer ...

A few items:

  1. I don't know how to catch the incorrect password error/event. If there is a way then I think it would be a worthwhile addition.
  2. If there is no way to specifically deal with the password error/event, then I don't think forcing a failure (or hanging) when a staDisconnected event is received is the right approach. It may be my lack of creativity, but in this case, since the ESP32 has no network connection and no screen I don't see what the programmer could do to contact the user and let them know there's a problem that can't be done in other ways. E.g. Maybe you'd flash an LED red to let the user know there's a problem? Well, you could also set that led to flashing red when booting and green once BootWiFi::boot() has finished with almost the same result.

@HanVertegaal
Copy link
Contributor
HanVertegaal commented Apr 13, 2018

@Brisdo, have you included the fix for issue #421 in your code? I find that it reliably connects if the configured AP can be found.

My own (modified) BootWiFi switches to AP mode when the ESP32 fails connect to the configured AP. Once an IP address is obtained from the AP, it will keep trying to reconnect indefinitely if the connection is lost for some reason.

I can imagine one could have the software perform a limited number of retries while connecting to the configured AP, and only then switch to AP mode.

@Brisdo
Copy link
Author
Brisdo commented Apr 13, 2018

@HanVertegaal - yes, I pulled the latest yesterday which includes issue #421.

The problem is in WiFi.cpp in two places:

  1. In WiFi::eventHandler at line 257 if either a SYSTEM_EVENT_STA_GOT_IP or SYSTEM_EVENT_STA_DISCONNECTED event is received
    1.1 Then the m_connectFinished.give semaphore gives at line 264: pWiFi->m_connectFinished.give();
  2. Which causes the do-while to complete in WiFi::connectAP at line 205 while (!m_connectFinished.take(5000, "connectAP"));
    2.1 And WiFi::connectAP returns without connecting or retrying...

Although WiFi::connectAP does return false, BootWiFi::boot() does not check the return value or try again (prior to this pull request) and BootWiFi::boot() hangs without connecting or returning.

So, bottom line is that if WiFi::connectAP receives a DISCONNECTED event it returns without connecting causing BootWiFi::boot() to hang.

255     // If the event we received indicates that we now have an IP address or that a connection was disconnected then unlock the mutex that
256     // indicates we are waiting for a connection complete.
257     if (event->event_id == SYSTEM_EVENT_STA_GOT_IP || event->event_id == SYSTEM_EVENT_STA_DISCONNECTED) {
258       if (event->event_id == SYSTEM_EVENT_STA_GOT_IP) {  // If we connected and have an IP, change the flag.
259         pWiFi->m_apConnected = true;
260         // pWiFi->m_connectFinished.give();
261       } else {
262         pWiFi->m_apConnected = false;
263       }
264       pWiFi->m_connectFinished.give();
265     }
196     m_connectFinished.take("connectAP");   // Take the semaphore to wait for a connection.
197     do {
198         ESP_LOGD(LOG_TAG, "esp_wifi_connect");
199         errRc = ::esp_wifi_connect();
200         if (errRc != ESP_OK) {
201             ESP_LOGE(LOG_TAG, "esp_wifi_connect: rc=%d %s", errRc, GeneralUtils::errorToString(errRc));
202             abort();
203         }
204     }
205     while (!m_connectFinished.take(5000, "connectAP")); // retry if not connected within 5s
206     m_connectFinished.give();
207     
208     	ESP_LOGD(LOG_TAG, "<< connectAP");
209     	return m_apConnected;  // Return true if we are now connected and false if not.
210     } // connectAP

@Brisdo
Copy link
Author
Brisdo commented Apr 13, 2018

Having seen the while (!m_connectFinished.take(5000, "connectAP")); yesterday, another option I thought about was to submit a pull request to remove the disconnect event check at line 257:

|| event->event_id == SYSTEM_EVENT_STA_DISCONNECTED

I decided not to go that route because:

  1. I suspected the impact of changing the way WiFi.cpp works would be broader
  2. I wasn't sure why it was there

@Brisdo
Copy link
Author
Brisdo commented Apr 14, 2018

The Wifi docs, say there's a number of things that can cause a SYSTEM_EVENT_STA_DISCONNECTED event so it alone does not tell you that the password is bad. But, 4. Wi-Fi Connect Phase section says "In a case like this, <SYSTEM_EVENT_STA_DISCONNECTED> will arise and the reason for such a failure will be provided."

@Brisdo
Copy link
Author
Brisdo commented Apr 14, 2018

In WiFi::eventHandler tried printing out the event->event_info.disconnected.reason. With the correct password, when reproducing the problem and the SYSTEM_EVENT_STA_DISCONNECTED event occurred and the reason code was: 2 = WIFI_REASON_AUTH_EXPIRE.

Changed the password to garbage - which also caused SYSTEM_EVENT_STA_DISCONNECTED events, but this time with reason code: 201 WIFI_REASON_NO_AP_FOUND.

So, another option would be for WiFi::connectAP to return a reason code rather than simply true or false. That way the programmer can then handle in a distinct manner. BootWiFi would also need to be changed to handle the reason code by either:

  1. returning the reason code
  2. retrying for most/all reason codes except 201 WIFI_REASON_NO_AP_FOUND (plus possibly 202 WIFI_REASON_AUTH_FAIL and maybe 203 WIFI_REASON_ASSOC_FAIL)

Thoughts?

@HanVertegaal
Copy link
Contributor

Returning the reason sounds like a good idea. The programmer can then make an informed decision on whether to retry or switch to AP mode.

…ceives a SYSTEM_EVENT_STA_GOT_IP event. Otherwise returns wifi_err_reason_t
…ceives a SYSTEM_EVENT_STA_GOT_IP event. Otherwise returns wifi_err_reason_t
@nkolban nkolban merged commit a578b10 into nkolban:master Apr 24, 2018
nkolban added a commit that referenced this pull request Apr 24, 2018
Additional to PR #475.  Addresses BootWiFi::boot hanging
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0