Option to use higher gcc optimization levels for builds · Issue #3183 · adafruit/circuitpython · GitHub

Option to use higher gcc optimization levels for builds #3183


Closed
DavePutz opened this issue Jul 21, 2020 · 2 comments
Comments

@DavePutz
Collaborator
DavePutz commented Jul 21, 2020

Doing some testing, I have seen that using -O2 instead of -Os for boards that have sufficient flash memory can be quite helpful.
Some examples:
On a PyPortal, using the script from https://pastebin.com/BAUS82X9:
With -Os

  neopixel flicker,6-0-0,3.80859
  neopixel rainbow,6-0-0,3.41797
  GPIO on/off benchmark,6-0-0,3.56396
  integer sum,6-0-0,4.53271
  integer multi,6-0-0,6.50293
  float sum,6-0-0,2.94238
  float multi,6-0-0,2.94873
  float divide multi,6-0-0,3.01025

With -O2

  neopixel flicker,6-0-0,1.729
  neopixel rainbow,6-0-0,2.164
  GPIO on/off benchmark,6-0-0,2.172
  integer sum,6-0-0,3.13899
  integer multi,6-0-0,4.224
  float sum,6-0-0,1.799
  float multi,6-0-0,1.804
  float divide multi,6-0-0,1.823

So, run times drop by between 30% and 55%, depending on the benchmark.
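For anyone who wants to check the percentages, here is a minimal sketch that recomputes them from the two listings above. It is plain Python for a desktop machine, not part of the benchmark script; the dictionary names and the `improvement_percent` helper are made up for illustration, and the numbers are copied as reported (the script's time unit is not stated here).

```python
# Run times copied from the -Os and -O2 listings above (units as reported
# by the benchmark script).
os_times = {
    "neopixel flicker": 3.80859,
    "neopixel rainbow": 3.41797,
    "GPIO on/off benchmark": 3.56396,
    "integer sum": 4.53271,
    "integer multi": 6.50293,
    "float sum": 2.94238,
    "float multi": 2.94873,
    "float divide multi": 3.01025,
}
o2_times = {
    "neopixel flicker": 1.729,
    "neopixel rainbow": 2.164,
    "GPIO on/off benchmark": 2.172,
    "integer sum": 3.13899,
    "integer multi": 4.224,
    "float sum": 1.799,
    "float multi": 1.804,
    "float divide multi": 1.823,
}


def improvement_percent(before, after):
    """Percent reduction in run time going from `before` to `after`."""
    return (before - after) / before * 100.0


# Hypothetical helper output: one improvement figure per benchmark.
improvements = {
    name: improvement_percent(os_times[name], o2_times[name])
    for name in os_times
}

for name, pct in improvements.items():
    print("{:<24}{:5.1f}%".format(name, pct))
```

The smallest gain is on "integer sum" and the largest on "neopixel flicker".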

To test displayio, I ran a test that draws rectangles in a loop.
With -Os

  >>> exec(open('dispio_test.py').read())
  Start: 3728.96
  End: 3734.75

With -O2

  >>> exec(open('dispio_test.py').read())
  Start: 19.429
  End: 24.671

An improvement of about 10%
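The 10% figure comes from the elapsed time of each run (End minus Start), not from the raw timestamps, which differ only because the two runs began at different times. A small sketch of that arithmetic, assuming Start and End are readings of the same clock in the same unit; the `elapsed` helper and variable names are hypothetical:

```python
def elapsed(start, end):
    """Elapsed time between two readings of the same clock."""
    return end - start


# Start/End values copied from the two runs above.
os_elapsed = elapsed(3728.96, 3734.75)  # -Os build
o2_elapsed = elapsed(19.429, 24.671)    # -O2 build

# Percent reduction in elapsed time going from -Os to -O2.
speedup_percent = (os_elapsed - o2_elapsed) / os_elapsed * 100.0

print("-Os elapsed: {:.3f}".format(os_elapsed))
print("-O2 elapsed: {:.3f}".format(o2_elapsed))
print("improvement: {:.1f}%".format(speedup_percent))
```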

Running a ulab sample program on an ItsyBitsy M4 Express:
With -Os

  Computing the RMS value of 100 numbers
  traditional                    :    2.612ms [result=3535.843611]
  ulab, with ndarray, some implementation in python :    0.254ms [result=3535.853624]
  ulab only, with list           :    0.314ms [result=3535.854340]
  ulab only, with ndarray        :    0.065ms [result=3535.854340]

With -O2

  Computing the RMS value of 100 numbers
  traditional                    :    2.150ms [result=3535.843611]
  ulab, with ndarray, some implementation in python :    0.217ms [result=3535.853624]
  ulab only, with list           :    0.256ms [result=3535.854340]
  ulab only, with ndarray        :    0.056ms [result=3535.854340]

An improvement of roughly 14% to 18%
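To compare the two ulab runs row by row, the printed lines can be parsed on a desktop machine. This is only a sketch: the regular expression and the `parse_timings` helper are assumptions based on the output format shown above, not anything provided by ulab.

```python
import re

# Timing lines copied from the two runs above.
OS_OUTPUT = """\
traditional                    :    2.612ms [result=3535.843611]
ulab, with ndarray, some implementation in python :    0.254ms [result=3535.853624]
ulab only, with list           :    0.314ms [result=3535.854340]
ulab only, with ndarray        :    0.065ms [result=3535.854340]
"""
O2_OUTPUT = """\
traditional                    :    2.150ms [result=3535.843611]
ulab, with ndarray, some implementation in python :    0.217ms [result=3535.853624]
ulab only, with list           :    0.256ms [result=3535.854340]
ulab only, with ndarray        :    0.056ms [result=3535.854340]
"""

# Matches "<label> : <number>ms"; assumes the label contains no colon.
LINE_RE = re.compile(r"^(?P<label>.+?)\s*:\s*(?P<ms>[0-9.]+)ms")


def parse_timings(text):
    """Return {label: milliseconds} for every timing line in `text`."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timings[match.group("label")] = float(match.group("ms"))
    return timings


os_ms = parse_timings(OS_OUTPUT)
o2_ms = parse_timings(O2_OUTPUT)

# Percent reduction in run time per row, going from -Os to -O2.
gains = {
    label: (os_ms[label] - o2_ms[label]) / os_ms[label] * 100.0
    for label in os_ms
}

for label, pct in gains.items():
    print("{:<50}{:5.1f}%".format(label, pct))
```

The last row is timed to only two significant digits, so its percentage is the least precise of the four.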

These results would indicate that being able to choose a higher optimization level for boards that
have more memory would be worthwhile.

@tannewt
Member
tannewt commented Jul 21, 2020

Very cool! Please make these changes.

@dhalbert
Collaborator

Addressed by #3190.
