RFC: Baremetal bootstrapping process (for esp8266 and in general) #1955

pfalcon · 2016-04-01T13:14:42Z

This is to finalize design of bootstrapping process for new baremetal ports, exemplified by esp8266 port.

Requirements:

1. It should be possible to do filesystem setup/mounting on the Python side.
2. It should be possible to execute user-modifiable configuration code on boot.
3. It should be possible to execute a user application on boot.

pfalcon · 2016-04-01T13:25:00Z

Based on req.1, as there's no filesystem at the time of execution of this code, the only place it could live is a frozen module. Being a frozen module, it's not easily changeable by end users, so cannot be the only boot-executed code. Let this frozen module be named "_boot.py". _boot.py executes, mounts filesystem, and performs other non user configurable initialization. After that, it executes "boot.py" file from mounted filesystem. That module can be freely edited by user. At the end, "boot.py" executes "main.py" if available, which is user application to start on boot.

pfalcon · 2016-04-01T13:30:14Z

So, this proposes 3-stage bootstrapping process (can still be 2-stage, if a particular port mounts filesystem in C code, then _boot.py frozen module can be skipped). The main difference from current implementation, e.g. in stmhal port, is that running next stage is the responsibility of the current stage, and done in the Python code, instead of requiring adhoc support from interpreter. Thus, it's more flexible and gives more freedom to user.
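A minimal sketch of the chained model proposed above, written for plain CPython so it can be tried on a desktop. The helper name `run_if_present`, the temporary directory standing in for the mounted filesystem, and the contents of the stage files are all assumptions made for illustration; none of them come from the esp8266 port.

```python
import os
import tempfile


def run_if_present(path, namespace):
    """Execute a stage file in `namespace` if it exists; report whether it ran."""
    if not os.path.exists(path):
        return False
    with open(path) as f:
        source = f.read()
    exec(compile(source, path, "exec"), namespace)
    return True


def frozen_boot(fs_root, log):
    """Stand-in for the frozen _boot stage: 'mount' the filesystem, then chain."""
    log.append("_boot: filesystem mounted")
    # The next stage receives the helper so it can start its own successor.
    namespace = {"run_if_present": run_if_present, "log": log, "fs_root": fs_root}
    run_if_present(os.path.join(fs_root, "boot.py"), namespace)


# Hypothetical stage files; boot.py is the one that decides to run main.py.
BOOT_PY = (
    "import os\n"
    "log.append('boot.py: user configuration')\n"
    "run_if_present(os.path.join(fs_root, 'main.py'), globals())\n"
)
MAIN_PY = "log.append('main.py: user application')\n"

stage_log = []
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "boot.py"), "w") as f:
        f.write(BOOT_PY)
    with open(os.path.join(root, "main.py"), "w") as f:
        f.write(MAIN_PY)
    frozen_boot(root, stage_log)

print(stage_log)
```

In this sketch boot.py passes its own `globals()` on to main.py, so both share one namespace; a stage could just as well pass a fresh dict, which is the kind of decision the proposal leaves to Python code.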

vfl68 · 2016-04-02T07:46:13Z

boot.py could be used to hold common configuration for different applications. Nice idea.

But I would suggest a different name, either for _boot.py or for boot.py, as it could be confusing for some people. It could be init.py and boot.py, or boot.py and config.py, or something else.
Just something with more of a difference than just '_'.

pfalcon · 2016-04-02T09:10:55Z

@vfl68 : This naming convention, "_mod" for "internal" module and "mod" for "user-facing" module is standard in Python.

deshipu · 2016-04-03T12:04:58Z

I'm not entirely sure how the namespaces work for boot.py. I noticed that if I import something in it, the imported name is available in the REPL. Is that accidental, or on purpose?

pfalcon · 2016-04-03T17:08:04Z

Well, that's something to consider - whether _boot.py should be import'ed or execfile()'d. I generally consider import as a cleaner and more general approach (with import, _boot would have its own module namespace). For all other stages (boot.py, main.py), a decision can be made by previous stage, in Python code.
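The difference between the two options can be sketched in plain CPython. This is only an illustration: the import case is emulated with `types.ModuleType` rather than a real `import`, and the source string and variable names are hypothetical.

```python
import types

BOOT_SOURCE = "import os\nwifi_retries = 3\n"

# Variant 1: execfile-style. Names land directly in the caller-supplied dict,
# which here plays the role of the namespace the REPL would use.
repl_globals = {}
exec(BOOT_SOURCE, repl_globals)

# Variant 2: import-style. Names land in a fresh module object, so the
# caller's dict only contains what it binds explicitly (the module itself).
boot_module = types.ModuleType("boot")
exec(BOOT_SOURCE, boot_module.__dict__)
clean_globals = {"boot": boot_module}

print(sorted(k for k in repl_globals if not k.startswith("__")))
print(sorted(k for k in clean_globals if not k.startswith("__")))
print(clean_globals["boot"].wifi_retries)
```

With the exec variant, `os` and `wifi_retries` become bare names for whatever runs next in that dict; with the module variant they are reachable only as attributes of `boot`.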

dhylands · 2016-04-03T18:53:51Z

Personally, I _really_ like the fact that boot.py is execfile'd. boot.py can always import a module if you want stuff in its own namespace.

Being able to have a bunch of commands be a part of the environment is really useful when you're using the REPL as a place for experimenting with the hardware.

It means that I have less typing to do on the REPL, which makes using it more pleasant.

dpgeorge · 2016-04-04T13:07:22Z

The original reason for having boot.py separate from main.py was so that C set-up code could be run in between the 2 scripts. Most notably USB set up. If boot.py does not configure any USB mode then it gets its default (MSC+VCP), and this is configured after boot.py exits but before main.py starts.

Also, exec'ing main.py from boot.py, and boot.py from _boot.py would lead to large stack usage (many nlr buffers pushed, lots of other state on the stack).

I don't see anything wrong with the current approach. boot.py is exec'd, then main.py. Simple. Don't put anything in boot.py (or don't even create it) if you don't want to use it. Then just have main.py import (or execfile) your application if you want that workflow.

deshipu · 2016-04-04T14:05:56Z

That makes sense. So, for instance, you could disable the automatic connecting to the last network in the boot.py? Or, say, switch the ADC between measuring the voltage on the ADC pin or the VCC?

pfalcon · 2016-04-05T17:07:36Z

I don't see anything wrong with the current approach. Simple.

No, it's not exactly simple. It brings questions like "Why all these complications with the interpreter running various ad hoc files, if it can run one, and then that file can run whatever else?" That's simple.

Also, exec'ing main.py from boot.py, and boot.py from _boot.py would lead to large stack usage (many nlr buffers pushed, lots of other state on the stack).

Ok, suppose that's the reason (because "being able to run C code" alone doesn't cut it, since C code can be called from Python). So then the sequence is:

    pyexec_frozen_module("_boot");
    pyexec_file("boot.py");
    pyexec_file("main.py");

?

Are we sure about those exec's? Because --

Being able to have a bunch of commands be a part of the environment is really useful when you're using the REPL as a place for experimenting with the hardware.

-- what's good for "experimenting", isn't really good for production, with completely random stuff contaminating the application environment. That can be classified as security issue.

pfalcon · 2016-04-05T17:12:28Z

So, for instance, you could disable the automatic connecting to the last network in the boot.py? Or, say, switch the ADC between measuring the voltage on the ADC pin or the VCC?

Yes, something like that, except both examples aren't good - those matters are handled by lower levels of vendor SDK and are outside of our immediate control. Current issue at hand is that webrepl for example will be completely user-level class, started in boot.py with:

webrepl.start(port=9999, passwd="foo")

A user will be able to comment it out or change any param there.

dhylands · 2016-04-05T18:11:33Z

Being able to have a bunch of commands be a part of the environment is really useful when you're using the REPL as a place for experimenting with the hardware.

-- what's good for "experimenting", isn't really good for production, with completely random stuff contaminating the application environment. That can be classified as security issue.

It only contaminates if you don't clean up after yourself. It's easy to make an exec'd boot.py have a single line which imports stuff (so you get your clean solution). It's nowhere near as easy to make imported stuff replicate the way it is today.

pfalcon · 2016-04-05T18:18:33Z

It's nowhere near as easy to make imported stuff replicate the way it is today.

As I mentioned previously, the proper pythonic way to do that is to assign attributes to the builtins module.
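A minimal CPython sketch of what assigning to the builtins module does. The `blink` helper is hypothetical, and the second module is emulated with `types.ModuleType` instead of a real file on disk.

```python
import builtins
import types


def blink(times):
    """Hypothetical REPL helper; returns a string instead of driving an LED."""
    return "blink x%d" % times


# Publishing the helper on builtins makes the bare name resolvable everywhere.
builtins.blink = blink

# A separate module that never imports or defines `blink` can still call it,
# because name lookup falls back to builtins after the module's own globals.
other = types.ModuleType("other")
exec("result = blink(2)", other.__dict__)
print(other.result)

# Remove the name again so the sketch leaves no global state behind.
del builtins.blink
```

The same property cuts both ways: the name is reliably visible to all code, and it is visible to all code whether that code wants it or not.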

deshipu · 2016-04-05T18:23:03Z

It's nowhere near as easy to make imported stuff replicate the way it is today.

As I mentioned previously, the proper pythonic way to do that is to assign attributes to the builtins module.

Wouldn't that contaminate pretty much all namespaces, all imported modules and all libraries you use, instead of just making a name available in the REPL?

pfalcon · 2016-04-05T18:31:37Z

Well, "contaminate" means "unexpectedly and unforeseeably leaking unrelated data". E.g. you have password = "my_secret_password" in boot.py, and then a user app which dumps all variables sends it out to the world. Obviously, you can't put something into builtins unexpectedly, only if you really want to.

deshipu · 2016-04-05T18:53:38Z

Any user app would be in a module of its own, with its own local variables, not the ones in REPL. So you don't really "contaminate" it, do you? In fact, it has no way of accessing the REPL's namespace at all, short of doing some dirty introspection tricks.

pfalcon · 2016-04-05T19:05:46Z

There's no "REPL namespace". If the setup from #1955 (comment) is used (and that's pretty much stmhal's setup), main.py will have access to variables set by boot.py.

dhylands · 2016-04-05T19:12:42Z

As I mentioned previously, the proper pythonic way to do that is to assign attributes to the builtins module.

Seems that technique is considered "bad" and not pythonic at all.
http://stackoverflow.com/questions/6965090/how-to-add-builtin-functions

And that sort of wipes out your security "issue". Sure, if you store your password someplace in memory, that's a security issue, but it doesn't matter where it is; it's still the same security issue. There's nothing to stop any python code from reading the module source (or extracting the strings from the bytecode). Security by obscurity is no security at all.

pfalcon · 2016-04-05T19:45:27Z

Seems that technique is considered "bad" and not pythonic at all.

By some random dude on the internets? D'oh. But if you read his long and confusing reply, you'll see that the essence of his reply is in the last small phrase at the end: "A better method is to use Python's existing import statement". That is, per that dude the only right way is to have a separate module, and nobody cares if you need to do more typing. Even adding new stuff to a standard module "isn't right". If he knew about your dirty hacks, @dhylands, he'd probably have a real cognitive dissonance. Btw, do you know how to do your trick in CPython? Hint: it's possible. But then compare how it's handled in the REPL vs. when running a script.

And that sort of wipes out your security "issue".

That definitely does not.

Security by obscurity is no security at all.

You see, it's one thing if a "security by obscurity" government issues you a paper passport - anybody who gets their hands on it may learn too much about you. But it's at least a small closed book which can be secured more or less easily (even if by obscurity), and any stranger who reaches for it definitely commits a crime. It's a different thing if the government issues your passport as an A4 sheet with ties, to be hung around your neck.

deshipu · 2016-04-05T19:52:28Z

I have to chime in here about modifying builtins being considered a very bad and "unpythonic" practice. It's what a Ruby programmer would do. It's one of the chief reasons why the web2py framework is considered such a horrible design. But don't rely on our word, just go to the #python channel on Freenode and ask there.

pfalcon · 2016-04-05T20:21:59Z

There's no need to ask anybody's opinion. There's simply the only way to do that in a reliable way (to avoid case when some code sees your function and another code doesn't) - assigning to builtin. For that purpose it exists and it is thus objective reality. Discussions like that are akin to discussing "why Pi is bad". Indeed, there can be unlimited number of cognitive agents which can provide unlimited amount of subjective opinions why it is, but when you need to find out circumference of a circle, you just multiply its diameter by Pi, that's all.

To sum up the discussion so far:

The design based on exec'ing stuff into same namespace shows traits of bad practices.
The argument that it offers interesting usability loopholes is weak, as the alternative is shown, and thus it should not be deciding whether exec model is kept or not.
(There can be argument of optimality - after all, that was deciding argument in the previous part of discussion, but that's exactly what we lack and instead discuss how good is Pi).

dhylands · 2016-04-05T21:23:31Z

I would sum it up as there is the way Paul wants to do it, and the way that everybody else wants to do it, since I've not seen anybody else chime in to support your views.

deshipu · 2016-04-05T21:28:06Z

I want to point out that it was you who proposed using builtins as "more
pythonic", and now, when pointed to the fact it's not, are dismissing this as
unimportant. Fine, ignore the 20 years of experience of the Python community in
what is a good and what is a bad practice, let's invent a completely new
language, why not. You have the power to do that. Let's focus on technical
matters.

Modifying the builtins, or in fact, monkey-patching any globally available
module from outside of this module, leads to very real and practical problems.
Modules are no longer independent units of execution. Programmers who write
them have to anticipate changes in how the runtime behaves, or even worse, may
expect certain changes. Also, because this is global state, you can get
unexpected effects, or even conflicts between modules expecting certain
changes.

Finally, I just did a small test with the boot.py module and some module
that just prints a name, and I discovered that you outright lied in one of the
previous comments. Execing boot.py introduces names in the REPL's
namespace, but not into the namespaces of the modules you are importing. They
have their own module-scope namespaces, like you would expect in proper Python
implementation.

deshipu · 2016-04-05T21:32:17Z

Here's an experiment to run with the current github code. Create two files in the scripts directory. A boot.py file containing this code:

variable = "test"

and a test.py file containing this code:

print(variable)

Now compile and flash the firmware. Go into the repl and try those commands:

>>> variable
'test'
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1, in <module>
NameError: name not defined
>>>

QED

pfalcon · 2016-04-05T21:46:24Z

since I've not seen anybody else chime in to support your views.

Sure, the whole point of this ticket is to re-review and challenge the existing bootstrap process, to look for opportunities to make it better. People who just keep repeating "I'm used to whatever exists and I don't want to change anything because I'm used to it" don't add much to the discussion.

dhylands · 2016-04-05T21:52:44Z

Well, stuff visible in boot.py is visible in main.py, but that's the extent of it (because main.py is execfile'd as well). So if you import pyb in boot.py, you can use pyb without importing it in main.py. But any modules that main.py imports need to import it for themselves.

So that's clearly much less intrusive than putting things in builtins.
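The visibility rule described here can be sketched in plain CPython. This is an emulation only: one dict stands in for the namespace that boot.py and main.py are exec'd into, the imported module is built with `types.ModuleType`, and all names are hypothetical.

```python
import types

shared_globals = {}  # stands in for the namespace boot.py and main.py run in

# boot.py and main.py are both exec'd into the same dict.
exec("greeting = 'set in boot.py'", shared_globals)
exec("seen_by_main = greeting", shared_globals)

# A module gets its own globals, so the bare name is not visible there.
helper = types.ModuleType("helper")
try:
    exec("seen_by_helper = greeting", helper.__dict__)
    helper_sees_it = True
except NameError:
    helper_sees_it = False

print(shared_globals["seen_by_main"])
print(helper_sees_it)
```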

Another solution would be to have a repl.py which was exec'd just before entering the REPL. This would be optional, and the file wouldn't have to exist.

pfalcon · 2016-04-05T21:56:28Z

and now, when pointed to the fact it's not, are dismissing this as unimportant.

Just to clarify, my point during that part of discussion was that, on a scale where adding things to builtin is "bad", what @dhylands does is "much worse". But that's indeed unimportant, because in reality, both of them have their uses. It's just one is natural pythonic way known to everyone (the fact that there's cargo cult against its usage just proves that), while another is rather hackish way.

let's invent a completely new language, why not

I'm exactly against inventing anything "new" without good reason. (I'm all for inventing something new if there is one).

dhylands · 2016-04-05T21:56:55Z

Sure, the whole point of this ticket is to re-review and challenge the existing bootstrap process, to look for opportunities to make it better. People who just keep repeating "I'm used to whatever exists and I don't want to change anything because I'm used to it" don't add much to the discussion.

I like it the way it is. Not because that's what I'm used to, but because I actually find it useful. That's one of the nice things about programming, I can let the computer do stuff for me. I shouldn't have to type extra stuff just for the sake of typing extra stuff.

pfalcon · 2016-04-05T22:07:39Z

Here's an experiment to run with the current github code.

Your test is incorrect. You first start with applying code from #1955 (comment) . Then, you create following program (main.py):

print(password)

The expected outcome of running that program is error due to undefined variable.

Now create boot.py:

password = 'foo'

Now, the behavior of main.py has changed. If boot.py and main.py were created at different times, the behavior of main.py has changed in unpredictable way. If password variable actually contained a valuable password, then main.py exposed it, even if that was not intent of main.py's author. That's the difference from the case @dhylands mentioned earlier. The app looking like:

import boot
...

would easily raise concern: a user app should not normally deal with system boot code, and code like the above is easy to screen for automatically and then review in detail for malicious intent (or just ban conservatively).

But with exec-only way, any application becomes a potential trojan horse and/or unpredictable (yes, probability is low - until people start to target such vulnerability; in general, in security analysis, thing which can be a security issue, is a security issue).

deshipu · 2016-04-05T22:16:47Z

Could we take a step back and define what we actually need and want, in the form of required features and nice-to-haves, and refrain from proposing solutions until we have a good understanding of the situation? If we start with a solution, we will necessarily divide ourselves into supporters and opponents, and that is hardly productive.

deshipu · 2016-04-05T22:19:43Z

Also, no offense @pfalcon, but you are the person who makes the final decision on this, as the implementer. Your role is special, and you have great power and great responsibility here. Of course it's up to you, but I'm not sure it's a good idea to take sides in this position.

pfalcon · 2016-04-05T22:20:40Z

Could we take a step back and define what we actually need and want, in the form of required features

I started this ticket with the list of requirements.

deshipu · 2016-04-05T22:23:52Z

Apparently it's not detailed enough, because you are coming up with additional requirements, not listed initially, to make one proposal seem better than another. Like the requirement for "security" that you implicitly add in the example above. Perhaps it should be added to the initial requirements, and we should also think what attack vectors we would like to secure against, so that we are not trying to build on quicksands with fluid requirements.

pfalcon · 2016-04-05T22:25:52Z

Also, no offense @pfalcon, but you are the person who makes the final decision on this

I don't, @dpgeorge does ;-). And I really try to do my best to challenge what he'd done to see if anything can be improved, and that we understand implications of each choice. (And generally mark this as done and not return to it later. Even if people "threaten" to issue CVE against us, we reply: "Yeah, vulnerability. By design.")

dhylands · 2016-04-05T22:35:49Z

I'd like to add a requirement that it should be possible to have the equivalent of CPython's -i flag.

So if I have a file repl.py:

import os
print('Executing repl.py')
x = 3

With CPython you can then do:

515 >python3 -i repl.py 
Executing repl.py
>>> x
3
>>> os.listdir()
['repl.py']
>>>

Obviously, bare-metal doesn't have a command line, so I'm perfectly happy if the filename is hard-coded. Then I'm perfectly happy to have boot.py and main.py be imported.
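A rough CPython sketch of this requirement: run an optional, fixed-name file and keep its names for the interactive prompt. The function name `load_startup_namespace` and the temporary directory are assumptions for illustration, not part of any port.

```python
import os
import tempfile


def load_startup_namespace(path):
    """Run the file at `path` (if any) and return the names it defined."""
    namespace = {"__name__": "__main__"}
    if os.path.exists(path):
        with open(path) as f:
            exec(compile(f.read(), path, "exec"), namespace)
    return namespace


# Hypothetical repl.py matching the example above, minus the print.
with tempfile.TemporaryDirectory() as root:
    repl_path = os.path.join(root, "repl.py")
    with open(repl_path, "w") as f:
        f.write("import os\nx = 3\n")
    repl_namespace = load_startup_namespace(repl_path)
    # The file is optional: a missing file simply yields an empty namespace.
    missing_namespace = load_startup_namespace(os.path.join(root, "absent.py"))

print(repl_namespace["x"])
# An interactive prompt over these names could then be started with
# code.interact(local=repl_namespace) from the standard library.
```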

deshipu · 2016-04-05T22:38:14Z

I don't think we can realistically safeguard against the kind of scenario you describe in your example. Python is a dynamic language with very powerful introspection, and the general approach in its design has always been to give the programmer the full power that is available, assuming they are a consenting adult and know what they are doing, while at the same time avoiding opportunities for silly mistakes. This design philosophy means that there are no enforced boundaries in the code, no "private" methods, etc., and that whoever gets to execute code in your process pretty much has full access to everything in it, if not explicitly, then through introspection mechanisms.

For instance, if boot.py was ever imported by normal import mechanisms, you can easily access all of its variables by importing it again (imported modules are cached and the code only gets executed the first time they are imported, so there would be no error). And no, you cannot guard against that with a simple "grep 'import boot'", because you can also use __import__ with a calculated variable as a parameter, use one of the alternate forms of "import" such as "from boot import password", "import boot as data" etc., or simply access the variables with sys.modules["boot"] (again using a variable if needed). Finally, you can even have your program dynamically generate bytecode that accesses the relevant data, and have it exec'd at runtime. Heck, Micropython lets you include assembly that has access to the whole chip!
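A few of these routes can be tried in plain CPython. In this sketch a fake `boot` module is registered in `sys.modules` to stand in for one that was imported earlier; the module contents are hypothetical.

```python
import sys
import types

# Hypothetical stand-in for a boot module that was already imported.
fake_boot = types.ModuleType("boot")
fake_boot.password = "example-only"
sys.modules["boot"] = fake_boot

# None of the lines below contain the literal text "import boot".
name = "bo" + "ot"
via_cache = sys.modules[name].password
via_dunder = __import__(name).password
from boot import password as via_from

print(via_cache == via_dunder == via_from)

# Drop the fake module so the sketch has no lasting effect.
del sys.modules["boot"]
```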

Even if you managed to safeguard against all those cases (at the cost of severely modifying the design of the language itself), there are other ways to do this that I can't even think of. Python was not designed with this kind of safety in mind, and it's pointless to try to enforce it.

dhylands · 2016-04-05T22:40:51Z

Even if people "threaten" to issue CVE against us, we reply: "Yeah, vulnerability. By design.")

I'm not seeing the vulnerability. With the existing system all you need to do is put your stuff in a separate .py file and import it, and then you get the exact same behavior as if that file were imported in the first place. You'll never protect against people from doing insecure things.

If boot were imported and it contained something like:

password = "mypassword"

then at the REPL, all I have to do is:

import boot
print(boot.password)

I fail to see how importing boot is any more secure than execfile'ing it.

deshipu · 2016-04-05T23:09:45Z

The way this works currently on PyBoard, we have two files that get exec-ed at boot at different stages in the REPL's namespace: boot.py gets executed before the USB initialization, and therefore can affect how it is done, and main.py is executed at the end, possibly calling the user code. This resembles the way that CPython starts, executing site.py for the machine-specific settings, and then transferring the control to the module it was called with, or REPL.

In the case of the ESP8266 port, there is an additional step required for creating and mounting the filesystem. Since that can't be included in any of the files we already have, because they reside on that filesystem which is not yet mounted, there is a need for an additional stage to handle that.

That is pretty much where we stand now, if I am not mistaken. Now, so far we have two propositions:

1. Introduce an additional frozen module called _boot that would get executed before boot.py and main.py are checked for, and would do the filesystem setup. This would retain the behavior currently available on PyBoard and other ports.
2. Introduce the _boot module, and also remove all the logic executing the boot.py and main.py, moving it instead into that _boot module, and making it use standard import mechanism. This would have the side effect of the names defined in boot.py and main.py not being directly available in the REPL's namespace and in each of those modules (although they would still be accessible through introspection).

Is this an accurate summary of where we stand now?

deshipu · 2016-04-05T23:12:31Z

Since there is no USB setup on the ESP8266, there is no need to separate the boot.py and the main.py stages anymore. We could have a single stage instead, but that could be potentially confusing to users who know boot.py and main.py from other ports and expect both to work.

dpgeorge · 2016-04-06T11:05:36Z

The reason for having a separate boot.py is the same as why _boot.py is needed: to put board-specific config stuff in (eg FS, REPL, USB set up) so that you have a "standard" environment when executing main.py. When developing on a board you can usually just leave boot.py alone and change main.py. If there was no boot.py then you'd always need to do boring set up stuff at the start of main.py (eg init the USB) and if you forgot, or it had a bug, then you'd need to do a factory reset to restore it.

Having this scheme as standard across boards (boot does board-specific set up, main is the actual application) is, I think, a good way of doing things. Adding frozen _boot is a necessary evil to do board-specific set up when there is not even a filesystem.

Exec'ing these files (instead of importing) is useful because you can add to the globals() dict (which is the REPL namespace), instead of needing to pollute the builtins module.

Internal security of the code is a non-issue because anything can be read using machine.mem32, and/or flash.read_blocks.

pfalcon · 2016-04-07T10:14:21Z

Adding frozen _boot is a necessary evil to do board-specific set up when there is not even a filesystem.

It's not evil, it's a great achievement: allowing people to bootstrap even such low-level things as a root filesystem from the Python side. "Evils" are things like large stack usage and lack of tail recursion support, which interfere with that.

Internal security of the code is a non-issue because anything can be read using machine.mem32, and/or flash.read_blocks.

No, anything can't. Only something which accesses machine.mem32 or uses reflection can, and that can be automatically screened for. For example, a package manager can have a switch "don't install insecure packages", which would scan source for those features and refuse to install offending packages. Such automated auditing is not possible (or much more complicated) in the presence of contaminated namespaces. That's why CPython doesn't use execfile'ing anywhere in its initialization sequence (except for one corner case, which is carefully made to apply only to the REPL, and explicitly excluded when running scripts). That alone would be enough to do it like CPython - a lot of people thought hard about how it should be done there. If we want to ignore that and invent our own stuff though, ok, if there's a good reason. Unfortunately, I see only one benefit - saving a few bytes on creating a namespace for boot.py.
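As a rough idea of what such screening could look like, here is a naive scanner built on CPython's `ast` module. The flagged-name sets and the `scan_source` function are hypothetical policy choices for this sketch; no existing package manager is being described.

```python
import ast

# Hypothetical policy: module names and builtins the scanner treats as unsafe.
FLAGGED_MODULES = {"machine"}
FLAGGED_CALLS = {"__import__", "getattr", "exec", "eval"}


def scan_source(source):
    """Return a sorted list of human-readable findings for one source string."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FLAGGED_MODULES:
                    findings.add("imports " + alias.name)
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FLAGGED_MODULES:
                findings.add("imports from " + node.module)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FLAGGED_CALLS:
                findings.add("calls " + node.func.id)
    return sorted(findings)


# The source strings are only parsed, never executed.
direct = "import machine\nvalue = machine.mem32[0]\n"
indirect = "name = 'mach' + 'ine'\nmod = __import__(name)\n"
harmless = "total = sum(range(10))\n"

print(scan_source(direct))
print(scan_source(indirect))
print(scan_source(harmless))
```

A scanner like this does not understand what reflective code does; it can only reject reflection as a category, and only in the spellings it knows about.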

deshipu · 2016-04-07T10:31:13Z

Screen for this:

import sys
name = "m\x61ch"
name += "ine"
__import__(name)
memory = getattr(sys.modules[name], name[0] + name[-1] + name[0] + '32')

There are literally thousands of ways you can do this, and it's impossible to automatically tell what it does without actually executing it first. This kind of "security" is not possible without completely redesigning the language, perhaps making it more like Java.

pfalcon · 2016-04-07T10:48:49Z

Screen for this:

__import__(name)

Very simple:

uses reflection

@deshipu , it's now my turn to get pissed off ;-). And tell you: please try to understand the discussion and use it as a chance to learn. For example, when you say "This resembles the way that CPython starts, executing site.py", you have checked that in CPython source code. Or otherwise, sorry, you have no idea what you're talking about, and then talking doesn't make sense.

deshipu · 2016-04-07T11:09:57Z

Are you suggesting that site.py is not executed by CPython on startup? Because the documentation (and my experience) says that it is, and I trust the documentation, because like in most mature tools, it's both up-to-date and complete.

You say that using __import__ is "using reflection" for you. How are you going to screen for that in your secure code analyzer? I can always do getattr(__builtins__, "__import__"), of course using computed variables instead of literal strings. You want to forbid access to __builtins__? No problem, I can fish it out of locals(). You are going to ban that? I can get it from any imported module.

In fact, I don't even need to call that __import__. It's enough that the machine module was imported anywhere else by any other module, and it will be accessible in sys.modules.

Finally, the kind of people who you want to protect, those who just download random code from the Internet and use it in their security-critical applications without any kind of audit, are going to happily ignore any warnings your tools display, if only the instructions wherever they downloaded the code from tell them to ignore them. This is the curl | sudo bash territory, and no amount of wishful thinking is going to make it secure.

deshipu · 2016-04-07T11:11:22Z

Oh, and let's not forget, that I can always just open the file with the source code, and read the password from that.

Upon start-up, the _boot module is executed from frozen files to do early initialization, e.g. create and mount the flash filesystem. Then "boot.py" is executed if it exists in the filesystem. Finally, "main.py" is executed if it exists, to allow start-on-boot user applications. This allows a user to make a custom boot file or startup application without recompiling the firmware, while still allowing early initialization to be done in Python code. Based on RFC #1955.
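The start-up order described in this message can be sketched in plain CPython as a flat sequence, where the caller (not the previous stage) starts each step. The names `run_startup` and `fake_frozen_boot` and the temporary directory are assumptions for illustration; the real port does this from its C start-up code.

```python
import os
import tempfile


def run_startup(fs_root, frozen_boot, namespace):
    """Run the frozen stage, then boot.py and main.py when they exist."""
    executed = []
    frozen_boot(fs_root)
    executed.append("_boot")
    for filename in ("boot.py", "main.py"):
        path = os.path.join(fs_root, filename)
        if os.path.exists(path):
            with open(path) as f:
                exec(compile(f.read(), path, "exec"), namespace)
            executed.append(filename)
    return executed


def fake_frozen_boot(fs_root):
    """Hypothetical early init: make sure the 'filesystem' directory exists."""
    os.makedirs(fs_root, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "flash")
    # First start: empty filesystem, so only the frozen stage runs.
    first_start = run_startup(root, fake_frozen_boot, {})
    # The user adds main.py without touching the "firmware".
    with open(os.path.join(root, "main.py"), "w") as f:
        f.write("app_started = True\n")
    user_globals = {}
    second_start = run_startup(root, fake_frozen_boot, user_globals)

print(first_start)
print(second_start)
```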

pfalcon · 2016-04-10T11:25:17Z

Implemented in esp8266 master.

pfalcon added rfc Request for Comment port-esp8266 labels Apr 1, 2016

pfalcon closed this as completed Apr 10, 2016

tannewt added a commit to tannewt/circuitpython that referenced this issue Jun 19, 2019

Merge pull request micropython#1955 from pewpew-game/pygamer-stage

5e5252c

Add support for PyGamer to Stage library
