RFC: Baremetal bootstrapping process (for esp8266 and in general) #1955
Based on req.1, as there's no filesystem at the time of execution of this code, the only place it could live is a frozen module. Being a frozen module, it's not easily changeable by end users, so cannot be the only boot-executed code. Let this frozen module be named "_boot.py". _boot.py executes, mounts filesystem, and performs other non user configurable initialization. After that, it executes "boot.py" file from mounted filesystem. That module can be freely edited by user. At the end, "boot.py" executes "main.py" if available, which is user application to start on boot. |
So, this proposes 3-stage bootstrapping process (can still be 2-stage, if a particular port mounts filesystem in C code, then _boot.py frozen module can be skipped). The main difference from current implementation, e.g. in stmhal port, is that running next stage is the responsibility of the current stage, and done in the Python code, instead of requiring adhoc support from interpreter. Thus, it's more flexible and gives more freedom to user. |
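A minimal sketch of the chained hand-off described above, where each stage is responsible for running the next one from Python code. The in-memory `FILES` dict, the `run_next` helper and the `log` list are hypothetical stand-ins used only so the sketch runs on a desktop Python; they are not part of any port.

```python
# Hypothetical in-memory "filesystem"; on a board these would be files on flash.
FILES = {
    "boot.py": "log.append('boot.py')\nrun_next('main.py')",
    "main.py": "log.append('main.py')",
}

log = []


def run_next(name):
    """Run the named stage if it exists; the stage itself decides what follows."""
    source = FILES.get(name)
    if source is None:
        return False
    # Each stage gets a fresh namespace holding only the helpers it needs.
    exec(source, {"log": log, "run_next": run_next})
    return True


def _boot():
    """Stand-in for the frozen _boot.py stage."""
    log.append("_boot.py")  # a real port would create/mount the filesystem here
    run_next("boot.py")


_boot()
print(log)  # ['_boot.py', 'boot.py', 'main.py']
```

Note that in this arrangement `main.py` runs while the frames of `_boot` and `boot.py` are still active, which is the nesting the stack-usage objection later in the thread refers to.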
The boot.py could be used to hold common configuration for different applications. Nice idea. But I would suggest using another name, either for _boot.py or boot.py, as it could be confusing for some people. It could be init.py and boot.py, or boot.py and config.py, or something else. |
@vfl68 : This naming convention, "_mod" for "internal" module and "mod" for "user-facing" module is standard in Python. |
I'm not entirely sure how the namespaces work for boot.py. I noticed that if I import something in it, the imported name is available in the REPL. Is that accidental, or on purpose? |
Well, that's something to consider - whether _boot.py should be import'ed or execfile()'d. I generally consider import as a cleaner and more general approach (with import, _boot would have its own module namespace). For all other stages (boot.py, main.py), a decision can be made by previous stage, in Python code. |
Personally, I _really_ like the fact that boot.py is execfile'd. boot.py can always import a module if you want stuff in its own namespace. Being able to have a bunch of commands be a part of the environment is really useful when you're using the REPL as a place for experimenting with the hardware. It means that I have less typing to do on the REPL, which makes using it more pleasant. |
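The trade-off in the two comments above can be sketched in plain Python. This is only an illustration under assumptions: `BOOT_SOURCE` and `led_on` are made-up names, and a plain dict stands in for whatever namespace the REPL uses.

```python
import types

# Hypothetical contents of boot.py.
BOOT_SOURCE = "def led_on():\n    return 'LED on'\n"

# Option 1: exec boot.py straight into the namespace the REPL will use.
repl_ns = {}
exec(BOOT_SOURCE, repl_ns)
print(repl_ns["led_on"]())  # at the prompt this would simply be led_on()

# Option 2: give boot its own module namespace, as an import would.
boot = types.ModuleType("boot")
exec(BOOT_SOURCE, boot.__dict__)
clean_ns = {"boot": boot}
print("led_on" in clean_ns)  # False: the helper is only reachable as boot.led_on
print(clean_ns["boot"].led_on())
```

With option 2 the REPL namespace stays small, at the cost of typing the `boot.` prefix (or an explicit `from boot import ...`) for every helper.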
The original reason for having boot.py separate from main.py was so that C set-up code could be run in between the 2 scripts. Most notably USB set up. If boot.py does not configure any USB mode then it gets its default (MSC+VCP), and this is configured after boot.py exits but before main.py starts. Also, exec'ing main.py from boot.py, and boot.py from _boot.py would lead to large stack usage (many nlr buffers pushed, lots of other state on the stack). I don't see anything wrong with the current approach. boot.py is exec'd, then main.py. Simple. Don't put anything in boot.py (or don't even create it) if you don't want to use it. Then just have main.py import (or execfile) your application if you want that workflow. |
That makes sense. So, for instance, you could disable the automatic connecting to the last network in the |
No, it's not exactly simple. It brings questions like "Why all these complications with interpreter running various adhoc file, if it can run one, and then that file can run whatever else." That's simple.
Ok, suppose that's the reason (because "being able to run C code" alone doesn't cut it, as C code can be called from Python). So then the sequence is:
? Are we sure about those exec's? Because --
-- what's good for "experimenting", isn't really good for production, with completely random stuff contaminating the application environment. That can be classified as security issue. |
Yes, something like that, except both examples aren't good - those matters are handled by lower levels of vendor SDK and are outside of our immediate control. Current issue at hand is that webrepl for example will be completely user-level class, started in boot.py with:
A user will be able to comment it out or change any param there. |
It only contaminates if you don't clean up after yourself. It's easy to make an exec'd boot.py have a single line which imports stuff (so you get your clean solution). It's no where near as easy to make imported stuff replicate the way it is today. |
As I mentioned previously, the proper pythonic way to do that is to assign attributes to |
Wouldn't that contaminate pretty much all namespaces, all imported modules and all libraries you use, instead of just making a name available in the REPL? |
Well, "contaminate" is "unexpectedly and unforeseeably leaking unrelated data". E.g. you have |
Any user app would be in a module of its own, with its own local variables, not the ones in REPL. So you don't really "contaminate" it, do you? In fact, it has no way of accessing the REPL's namespace at all, short of doing some dirty introspection tricks. |
There's no "REPL namespace". If the setup from #1955 (comment) is used (and that's pretty much stmhal's setup), main.py will have access to variables set by boot.py. |
Seems that technique is considered "bad" and not pythonic at all. And that sort of wipes out your security "issue". Sure if you store your password someplace in memory, that's a security issue, but it doesn't matter where it is its still the same security issue. There's nothing to stop any python code from reading the module source (or extracting the strings from the bytecode). Security by obscurity is no security at all. |
By some random dude on the internets? D'oh. But if you read his long and confusing reply, you'll see that the essence of it is in the last small phrase at the end: "A better method is to use Python's existing import statement". That is, per that dude the only right way is to have a separate module, and nobody cares if you need to do more typing. Even adding new stuff to a standard module "isn't right". If he knew about your dirty hacks, @dhylands, he'd probably have real cognitive dissonance. Btw, do you know how to do your trick in CPython? Hint: it's possible. But then compare how it's handled in the REPL vs. when running a script.
That definitely does not.
You see, it's one thing if a "security by obscurity" government issues you a paper passport: anybody who gets their hands on it may learn too much about you. But it is at least a small closed book which can be secured more or less easily (even if by obscurity), and any stranger who reaches out for it definitely commits a crime. It's a different thing if the government issues your passport as an A4 sheet with ties, to be hung around your neck. |
I have to chime in here about modifying builtins being considered a very bad and "unpythonic" practice. It's what a Ruby programmer would do. It's one of the chief reasons why the |
There's no need to ask anybody's opinion. There's simply only one way to do that reliably (to avoid the case where some code sees your function and other code doesn't) - assigning to
To sum up the discussion so far:
|
I would sum it up as there is the way Paul wants to do it, and the way that everybody else wants to do it, since I've not seen anybody else chime in to support your views. |
I want to point out that it was you who proposed using builtins as "more Modifying the builtins, or in fact, monkey-patching any globally available Finally, I just did a small test with the |
Here's an experiment to run with the current github code. Create two files in the
and a
Now compile and flash the firmware. Go into the repl and try those commands:
QED |
Sure, the whole point of this ticket is to re-review and challenge the existing bootstrap process, to look for opportunities to make it better. People who just keep repeating "I'm used to whatever exists and I don't want to change anything because I'm used to it" don't add much to the discussion. |
Well, stuff visible in boot.py is visible in main.py, but that's the extent of it (because main.py is execfile'd as well). So if you import pyb in boot.py, you can use pyb without importing it in main.py. But any modules that main.py imports need to import it for themselves. So that's clearly much less intrusive than putting things in builtins. Another solution would be to have a repl.py which was exec'd just before entering the REPL. This would be optional, and the file wouldn't have to exist. |
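The visibility rules described in this comment can be checked with a short sketch in standard Python. The names `shared`, `helper`, `board` and `board_global` are illustrative assumptions; the dict `shared` stands in for the namespace in which boot.py and main.py are exec'd.

```python
import builtins
import types

shared = {}  # stand-in for the namespace used for both boot.py and main.py
exec("board = 'pyboard'", shared)     # stand-in for boot.py
exec("seen_in_main = board", shared)  # stand-in for main.py: the name is visible

# A module imported by main.py has its own globals and does not see the name.
helper = types.ModuleType("helper")
try:
    exec("value = board", helper.__dict__)
    visible_in_module = True
except NameError:
    visible_in_module = False

# A name placed on builtins, by contrast, is found from every module.
builtins.board_global = "pyboard"
try:
    exec("value = board_global", helper.__dict__)
    builtin_visible = helper.value == "pyboard"
finally:
    del builtins.board_global  # clean up the global side effect

print(shared["seen_in_main"], visible_in_module, builtin_visible)
```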
Just to clarify, my point during that part of discussion was that, on a scale where adding things to
I'm exactly against inventing anything "new" without a good reason. (I'm all for inventing something new if there is one.) |
I like it the way it is. Not because that's what I'm used to, but because I actually find it useful. That's one of the nice things about programming, I can let the computer do stuff for me. I shouldn't have to type extra stuff just for the sake of typing extra stuff. |
Your test is incorrect. You first start with applying code from #1955 (comment) . Then, you create following program (main.py):
The expected outcome of running that program is error due to undefined variable. Now create boot.py:
Now, the behavior of main.py has changed. If boot.py and main.py were created at different times, the behavior of main.py has changed in an unpredictable way. If the password variable actually contained a valuable password, then main.py exposed it, even if that was not the intent of main.py's author. That's the difference from the case @dhylands mentioned earlier. The app looking like:
would easily raise concern - a user app should not normally deal with system boot code, so the code above is easy to screen for automatically and then review in detail for malicious intent (or just ban conservatively). But with the exec-only way, any application becomes a potential trojan horse and/or unpredictable (yes, the probability is low - until people start to target such a vulnerability; in general, in security analysis, a thing which can be a security issue is a security issue). |
Could we take a step back and define what we actually need and want, in the form of required features and nice-to-haves, and refrain from proposing solutions until we have a good understanding of the situation? If we start with a solution, we will necessarily divide ourselves into supporters and opponents, and that is hardly productive. |
Also, no offense @pfalcon, but you are the person who makes the final decision on this, as the implementer. Your role is special, and you have great power and great responsibility here. Of course it's up to you, but I'm not sure it's a good idea to take sides in this position. |
I started this ticket with the list of requirements. |
Apparently it's not detailed enough, because you are coming up with additional requirements, not listed initially, to make one proposal seem better than another. Like the requirement for "security" that you implicitly add in the example above. Perhaps it should be added to the initial requirements, and we should also think what attack vectors we would like to secure against, so that we are not trying to build on quicksands with fluid requirements. |
I don't, @dpgeorge does ;-). And I really try to do my best to challenge what he'd done to see if anything can be improved, and that we understand implications of each choice. (And generally mark this as done and not return to it later. Even if people "threaten" to issue CVE against us, we reply: "Yeah, vulnerability. By design.") |
I'd like to add a requirement that it should be possible to have the equivalent of CPython's -i flag. So if I have a file repl.py:
With CPython you can then do:
Obviously, bare-metal doesn't have a command line, so I'm perfectly happy if the filename is hard-coded. Then I'm perfectly happy to have boot.py and main.py be imported. |
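One way to picture the requested behaviour (run a fixed file, then enter the prompt with its names already defined) is the sketch below. It uses CPython's standard `code` module, which is an assumption made purely for illustration, since a bare-metal port would do this inside its own REPL start-up; `REPL_SOURCE` and `double` are made-up stand-ins for the contents of repl.py.

```python
import code

# Hypothetical contents of repl.py.
REPL_SOURCE = "def double(x):\n    return 2 * x\n"

namespace = {"__name__": "__main__"}
exec(REPL_SOURCE, namespace)  # run the file first...

# ...then hand the same namespace to the interactive console.
console = code.InteractiveConsole(locals=namespace)

# console.interact() would start a prompt here; one line is pushed
# programmatically instead so the sketch runs unattended.
console.push("answer = double(21)")
print(namespace["answer"])  # 42
```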
I don't think we can realistically safeguard against the kind of scenario you describe in your example. Python is a dynamic language with very powerful introspection, and the general approach in its design has always been to give the programmer the full power that is available, assuming they are a consenting adult and know what they are doing, while at the same time avoiding opportunities for silly mistakes. This design philosophy means that there are no enforced boundaries in the code, no "private" methods, etc. and that whoever gets to execute code in your process pretty much has full access to everything in it, if not explicitly, then through introspection mechanisms. For instance, if the Even if you managed to safeguard against all those cases (at the cost of severely modifying the design of the language itself), there are other ways to do this that I can't even think of. Python was not designed with this kind of safety in mind, and it's pointless to try to enforce it. |
I'm not seeing the vulnerability. With the existing system all you need to do is put your stuff in a separate .py file and import it, and then you get the exact same behavior as if that file were imported in the first place. You'll never protect people from doing insecure things. If boot were imported and it contained something like:
then at the REPL, all I have to do is:
I fail to see how importing boot is any more secure than execfile'ing it. |
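The point of this comment can be illustrated with a small standard-Python sketch. It assumes a `boot` module that defines a `password` variable; both the name and the value are made up for the illustration, and the module is registered by hand to mimic an import done at start-up.

```python
import sys
import types

# Simulate boot having been imported at start-up with its own namespace.
boot = types.ModuleType("boot")
exec("password = 'hunter2'", boot.__dict__)
sys.modules["boot"] = boot

# Any later code, in any module or at the prompt, can still reach it:
import boot as later_boot

print(later_boot.password)

del sys.modules["boot"]  # undo the simulated import
```

Importing moves the name behind a `boot.` prefix but does not make it inaccessible.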
The way this works currently on PyBoard, we have two files that get exec-ed at boot at different stages in the REPL's namespace: In the case of the ESP8266 port, there is an additional step required for creating and mounting the filesystem. Since that can't be included in any of the files we already have, because they reside on that filesystem which is not yet mounted, there is a need for an additional stage to handle that. That is pretty much where we stand now, if I am not mistaken. Now, so far we have two propositions:
Is this an accurate summary of where we stand now? |
Since there is no USB setup on the ESP8266, there is no need to separate the |
The reason for having a separate Having this scheme as standard across boards (boot does board-specific set up, main is the actual application) is, I think, a good way of doing things. Adding frozen Exec'ing these files (instead of importing) is useful because you can add to the Internal security of the code is a non-issue because anything can be read using |
It's not evil, it's a great achievement to allow people to bootstrap even such low-level things as a root filesystem from the Python side. "Evils" are things like large stack usage and lack of tail recursion support, which interfere with that.
No, anything can't. Only something which accesses machine.mem32 or uses reflection can, and that can be automatically screened for. For example, a package manager can have a switch "don't install insecure packages", which would scan the source for those features and refuse to install offending packages. Such automated auditing is not possible (or is much more complicated) in the presence of contaminated namespaces. That's why CPython doesn't use execfile'ing anywhere in its initialization sequence (except for one corner case, which is carefully made to apply only to the REPL, and explicitly excluded when running scripts). That alone would be enough to do it like CPython - a lot of people thought hard about how it should be done there. If we want to ignore that and invent our own stuff though, ok, if there's a good reason. Unfortunately, I see only one benefit - saving a few bytes on creating a namespace for boot.py. |
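A rough sketch of what such a source scan could look like, using CPython's standard `ast` module. The `screen` function and the two flag lists are assumptions chosen for illustration; no package manager with such a switch is described in this thread.

```python
import ast

# Assumed, illustrative lists of constructs to flag.
FLAGGED_CALLS = {"eval", "exec", "getattr", "globals", "__import__"}
FLAGGED_ATTRS = {"mem32"}


def screen(source):
    """Return a sorted list of flagged names that appear literally in source."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FLAGGED_CALLS:
                hits.add(node.func.id)
        elif isinstance(node, ast.Attribute) and node.attr in FLAGGED_ATTRS:
            hits.add(node.attr)
    return sorted(hits)


print(screen("import machine\nx = machine.mem32[0x40000000]"))  # ['mem32']
print(screen("print('hello')"))  # []
```

A scan like this only sees names written out literally in the source, which is the limitation the replies that follow focus on.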
Screen for this:
There are literally thousands of ways you can do this, and it's impossible to automatically tell what it does without actually executing it first. This kind of "security" is not possible without completely redesigning the language, perhaps making it more like Java. |
Very simple:
@deshipu , it's now my turn to get pissed off ;-). And tell you: please try to understand the discussion and use it as a chance to learn. For example, when you say "This resembles the way that CPython starts, executing site.py", you have checked that in CPython source code. Or otherwise, sorry, you have no idea what you're talking about, and then talking doesn't make sense. |
Are you suggesting that You say that using In fact, I don't even need to call that Finally, the kind of people who you want to protect, those who just download random code from the Internet and use it in their security-critical applications without any kind of audit, are going to happily ignore any warnings your tools display, if only the instructions wherever they downloaded the code from tell them to ignore them. This is the |
Oh, and let's not forget, that I can always just open the file with the source code, and read the password from that. |
Upon start-up, the _boot module is executed from frozen files to do early initialization, e.g. create and mount the flash filesystem. Then "boot.py" is executed if it exists in the filesystem. Finally, "main.py" is executed if it exists, to allow start-on-boot user applications. This allows a user to make a custom boot file or startup application without recompiling the firmware, while allowing early initialization to be done in Python code. Based on RFC #1955.
Implemented in esp8266 master. |
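The sequence in the commit message above (run each file only if it exists) might be sketched as follows on a desktop Python. The `run_if_exists` helper, the temporary directory standing in for the mounted filesystem, and the choice to share one namespace between the two files are all assumptions made for illustration, not the port's actual code.

```python
import os
import tempfile


def run_if_exists(path, namespace):
    """Execute the file at path in namespace if present; report whether it ran."""
    try:
        os.stat(path)
    except OSError:
        return False
    with open(path) as f:
        exec(f.read(), namespace)
    return True


with tempfile.TemporaryDirectory() as root:
    # Stand-in for the early stage: the "filesystem" exists, with only boot.py on it.
    with open(os.path.join(root, "boot.py"), "w") as f:
        f.write("stage = 'boot'\n")

    ns = {}
    ran = [run_if_exists(os.path.join(root, name), ns)
           for name in ("boot.py", "main.py")]

print(ran, ns["stage"])  # [True, False] boot
```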
This is to finalize design of bootstrapping process for new baremetal ports, exemplified by esp8266 port.
Requirements: