8000 To evaluate or not to evaluate ... that's the question! · Issue #61 · pyscript/polyscript · GitHub
[go: up one dir, main page]

Skip to content

To evaluate or not to evaluate ... that's the question! #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WebReflection opened this issue Oct 16, 2023 · 6 comments
Closed

To evaluate or not to evaluate ... that's the question! #61

WebReflection opened this issue Oct 16, 2023 · 6 comments

Comments

@WebReflection
Copy link
Contributor
WebReflection commented Oct 16, 2023

Background

This module is based on coincident orchestration and a great achievement of that module, despite requiring inevitably special headers to allow SharedArrayBuffer, is that no code evaluation ever happens: it's all about proxies, Atomics or standard Web features, without "eval" like security concerns ever.

At the same time, because of those required special headers and to make this module as portable as possible when it comes to workers support, without needing to rely on foreign headers at all when that's desired, this module pre-bundles some file before publishing, also providing a nonce that 3rd party bundlers can use to eventually allow the xworker Blob code whenever CSP rules are both present and rightly strict / paranoid in terms of which code is trusted and which one isn't.

Such nonce is indeed embedded in the package.json as sha-256 value and, for whoever is wondering what is that about indeed, that's the checksum for the ./worker/xworker.js file that the build artifact creates before this module reaches npm (or just to even pass integration tests).

Current State

I had strong discussions against evaluating any code in the worker space because of CSP constraints and possible attached shenanigans, but while I am still fully convinced not evaluating anything in the worker space is actually desirable, automatically defeating tons of different classes of potential attacks to the code, this choice, and my position, is very likely limiting the DX related to hooks and/or the ability to actually pass callbacks to the worker as JS things to do and execute at the right time, like it is already for the main thread story, helped by the fact in there there's no need to survive postMessage related dances, so it's easier to add callbacks to onBeforeRun and friends without ever worry about CSP contraints.

However, while working, thinking, refactoring, and working on other things around this PR, I've wondered myself why shouldn't developers be able to define for either main or workers the very same list of hooks, also providing a way in the worker to deal with the current interpreter before (or after) any code runs at all, enabling tons of features that otherwise need to branch between main and workers all the time ... heck, I even think that PR is wrong in somehow branching main VS thread, as in an ideal world we want to just define these hooks, and stop caring about which one is running where, as long as there is any mean/way to disambiguate the hook is running in a main thread, or a worker one:

  • onReady
  • onBeforeRun
  • onBeforeRunAsync
  • onAfterRun
  • onAfterRunAsync
  • codeBeforeRun
  • codeBeforeRunAsync
  • codeAfterRun
  • codeAfterRunAsync

If we had these normalized no matter where, simply adding some detail to their callbacks when possible (I still think code is about python or other interpreters code, and these already provide a way to disambiguate, as it s for pyscript module, but other hooks can also provide as global IS_WORKER flag if needed), the only hook that'd be left over to date is the onWorkerReady one, because this is a hook that makes sense only on the main thread, as that provides the xworker referece to eventually attach utilities before the worker even runs via sync or anything else we want or need in the future.

In short, I wonder if just having all hooks normalized for all worlds makes sense, but then again, callbacks posted as strings to workers won't be able to carry their own outer scope like it is for main, so that it is still somehow desirable to disambiguate hooks by main one and thread one, where callbacks without outer scope dependencies can work regardless for both cases, but it's still possible to fine-tune those callbacks that wouldn't.

The code story would be the same for both main and thread, so that even in main it would be possible to add code related hooks, instead of just JS one, but basically anyone would be free to actually hook their own code or plugin at any time around the lifecycle of a polyscript script enabled script or custom type ... after all, for workers that would be just enabling extra trust to an already inevitably evaluated part of the project.

What do you you think?

/cc @antocuni @fpliger @ntoll @JeffersGlass @bugzpodder

@WebReflection
Copy link
Contributor Author
WebReflection commented Oct 16, 2023

I've read 3 times what I wrote and I know some part might feel confusing ... I'll try to hopefully explain some part:

No outer scope in Workers

import { hooks } from '@pyscript/core';

let i = 0;
// this hooks works only in *main*
hooks.onBeforeRun(() => { console.log(i++) });

// in workers, this can be serialized only as fn.toString()
// producing just "() => { console.log(i++) }" as string.
// that `i++` trusts an outer scope entry or a global one
// failing miserably, so that simple hook is not
// cross main/worker usable/reliable

Disambiguation in JS code

To solve previous caveat, we can pass a worker boolean (or any other name) as wrapper utility, asking users to branch out their intents:

import { hooks } from '@pyscript/core';

let i = 0;
hooks.onBeforeRun(({ worker }) => {
  if (worker) {
    // possible workaround
    if (!('i' in globalThis)) globalThis.i = 0;
    // do we really go down this branching path
    // in user provided hooks, or even our own hooks?
  }
  console.log(i++);
});

The issue here is that the branching burden would be all over the place, for both main or worker only meant code (see py-terminal plugin as example) ... and I personally don't fully like it, but in a world where some project prefer worker or main first, I think this might be a path forward, as it incentives a default choice yet allowing exception to be handled ... I'm a bit torn about this approach though ...

Disambiguation in Python code

The pyscript module, as example, provides a RUNNING_IN_WORKER export boolean value so that it's easy to do different things in there when such value is True or False, and the current py-terminal plugin uses that detail to behave differently ... I felt like that was OK and it solved also the stdlib bundling branching behavior, where we can with ease provide either different exports, with same names, or different behavior, erroring when desired in either main or worker scenarios ... I am also not sure about this, but it worked well to me to date.

That's it, these are the things that make me wonder if my idea makes any sense, if you understand its constraints and/or caveats, and if it would be a better way forward in general, not just for PyScript, but for the sake of anyone using even just polyscript to create amizing things out there!

Thank you for reasoning with me, happy to answer any outstanding question or concern, or happy to listen to even better ideas than this one!

@fpliger
Copy link
Contributor
fpliger commented Oct 16, 2023

@WebReflection I glanced over the comments and it deserves more time/attention. Just wanted to acknowledge that.

I'll try to do that tomorrow before schedule gets too hectic

@ntoll
Copy link
Member
ntoll commented Oct 24, 2023

Hi, just catching up with all the discussions after last week's travels... ✈️ 🌍 👍

Here's the key quote from your initial comment @WebReflection:

in an ideal world we want to just define these hooks, and stop caring about which one is running where.

This certainly reduces the cognitive burden on developers, in the same way they always have a windows or document reference, that works in the same way, no matter if they're in the main thread or on a worker.

But I also hear you when say that callbacks run on workers won't have their outer context available to them, because they're serialized to strings via the postMessage to the worker. This is clearly a difference between running on the main thread or on a worker.

So we can only go so far to help folks stop caring about where their code is run.

I believe, for most people, most of the time, and in most situations, this won't be a problem. However, for a significant minority of (likely highly technical) users who are expecting an outer context, we should very clearly flag this difference. That such behaviour is imposed upon us by the browser, for very good security reasons, means we can say something along the lines of, "that's how browsers work, and all we can do is just flag the side effects of this on your code". Given what I hope will be clear documentation and guidance on our part, I hope such technical users will be able to adjust their code to "fit".

@WebReflection
Copy link
Contributor Author

so, if we agree on normalizing and branching instead of going full (repeated) separation such as main and worker namespaces, I need to "rewrite" and change a lot of things, but I think whatever decision we make will be the right one, as long as we don't change mind and we are OK in allowing code evaluation in the worker by default ... if anything, this caveat is the only one that keeps me with my feet on the ground believing that maybe @antocuni sugggestion was the best one to reason about, otherwise we also risk to evaluate a lot of code maybe meant to work on main only or on worker only ...

As summary, I do want to allow evaluation on workers too, but maybe we're better off making it explicit as proposed in this MR which can be updated as follow up after this discussion: #58

I do like that MR already, it's surely verbose but it helps separating all concerns and logic around the two very different worlds.

@WebReflection
Copy link
Contributor Author

FYI I've updated both description and code, after rebase, for the current proposed MR: #58 (comment)

The more I think about it, the more it feels like the right way to move forward, as it's never ambiguous, it can avoid repeated entries by spreading objects, whenever that's needed, it clearly define a path for callbacks that work on main and those that don't + it allows different Python code too to run in main (in the future, see future proof section of the description).

@WebReflection
Copy link
Contributor Author

We've moved the idea forward with latest polyscript so we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0