Add multi-process support #66
Conversation
# This needs to be chosen before the first metric is constructed,
# and as that may be in some arbitrary library the user/admin has
# no control over we use an environment variable.
if 'prometheus_multiproc_dir' in os.environ:
Rather than hard coding the choice of _ValueClass here, could the class be defined by users and loaded from anywhere on the path? #70 could then be implemented in a separate package entirely and be developed/updated on its own schedule. The shelf implementation can certainly ship with client_python, but users of other multi-process Python servers could be free to experiment without needing to land code here.
Something like the following, shamelessly lifted from django/utils/module_loading.py with the error handling dangerously stripped out:
import os
from importlib import import_module

module_path, class_name = os.environ.get(
    'PROMETHEUS_VALUE_CLASS', 'prometheus_client.core._MutexValue').rsplit('.', 1)
module = import_module(module_path)
_ValueClass = getattr(module, class_name)
Then if #70 was in its own package prometheus_uwsgi, you might start the server with PROMETHEUS_VALUE_CLASS=prometheus_uwsgi.UwsgiCacheValue, or with PROMETHEUS_VALUE_CLASS=prometheus_client.core.MultiProcessValue and prometheus_multiproc_dir=/path/to/multiproc_dir to use the shelf implementation described in this PR (dropping the leading _ since it would become part of the public API).
Those _ValueClass implementations would be responsible for configuring themselves from the environment or wherever, which feels un-Pythonic. Explicitly configuring the system at process startup seems like the best path, with developers passing in whatever class they want, but I haven't dug into the rest of the implementation enough to suggest what that API should look like.
This seems like a reasonable idea.
Somehow configuring the system explicitly at process startup seems like the best path
That won't work: by the time your code is running, all of this code has likely already been imported and many metrics already exist. That's why I'm using environment variables; they're set at exec() time, before any Python code has a chance to run.
That won't work, as by the time your code is running all this code has likely already been imported and many metrics already exist.
That's true assuming you want to set the _ValueClass once and only once and never have it specified in code. You could circumvent this by using a client pattern or by parameterizing metric instantiation with a _ValueClass as well. I think the nicest interface would use the environment variable by default, but also allow:
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request', value_class=prometheus_client.ShelveValue)
or something of the sort.
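As a rough illustration of that suggested interface, here is a minimal sketch; the value_class keyword and the MutexValue / ShelveValue names are inventions of this discussion, not part of the existing client_python API:

# Hypothetical sketch only: the value_class keyword and the MutexValue /
# ShelveValue names come from this discussion, not the current client_python API.
import threading

class MutexValue(object):
    """Default single-process backend: a float guarded by a mutex."""
    def __init__(self):
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount):
        with self._lock:
            self._value += amount

    def get(self):
        with self._lock:
            return self._value

class Summary(object):
    """Simplified metric that lets the caller pick the value backend."""
    def __init__(self, name, documentation, value_class=MutexValue):
        self._name = name
        self._documentation = documentation
        self._count = value_class()
        self._sum = value_class()

    def observe(self, amount):
        self._count.inc(1)
        self._sum.inc(amount)

# Default, in-process behaviour:
REQUEST_TIME = Summary('request_processing_seconds',
                       'Time spent processing request')
# Or, with a hypothetical multi-process backend:
# REQUEST_TIME = Summary('request_processing_seconds',
#                        'Time spent processing request',
#                        value_class=ShelveValue)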
What is the status of this? Is it going to be pulled? I need the functionality and will have to download the branch if not. Thanks!
The core functionality is fine, but the performance isn't good enough to be used in production. I haven't had the time to find a better backing store for that.
Force-pushed from 31828e2 to 2ffb7e9
This works by having a shelve per process that is continuously updated with metrics, and a collector that reads from them. Histogram buckets need accumulation and special handling for _count, and for gauges we need to offer a number of options.
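A rough sketch of that shape, purely for illustration (the file naming and merge logic are assumptions, not the code in this branch; histogram bucket accumulation and the gauge options are left out):

# Illustrative sketch only: one shelve per process, summed at scrape time.
# File naming and merge logic are assumptions, not the code in this branch.
import glob
import os
import shelve

def _db_base(directory):
    # One shelve per process, named by pid.
    return os.path.join(directory, 'counters_{0}'.format(os.getpid()))

def record(directory, metric_name, amount):
    """Called in the instrumented process on every event."""
    db = shelve.open(_db_base(directory))
    try:
        db[metric_name] = db.get(metric_name, 0.0) + amount
    finally:
        db.close()

def collect(directory):
    """Called by the exporting process at scrape time: sum across all
    per-process shelves."""
    totals = {}
    # Some dbm backends add their own file extensions, so dedupe on the base path.
    bases = set(os.path.splitext(p)[0]
                for p in glob.glob(os.path.join(directory, 'counters_*')))
    for base in bases:
        db = shelve.open(base, flag='r')
        try:
            for name, value in db.items():
                totals[name] = totals.get(name, 0.0) + value
        finally:
            db.close()
    return totals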
Force-pushed from 2ffb7e9 to 0dbf528
This uses an mmapped file to store the data. Fsync is not called, and concurrent readers are permitted (which some shelve backends did not allow). Microbenchmarks indicate 1.2-1.5µs per instrumentation event, compared to 3-13ms for the shelve solution, so roughly three orders of magnitude better. The 1.2-1.5µs is about double the cost of normal instrumentation, due to the second mutex required; Python doesn't offer an RWLock, which would let this be reduced. Overall this performance is acceptable: anyone doing something extremely performance-critical with instrumentation wouldn't be using Python anyway.
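Not the code in this change, but a minimal sketch of the general technique it describes: a double stored in an mmapped file and updated under an in-process lock, so a separate collector process can read it without the writer ever calling fsync. The single-double-at-offset-0 layout here is an assumption for illustration.

# Minimal sketch of an mmap-backed value; the single-double layout is an
# assumption for illustration, not this branch's actual file format.
import mmap
import os
import struct
import threading

class MmapValue(object):
    """A float stored in an mmapped file so other processes can read it."""
    def __init__(self, path):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(fd, 8)              # room for one double
        self._mm = mmap.mmap(fd, 8)
        os.close(fd)                     # the mapping stays valid
        self._lock = threading.Lock()    # guards read-modify-write in this process

    def inc(self, amount):
        with self._lock:
            value = struct.unpack_from('d', self._mm, 0)[0]
            struct.pack_into('d', self._mm, 0, value + amount)

    def get(self):
        with self._lock:
            return struct.unpack_from('d', self._mm, 0)[0]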
Force-pushed from 0dbf528 to 9e1f993
FWIW, I started to work on performance improvement of your branch last week and simply went with JSON instead of shelve/bsddb, which also gave an improvement of a few orders of magnitude. (I don't know the exact number right now, but it was somewhere around a 2x slowdown compared to non-multiprocessing.) Maybe that's an alternative that involves less custom code.
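For what it's worth, a tiny sketch of what that JSON-per-process write path could look like (the file naming and layout are assumptions, not code from either branch):

# Rough sketch of a JSON-per-process alternative; file naming and layout
# are assumptions for illustration.
import json
import os

def write_metrics(directory, values):
    """Rewrite this process's metrics file; values maps metric name -> float."""
    path = os.path.join(directory, 'metrics_{0}.json'.format(os.getpid()))
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(values, f)
    os.rename(tmp, path)   # atomic swap so the collector never sees a partial file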
That's about the same slowdown as this code, which is presumably dominated by the two mutexes in use (why does taking a mutex cost most of a microsecond?). Hopefully we'll get that faster some time, but I doubt you managed it in much less than the 80 lines of code it took me.
Sorry, I did not mean to say that your code was bad. Actually, the binary format is very simple and the code is easy to understand. My point is that if it does not make a difference in terms of performance, maybe it's more future-proof and pragmatic to simply go with JSON, if only so that future readers of the code don't have to learn a new serialisation format. It's not a big deal at this point, but it might become one if features are added to the serialisation format; that's all I'm saying.
Let's see where it goes. It's all self-contained, and we don't need to worry about backwards compatibility. The tricky stuff is all in the filenames anyway.
This works by having a shelve per process that is continuously updated
with metrics, and a collector that reads from them.
Histogram buckets need accumulation and special handling for _count,
and for gauges we need to offer a number of options.
Fixes #30