Handle multiprocess setups using preloading and equivalents #127
That sounds like data corruption. When you restart the app and blow away the data does this reoccur? To be on the safe side, are you running on a big or little endian system? |
little endian |
If you have further debugging information, please reopen. |
Hello, we are having the same problem. Possibly useful bits of our code:

from prometheus_client import Summary

STATUS_TIME = Summary('<name>', 'Time (in seconds) spent in <STATUS>', ['status'])

# ...

def _record_time_in_current_status(self):
    # ...
    STATUS_TIME.labels(status=status).observe(duration)  # <-- this often works but sometimes fails.

Our gunicorn config is:

from prometheus_client import multiprocess

# *** Global configuration *** #
bind = '0.0.0.0:5000'
workers = 2

# *** Server hooks *** #
def child_exit(server, worker):
    # Needed for metrics to be consistent when multiple workers exist.
    # -> https://github.com/prometheus/client_python#multiprocess-mode-gunicorn
    multiprocess.mark_process_dead(worker.pid)

Is there any other information we can provide? |
Is it the exact same number of bytes it's reporting? |
No, for us the expected buffer size is different. I forgot to mention that we started seeing these exceptions when we began testing our app with many concurrent users. |
Can you figure out which file is the issue, and attach it here? |
We lost the file as the service was restarted, but I'll try to get it if/when this happens again. |
Thanks for the very quick response 😄 |
Just to verify, the worker processes are fork&execed? Pre-fork approaches don't work with the current multi-proc. |
Hello, I finally had the time to check. It does look like the gunicorn master just forks the workers without exec'ing them. Does that mean gunicorn is not actually a supported use case, or are we using it wrong? |
I was looking into why multiprocessing requires fork & exec and not just fork. By adding log statements to _MultiProcessValue in the client:

import logging
log = logging.getLogger('test')

def _MultiProcessValue(__pid=os.getpid()):
    ...
    log.error("~~~" "PID: %s", pid)

    class _MmapedValue(object):
        ...
        def __init__(...):
            ...
            self._lock = Lock()
            log.error("~~~" "PId2: %s => %s", pid, os.getpid())
            ...

In order to configure gunicorn to clean up live gauges, the gunicorn configuration module imports prometheus_client. At this stage the __pid default argument is evaluated, so the captured PID is the gunicorn master's and every forked worker inherits it.

By removing the import in the config module (I removed cleanup for testing, a better way is needed) we can see that each worker picks up its own PID instead.

Is it possible to re-work this so the PID is determined after the fork rather than at import time? |
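To make the capture described above concrete, here is a minimal standalone sketch (hypothetical code, not from prometheus_client): a default argument is evaluated once, when the function is defined at import time, so a PID captured that way stays frozen across any later forks.

import os

def make_value(pid=os.getpid()):  # default evaluated once, at definition/import time
    return pid

child = os.fork()
if child == 0:
    # In the forked child os.getpid() has changed, but the default still
    # holds the parent's PID because it was captured before the fork.
    print('child pid:', os.getpid(), 'captured pid:', make_value())
    os._exit(0)
else:
    os.waitpid(child, 0)
    print('parent pid:', os.getpid(), 'captured pid:', make_value())

Both branches print the same captured PID, which is exactly what happens to __pid when the master imports the client before forking workers.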
We have the same problem in production and I agree with @stefano-pogliani's analysis. |
It should work, something else must be enabling pre-loading on you. We don't support multiprocessing currently. We need each child to start with a fresh interpreter.
I'm not sure that works for all use cases. |
I checked the code in gunicorn from various versions and none of them seem to start workers with fresh interpreters; they all just fork. I even tried raising an exception at import time of our app: the workers attempt to import the app and fail while the master proceeds fine, which suggests the master does not import the app module. Any idea of what else I could check, or of a known working gunicorn + Prometheus app I can compare ours with? |
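One low-effort way to run that check (a hypothetical snippet, not from this thread) is to log the PID when the app module is imported and again when a request is handled; if the import-time PID equals the gunicorn master's PID from its startup log, the app was preloaded in the master.

import logging
import os

log = logging.getLogger(__name__)

# Runs at import time; under preloading this executes in the gunicorn master.
log.warning('app module imported in pid %s', os.getpid())

def application(environ, start_response):
    # Runs in whichever worker handles the request.
    log.warning('request handled in pid %s', os.getpid())
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']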
To be more precise, no prometheus_client code should be imported until after the fork. What I've had in mind is to check the pid on every metric change, and reset things if it changes. This would be a 10% performance hit. |
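A rough sketch of that idea (hypothetical names, not the implementation that eventually landed): every write re-checks os.getpid() and resets the per-process state if the process has forked since the last write.

import os

class PidAwareValue(object):
    # Toy value wrapper that notices a fork and starts fresh state.

    def __init__(self):
        self._pid = os.getpid()
        self._value = 0.0

    def _check_pid(self):
        if self._pid != os.getpid():
            # Running in a forked child: drop inherited state and
            # re-initialise for this process.
            self._pid = os.getpid()
            self._value = 0.0

    def inc(self, amount=1.0):
        self._check_pid()
        self._value += amount

The extra getpid() comparison on every observation is where the roughly 10% hit mentioned above would come from.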
So just to get things straight: using prometheus_client in a multiprocessing environment, as used in uwsgi or gunicorn, is not recommended at the moment - right? And to make things clear to myself: a solution is not that easy because most solutions would result in performance issues when tracking lots of metrics? I'm asking the latter because I'm thinking about developing a custom client using an approach like the one in the PHP library for Prometheus, where Redis is used as an in-memory datastore for metrics. |
That works fine as long as you're not preloading.
There'll be a performance hit, but it's tolerable. It's more that I haven't gotten around to it yet.
Anything involving the network is going to be orders of magnitude slower. |
Thank you for that very swift response!
Okay! That makes it clearer to me. As I'm still struggling with the vast number of uwsgi config options, I'll give gunicorn (without preloading) a try and see if I can get better results. Thanks again! |
FWIW I've run into this problem, and after some debugging discovered that it only presents itself when you add the preload setting to gunicorn. As a workaround we found that importing the client library code inside the child_exit hook avoids the issue:

def child_exit(server, worker):
    from prometheus_client import multiprocess
    multiprocess.mark_process_dead(worker.pid) |
Hmm, wouldn't that still have the problem for any child processes started after the first one exits? |
It's strange but it doesn't appear to be a problem. I tested this by killing off workers and what I saw is that the child processes started to replace them all got their own metric data files correctly associated with their PIDs. I have to say I don't fully understand why this is the case. |
Can ye try #169? |
We've got one confirmation that #169 works as expected. Has anyone else tested it to confirm it fixes their use case? |
I tested the code in the PR and it seems to be working properly now. Thanks 👍 |
That's merged now, docs still need an update. |
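Until the docs are updated, the gunicorn setup described in the README section linked earlier looks roughly like this (a sketch, assuming the MultiProcessCollector API and the prometheus_multiproc_dir environment variable):

# Exposition endpoint that aggregates the per-process files written by the workers.
# prometheus_multiproc_dir must point at a writable directory shared by all workers.
from prometheus_client import CollectorRegistry, generate_latest, CONTENT_TYPE_LATEST
from prometheus_client import multiprocess

def metrics_app(environ, start_response):
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    start_response('200 OK', [
        ('Content-Type', CONTENT_TYPE_LATEST),
        ('Content-Length', str(len(data))),
    ])
    return [data]

Combined with the child_exit hook shown above, this matches the pattern the README describes for multiprocess gunicorn deployments.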
Sometimes the application raises an error.
At line https://github.com/prometheus/client_python/blob/master/prometheus_client/core.py#L361
We run the application via uwsgi: