8000 Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython · Issue #1770 · python-trio/trio · GitHub
[go: up one dir, main page]

Skip to content
Trio run loop generates cyclic garbage, which can cause more frequent GC stalls on CPython #1770
@belm0

Description

@belm0

symptom

at a regular period (e.g. 10s or 20s), there are significant stalls (e.g. 30 ms) where it appears the main thread is pre-empted

In terms of trio.abc.Instrument, the lost time appears between before_task_step and after_task_step, though tasks are doing nothing with threads or time.sleep(). In a single-threaded app with many heterogeneous tasks, the stall events appear to be distributed randomly among the tasks.

I've come up with a minimal program to reproduce the effect. Note that it is CPU-intensive (a loop of very small trio.sleep()), as CPU use > 70% seems necessary to induce the problem.

import numba
import trio

async def _run():
    T = .000015
    t_start = trio.current_time()

    while True:
        t0 = trio.current_time()
        await trio.sleep(T)
        delta = trio.current_time() - t0
        if delta > .005:
            elapsed = trio.current_time() - t_start
            print(f'{(elapsed) // 60:02.0f}:{elapsed % 60:02.0f}  {delta * 1000:5.1f}')

trio.run(_run)

Note that the repro imports numba, and that is the spooky part. With the program as written, observed output (timestamp, stall in ms):

$  python repro.py
00:00   29.1
00:11   35.5
00:21   33.4
00:32   33.8

However, removing the numba import, the stall durations are significantly reduced (6 ms) and at a different period (3 ms):

$  ./.venv/bin/python repro.py
00:00    5.2
00:03    6.5
00:05    6.7
00:08    6.4
00:10    6.1

And here is importing numpy instead of numba (10 ms stalls, 4ms period):

$  ./.venv/bin/python repro.py
00:00    8.9
00:04   11.8
00:08   10.5
00:11   10.1
00:15   11.2

conditions

seen in Trio 0.17.0

it's observed by multiple people running the repro, across multiple architectures (at least Linux/amd64 and and OS X)

So far it's specific to Trio. I've tried equivalent loops using asyncio and time.sleep(), which do not exhibit the problem.

original bug filed to numba: numba/numba#6347

theories

  • T1. kernel scheduling artifact? - is this just an expected effect as a process approaches 100% CPU, as the OS/kernel need cycles to do internal work? But why doesn't the problem show up with asyncio or time.sleep() implementations? Why does the stall period depend on unused package imports?
  • T2. weird GIL interaction among main thread and one of Trio's background threads? - The GIL is essentially a scheduler, I've read.
  • ...?

next steps

  • try earlier versions of Trio - result: no change (tested back to 0.11.0)
  • ...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0