ThunderAgent-org/ThunderAgent


ThunderAgent

Fast, simple and program-aware agentic inference system.

| Wiki | Blog | Paper |


About

ThunderAgent is a fast and easy-to-use library for agentic inference and rollout.

ThunderAgent is fast with:

  • Agentic program-aware scheduler that increases the KV-cache hit rate and reduces memory imbalance across nodes, improving agentic inference throughput by 1.5–3.6× across multiple agentic workflows.
  • Tool-call lifecycle management with automatic resource reclamation for more stable and reliable long-running rollouts.

ThunderAgent is flexible and easy to use with:

  • OpenAI-compatible API passthrough that requires only one change: adding a program_id field to each request.

  • Support for multiple inference backends: vLLM and SGLang.

  • Multiple agentic RL training examples, such as a Search-R1 agent with slime and mini-swe-agent with SkyRL.

  • Real-time visualization of agentic trajectory metrics including total tokens, tool-use time, and per-program profiling.

Overview

ThunderAgent sits between agent clients and the infrastructure layer as an agentic workflow scheduler. On one hand, it improves inference throughput of vLLM/SGLang across multiple GPU nodes through program-aware scheduling. On the other hand, it provides a unified tool management interface for resources like Docker containers and remote APIs.
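To make the scheduling idea concrete, here is a toy sketch of program-aware routing. This is our own illustration, not ThunderAgent's actual algorithm: requests tagged with the same program_id are pinned to one backend so that program's KV cache stays warm, while new programs are placed on the least-loaded backend to limit memory imbalance.

```python
class ProgramAwareRouter:
    """Toy sketch of program-aware scheduling (illustrative only).

    Requests from the same program (agentic trajectory) reuse a long shared
    prefix, so pinning a program to one backend keeps its KV cache hot.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self.load = {b: 0 for b in self.backends}  # in-flight requests per backend
        self.pin = {}                              # program_id -> backend

    def route(self, program_id: str) -> str:
        backend = self.pin.get(program_id)
        if backend is None:
            # First request of this program: place it on the least-loaded
            # backend and remember the assignment.
            backend = min(self.backends, key=lambda b: self.load[b])
            self.pin[program_id] = backend
        self.load[backend] += 1
        return backend

    def finish(self, program_id: str):
        # Called when a request completes, freeing backend capacity.
        self.load[self.pin[program_id]] -= 1


router = ProgramAwareRouter(["http://node0:8000", "http://node1:8000"])
print(router.route("swe-agent-42"))  # pins the program to one node
print(router.route("swe-agent-42"))  # same node again -> KV-cache hit
```

A real scheduler must also handle eviction, backend failures, and memory pressure; this sketch only shows the affinity-plus-load-balancing core.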

ThunderAgent Architecture

Inference & Evaluation Results

ThunderAgent improves vLLM throughput by 1.5–3.6× across diverse agentic workloads including SWE-Agent, OpenHands, and ToolOrchestra.

Inference Pipeline Results

Demo

TA_demo_final.mp4

Getting Started

Install ThunderAgent from source:

git clone git@github.com:HaoKang-Timmy/ThunderAgent.git
cd ThunderAgent
pip install -e .

How to use it? Pick an inference backend, for example vLLM:

uv pip install vllm --torch-backend=auto # install vllm

vllm serve Qwen/Qwen3-32B --port 8000 # serve a model

thunderagent --backend-type vllm --backends http://localhost:8000 --port 9000 --metrics --profile # launch ThunderAgent; make sure to send requests through port 9000
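Once ThunderAgent is running on port 9000, any OpenAI-compatible client can target it by changing the base URL. A minimal stdlib-only sketch, assuming the standard OpenAI chat-completions request shape (the helper name is ours):

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, messages, program_id: str):
    """Build an OpenAI-style chat-completions request that routes through
    ThunderAgent by tagging it with a program_id."""
    body = {
        "model": model,
        "messages": messages,
        # The one ThunderAgent-specific field: ties this request to a program
        # so the scheduler can keep its KV cache on one backend.
        "program_id": program_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:9000",
    "Qwen/Qwen3-32B",
    [{"role": "user", "content": "hello"}],
    "traj-0001",
)
print(req.full_url)
# To actually send it (requires ThunderAgent running on port 9000):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```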

How to integrate ThunderAgent with your own agentic workflow?

# original OpenAI sender
client.chat.completions.create(
    model=self.config.model_name,
    messages=messages,
)

# ThunderAgent OpenAI sender
extra_body = {}
extra_body["program_id"] = "unique_id"
# if your agentic workflow uses Docker containers
# extra_body["docker_ids"] = ["docker_id1", "docker_id2", ...]
client.chat.completions.create(
    model=self.config.model_name,
    messages=messages,
    extra_body=extra_body,
)
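For codebases with many call sites, the change above can be centralized in a small helper that builds the extra_body once per trajectory. The helper name is our own; program_id and the optional docker_ids are the fields shown in the snippet above.

```python
def thunderagent_extra_body(program_id: str, docker_ids=None) -> dict:
    """Build the extra_body dict ThunderAgent expects: a unique program_id
    per agentic trajectory, plus optional Docker container ids so the
    gateway can reclaim them when the program finishes."""
    extra_body = {"program_id": program_id}
    if docker_ids:
        extra_body["docker_ids"] = list(docker_ids)
    return extra_body


# Usage with any OpenAI-compatible client:
# client.chat.completions.create(
#     model=model_name,
#     messages=messages,
#     extra_body=thunderagent_extra_body("traj-0007", docker_ids=["c1"]),
# )
```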

Contributing

We welcome and value all contributions and collaborations. Please open a pull request.

Citation

If you use ThunderAgent for your research, please cite our paper:

@misc{kang2026thunderagentsimplefastprogramaware,
      title={ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System}, 
      author={Hao Kang and Ziyang Li and Xinyu Yang and Weili Xu and Yinfang Chen and Junxiong Wang and Beidi Chen and Tushar Krishna and Chenfeng Xu and Simran Arora},
      year={2026},
      eprint={2602.13692},
      archivePrefix={arXiv},
      primaryClass={cs.OS},
      url={https://arxiv.org/abs/2602.13692}, 
}

Contact Us

For enterprises interested in adopting or deploying ThunderAgent at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at hkang342@gatech.edu.

License

This repository is available under the MIT license. See the LICENSE.md file for details.
