### Please read this first

- Have you read the docs? Yes – Agents SDK docs
- Have you searched for related issues? Yes – nothing covering the `cached_tokens=None` → Pydantic error with LiteLLM + Gemini 2.5.
### Describe the bug

When `Runner.run()` executes an agent whose model is a Gemini 2.5-pro instance wrapped by LiteLLM, the pipeline dies during cost calculation with:

```
Error in Explainer (revision): 1 validation error for InputTokensDetails
cached_tokens
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/int_type
```

The log shows LiteLLM repeatedly inserting `cached_tokens=None` into the request metadata; Pydantic 2.11's `InputTokensDetails` model rejects `None` because the field is typed as `int`. The fallback code then silently downgrades the model to `o3-2025-04-16`, masking the issue in production.
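For reference, the rejection is reproducible without the SDK at all. The model below is an illustrative stand-in with the same field typing as the SDK's `InputTokensDetails`, not the SDK class itself:

```python
# Stand-in sketch (not the SDK's actual class): a Pydantic v2 model
# with an `int` field rejects None with the same int_type error.
from pydantic import BaseModel, ValidationError

class InputTokensDetailsStandIn(BaseModel):
    cached_tokens: int  # typed as int, so None is rejected

InputTokensDetailsStandIn(cached_tokens=0)  # fine

try:
    InputTokensDetailsStandIn(cached_tokens=None)  # what LiteLLM effectively sends
except ValidationError as e:
    print(e)  # Input should be a valid integer [type=int_type, input_value=None, ...]
```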
### Debug information

| Item | Value |
|---|---|
| Agents SDK | v0.0.16 |
| LiteLLM | v1.71.1 |
| Python | 3.12.7 |
| Pydantic | 2.11.3 |
| OS | macOS 14.4 (Apple Silicon) |
| Model string | `gemini/gemini-2.5-pro-preview-05-06` |
### Repro steps

```python
"""
Minimal repro for cached_tokens=None crash with LiteLLM + Gemini 2.5.
Save as repro.py and run `python repro.py` (GOOGLE_API_KEY or GEMINI_API_KEY must be set).
"""
import asyncio
import os

import litellm
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

# Suppress NULL fields – does *not* avoid the bug
litellm.drop_params = True

gemini = LitellmModel(
    model="gemini/gemini-2.5-pro-preview-05-06",
    api_key=os.getenv("GOOGLE_API_KEY") or os.getenv("GEMINI_API_KEY"),
)

echo_agent = Agent(
    name="Echo",
    instructions="Return the user's message verbatim in JSON: {\"echo\": \"...\"}",
    model=gemini,
)

async def main():
    # Any prompt triggers the validation failure
    result = await Runner.run(echo_agent, [{"role": "user", "content": "ping"}])
    print(result.final_output)

asyncio.run(main())
```
### Observed output

```
Error in Explainer (revision): 1 validation error for InputTokensDetails
cached_tokens
  Input should be a valid integer ...
```

Commenting out the Gemini/LiteLLM model and falling back to any OpenAI model (`o3-2025-04-16`, `gpt-4o`) makes the script succeed, confirming the issue is isolated to the Gemini + LiteLLM path.
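For completeness, the working counter-repro is the same agent with only the model swapped (a sketch; assumes `OPENAI_API_KEY` is set):

```python
# Same Echo agent, but pointed at an OpenAI model instead of the
# Gemini-via-LiteLLM path: this variant completes without the error.
from agents import Agent

echo_agent = Agent(
    name="Echo",
    instructions="Return the user's message verbatim in JSON: {\"echo\": \"...\"}",
    model="gpt-4o",  # any OpenAI model string from the report works
)
```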
### Expected behavior

- LiteLLM should pass a valid integer (e.g., `0`) for `cached_tokens` instead of `None`, or
- the Agents SDK should coerce `None` → `0` before instantiating `InputTokensDetails` (see the sketch below).

Either fix would allow Gemini 2.5 to run without crashing and would eliminate silent model downgrades.
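A minimal sketch of the second option, assuming the usage details arrive as a plain dict; the stand-in model and `coerce_cached_tokens` helper are hypothetical, not SDK code:

```python
from pydantic import BaseModel

class InputTokensDetailsStandIn(BaseModel):  # illustrative stand-in
    cached_tokens: int

def coerce_cached_tokens(raw: dict) -> InputTokensDetailsStandIn:
    # Map a missing or None cached_tokens to 0 before Pydantic validation.
    value = raw.get("cached_tokens")
    return InputTokensDetailsStandIn(cached_tokens=0 if value is None else value)

print(coerce_cached_tokens({"cached_tokens": None}))  # cached_tokens=0
```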