Description
The Setup
Hi, I'm encountering an issue when using LiteLLM with OpenAI models (e.g., `openai/gpt-4o-mini`) in a multi-step agent pipeline, specifically when a tool function is invoked multiple times in parallel. The example below is deliberately simple, only to demonstrate the problem.

It is a `SequentialAgent` composed of three sub-agents:
- A first agent that extracts keywords from a user prompt (e.g., short phrases).
- A second agent that uses a `FunctionTool` to look up additional information for each keyword (one tool call per keyword).
- A third agent that summarizes the results.
The issue occurs when the second agent uses a tool that is invoked once per keyword, meaning several tool calls are triggered in parallel.
The Problem
When using:

```python
model=LiteLlm(model="openai/gpt-4o-mini")
```

…more than one tool call is executed, but only the first result is returned to the `LlmAgent`. The resulting error is:
```
An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxxx, call_yyy, call_zzz
```
Note that the first call_id does not appear in this message: that call was handled correctly, while the remaining ones never received a response.
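For context on what the error is complaining about: OpenAI's chat API requires that an assistant message containing `tool_calls` be followed by exactly one `role: "tool"` message per `tool_call_id` before the next model request. A minimal sketch of a valid history (the ids and payloads here are made up for illustration, not taken from my run):

```python
# Sketch of the message sequence OpenAI's chat API expects after parallel
# tool calls. The ids ("call_1", "call_2") are hypothetical.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "keyword_info", "arguments": '{"keyword": "cats"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "keyword_info", "arguments": '{"keyword": "space exploration"}'}},
    ],
}

# Every id above must get exactly one matching tool message; otherwise the
# next request fails with the error quoted above.
tool_turns = [
    {"role": "tool", "tool_call_id": tc["id"], "content": '{"ok": true}'}
    for tc in assistant_turn["tool_calls"]
]

requested = {tc["id"] for tc in assistant_turn["tool_calls"]}
answered = {t["tool_call_id"] for t in tool_turns}
missing = requested - answered
print(sorted(missing))  # an empty list means the history is valid
```

In my case the history apparently contains a tool message only for the first id, so the subsequent request is rejected.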
However, when switching to:

```python
model="gemini-2.0-flash"
```

…the same code works perfectly: all tool calls are received and processed correctly.
Minimal Reproducible Example

```python
from google.adk.agents import LlmAgent, SequentialAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types

# Tool that echoes the keyword back with some metadata
def keyword_info(keyword: str) -> dict:
    return {"keyword": keyword, "length": len(keyword)}

tool = FunctionTool(func=keyword_info)

# Agent 1: extracts simple keywords (simulated)
extract_agent = LlmAgent(
    name="ExtractAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"),  # <- works fine here
    instruction="Extract keywords from the user input. Output only the list of keywords.",
    output_key="keywords",
)

# Agent 2: invokes a function once per keyword
info_agent = LlmAgent(
    name="InfoAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"),  # <- problem happens here; try model="gemini-2.0-flash"
    instruction="For each keyword in 'keywords', call the 'keyword_info' tool to get more information.",
    tools=[tool],
    output_key="keyword_infos",
)

# Agent 3: summarizes
summarizer_agent = LlmAgent(
    name="SummaryAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"),  # <- works fine here
    instruction="Summarize the keyword info in a user-friendly way.",
    output_key="final_output",
)

# Chain them
pipeline = SequentialAgent(
    name="PipelineAgent",
    sub_agents=[extract_agent, info_agent, summarizer_agent],
)

# Session
APP_NAME = "demo_pipeline"
USER_ID = "test_user"
SESSION_ID = "test_session"

session_service = InMemorySessionService()
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
runner = Runner(agent=pipeline, app_name=APP_NAME, session_service=session_service)

# Trigger the pipeline
input_text = "Cats, space exploration, artificial intelligence"
content = types.Content(role="user", parts=[types.Part(text=input_text)])
events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

for event in events:
    if event.is_final_response():
        print("Final Output:", event.content.parts[0].text)
```
Expected Behavior
All tool calls triggered in the second agent (`InfoAgent`) should execute and return their results, even when issued in parallel.
Actual Behavior
With LiteLlm + OpenAI, only a subset of the expected tool calls are returned.
Notes
- If I switch to `model="gemini-2.0-flash"` (i.e., without LiteLlm), it works perfectly.
- I suspect the issue is in how LiteLLM handles multiple parallel function calls, perhaps in how the responses are awaited and aggregated.
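To make the suspicion concrete, here is a hypothetical illustration (not ADK or LiteLLM internals, just the kind of aggregation I mean): correct handling should emit one tool-response message per requested call id, whereas a bug of the kind I'm describing would effectively keep only the first element.

```python
# Hypothetical aggregation step (names are made up for illustration):
# build one "tool" message per requested tool_call_id, in request order.
def aggregate_tool_results(tool_calls: list[dict], results: dict[str, str]) -> list[dict]:
    """Return one tool-response message for every requested call id."""
    messages = []
    for call in tool_calls:
        call_id = call["id"]
        if call_id not in results:
            raise ValueError(f"missing result for {call_id}")
        messages.append({"role": "tool", "tool_call_id": call_id,
                         "content": results[call_id]})
    return messages

calls = [{"id": "call_a"}, {"id": "call_b"}, {"id": "call_c"}]
results = {"call_a": "1", "call_b": "2", "call_c": "3"}
print(len(aggregate_tool_results(calls, results)))  # 3, one message per call
```

The behavior I observe looks as if only the first of these messages makes it back into the conversation history.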
Let me know if I'm misunderstanding anything or leaving something out.
EDIT: The solution is in this PR, which has now been merged.