Parallel tool calls with LiteLLM + OpenAI: some function calls missing in sequential agent flow #187
Closed
@paulthemagno

Description

The Setup

Hi, I'm encountering an issue when using LiteLLM with OpenAI models (e.g., openai/gpt-4o-mini) in a multi-step agent pipeline, particularly when a tool function is invoked multiple times in parallel.
I've put together a simple example below just to illustrate the problem.

This is a SequentialAgent composed of three sub-agents:

  1. A first agent that extracts keywords from a user prompt (e.g., short phrases).
  2. A second agent that uses a FunctionTool to look up additional information for each keyword (one tool call per keyword).
  3. A third agent that summarizes the results.

The issue occurs when the second agent uses a tool that is invoked once per keyword, meaning several tool calls are triggered in parallel.

The Problem

When using:

model=LiteLlm(model="openai/gpt-4o-mini")

…more than one tool call is executed, but only the first of them is returned to the LlmAgent. The resulting error is:

An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxxx, call_yyy, call_zzz

The first call_id does not appear in this error message, which means that call was handled correctly.
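
For reference, the OpenAI Chat Completions API enforces the following shape: an assistant message carrying tool_calls must be followed by exactly one role "tool" message per tool_call_id before the conversation can continue. Here is a hand-written sketch of a valid history (the ids and arguments are made up, not taken from actual traffic):

messages = [
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_xxxx", "type": "function",
             "function": {"name": "keyword_info", "arguments": '{"keyword": "cats"}'}},
            {"id": "call_yyyy", "type": "function",
             "function": {"name": "keyword_info", "arguments": '{"keyword": "space exploration"}'}},
        ],
    },
    # One tool message per tool_call_id is mandatory; omitting any of them
    # triggers exactly the error quoted above.
    {"role": "tool", "tool_call_id": "call_xxxx",
     "content": '{"keyword": "cats", "length": 4}'},
    {"role": "tool", "tool_call_id": "call_yyyy",
     "content": '{"keyword": "space exploration", "length": 17}'},
]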

However, when switching to:

model="gemini-2.0-flash"

…the same code works perfectly: all tool calls are received and processed correctly.

Minimal Reproducible Example

from google.adk.agents import LlmAgent, SequentialAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.tools import FunctionTool
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types

# Tool that echoes the keyword back with some metadata
def keyword_info(keyword: str) -> dict:
    return {"keyword": keyword, "length": len(keyword)}

tool = FunctionTool(func=keyword_info)

# Agent 1: extracts simple keywords (simulated)
extract_agent = LlmAgent(
    name="ExtractAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"),  # <- works fine here
    instruction="Extract keywords from the user input. Output only the list of keywords.",
    output_key="keywords"
)

# Agent 2: invokes a function once per keyword
info_agent = LlmAgent(
    name="InfoAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"),  # <- problem happens here, try with model="gemini-2.0-flash"
    instruction="For each keyword in 'keywords', call the 'keyword_info' tool to get more information.",
    tools=[tool],
    output_key="keyword_infos"
)

# Agent 3: summarizes
summarizer_agent = LlmAgent(
    name="SummaryAgent",
    model=LiteLlm(model="openai/gpt-4o-mini"), # <- works fine here
    instruction="Summarize the keyword info in a user-friendly way.",
    output_key="final_output"
)

# Chain them
pipeline = SequentialAgent(
    name="PipelineAgent",
    sub_agents=[extract_agent, info_agent, summarizer_agent]
)

# Session
APP_NAME = "demo_pipeline"
USER_ID = "test_user"
SESSION_ID = "test_session"

session_service = InMemorySessionService()
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
runner = Runner(agent=pipeline, app_name=APP_NAME, session_service=session_service)

# Trigger agent
input_text = "Cats, space exploration, artificial intelligence"
content = types.Content(role="user", parts=[types.Part(text=input_text)])
events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

for event in events:
    if event.is_final_response():
        print("Final Output:", event.content.parts[0].text)

Expected Behavior

All tool calls triggered in the second agent (InfoAgent) should execute and return their results, even if issued in parallel.

Actual Behavior

With LiteLlm + OpenAI, only the first tool call's result is returned; the remaining calls never receive responses.

Notes

  • If I switch to model="gemini-2.0-flash" (i.e., without LiteLlm), it works perfectly.

I suspect the issue is in how litellm handles multiple parallel function calls, perhaps in how responses are awaited/aggregated.
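
To make that suspicion concrete, this is roughly what a correct aggregation step has to do on the client side: execute every entry in tool_calls and emit one tool message per id, instead of stopping after the first. This is my own illustration, not ADK or LiteLLM internals; respond_to_tool_calls and the tools mapping are hypothetical names.

import json

def respond_to_tool_calls(assistant_message: dict, tools: dict) -> list[dict]:
    # Execute ALL parallel tool calls and build one response message per id.
    tool_messages = []
    for call in assistant_message["tool_calls"]:
        func = tools[call["function"]["name"]]           # e.g. {"keyword_info": keyword_info}
        args = json.loads(call["function"]["arguments"])
        result = func(**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # each response must echo its call id
            "content": json.dumps(result),
        })
    return tool_messages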

Let me know if I'm misunderstanding anything or leaving something out.

EDIT:

The fix is in this PR, which has now been merged.
