-
Yes, correct.
-
I am also having the same question. Looking into the code, I didn't find any built-in methods. If anyone knows how to do it, please let us know.
-
Has anyone figured out a solution for this?
-
LiteLlm has cost counters.
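For context, a minimal sketch of what those counters look like when calling litellm directly (the model name here is just an example; `completion_cost` prices the call from litellm's model cost map):

```python
import litellm

response = litellm.completion(
    model="gpt-4o-mini",  # example model; any litellm-supported model works
    messages=[{"role": "user", "content": "Hello!"}],
)
# Standard OpenAI-style usage object on the response:
print(response.usage.prompt_tokens, response.usage.completion_tokens)
# litellm can also estimate the cost of the call:
cost = litellm.completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
```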
-
I had the same problem since I wanted to track some costs/usage stats, so I tried to intercept the LiteLLM client calls. I finally came up with something like this:

```python
import litellm
from litellm import CustomStreamWrapper, ModelResponse

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm, LiteLLMClient


def track_llm_usage(res: ModelResponse | CustomStreamWrapper, model):
    # Log token usage and cost before returning
    try:
        usage = res.get("usage", {})
        prompt_tokens = usage.get("prompt_tokens", 0)
        completion_tokens = usage.get("completion_tokens", 0)
        total_tokens = usage.get("total_tokens", prompt_tokens + completion_tokens)
        # Calculate cost if possible
        cost = litellm.completion_cost(completion_response=res) or 0.0
        print(f"[acompletion] Model: {model}")
        print(f"[acompletion] Prompt tokens: {prompt_tokens}")
        print(f"[acompletion] Completion tokens: {completion_tokens}")
        print(f"[acompletion] Total tokens: {total_tokens}")
        print(f"[acompletion] Cost: ${cost:.6f}")
        print(f"[acompletion] Time: {(res._response_ms / 1000):.2f}s")
    except Exception as e:
        print(f"Error logging token usage: {e}")
    return res


class KLiteLLMClient(LiteLLMClient):
    async def acompletion(self, model, messages, tools, **kwargs):
        return track_llm_usage(
            await super().acompletion(
                model=model,
                messages=messages,
                tools=tools,
                **kwargs,
            ),
            model=model,
        )


root_agent = LlmAgent(
    name="weather_time_agent",
    model=LiteLlm(
        model="gemini/gemini-2.5-flash-preview-04-17",
        llm_client=KLiteLLMClient(),
    ),
    # [...]
)
```

Hope it helps!
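If you'd rather accumulate running totals than print per call, a small accumulator could sit behind `track_llm_usage` above. This is a hypothetical sketch, not part of the original snippet:

```python
# Hypothetical per-model usage accumulator (sketch only).
# track_llm_usage above could call totals.add(...) instead of printing.
from collections import defaultdict


class UsageTotals:
    def __init__(self):
        self.tokens = defaultdict(int)    # per-model cumulative tokens
        self.cost = defaultdict(float)    # per-model cumulative cost (USD)

    def add(self, model: str, total_tokens: int, cost: float):
        self.tokens[model] += total_tokens
        self.cost[model] += cost


totals = UsageTotals()
# e.g. inside track_llm_usage: totals.add(model, total_tokens, cost)
```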
-
Search within the terminal output using Ctrl+F for "total_tokens". Hope that helps.
-
If you are using LiteLlm as an agent model, add:

```python
LiteLlm(
    'azure/gpt-4.1',
    stream_options={"include_usage": True},
)
```

and you can get the usage tokens from the events:

```python
events = runner.run_async(
    session_id=session.id,
    user_id=session.user_id,
    new_message=self.__messages_to_agent_new_message(messages),
    run_config=RunConfig(streaming_mode=StreamingMode.SSE),
)
async for event in events:
    # Example output:
    # cache_tokens_details=None cached_content_token_count=None
    # candidates_token_count=None candidates_tokens_details=None
    # prompt_token_count=172 prompt_tokens_details=[ModalityTokenCount(
    #     modality=<MediaModality.TEXT: 'TEXT'>, token_count=172)]
    # thoughts_token_count=44 tool_use_prompt_token_count=None
    # tool_use_prompt_tokens_details=None total_token_count=216 traffic_type=None
    print(event.usage_metadata)
```
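To turn that metadata into a single number for the run, one option is to keep the last `total_token_count` seen on the stream. A sketch under an assumption: whether per-event counts are deltas or cumulative can vary by provider, so "last value wins" may need adjusting:

```python
# Sketch: pull a run-level token total out of the event stream.
# Assumes the usage_metadata fields shown in the dump above; the
# "last value wins" logic assumes counts are cumulative per response.
total_tokens = 0
async for event in events:
    md = event.usage_metadata
    if md is not None and md.total_token_count is not None:
        total_tokens = md.total_token_count
print(f"Total tokens for this run: {total_tokens}")
```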
-
Are there any built-in functions to determine the cost/usage of queries when using the ADK? At the moment, I am using the free experimental models (gemini-2.0-flash-exp). I need to understand what the costs/usage will be for my flows when they are put into production.