feat: Add Audio Extraction #3720

ronantakizawa · 2025-12-04T15:20:19Z

Adds a new interpret_audio action that enables agents to extract and transcribe audio content from web pages using OpenAI's Whisper API.

Problem

browser-use cannot interpret audio content on a web page. I tried to use it for tasks that required listening to an audio file on a page, and it could not complete them.

Solution

This feature allows browser-use agents to understand and process audio elements they encounter during web navigation.

Functionality:

- Extracts audio URLs from HTML <audio> elements on the current page
- Downloads audio files (with redirect support for CDN-hosted content)
- Transcribes audio using OpenAI Whisper API
- Optionally summarizes transcriptions using the agent's LLM
- Returns structured transcription results to the agent
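The first step in the list above (finding audio URLs on the page) can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the names `AudioSourceParser` and `find_audio_urls` are hypothetical, and the real action works against the live browser DOM rather than a static HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AudioSourceParser(HTMLParser):
    """Collects src attributes from <audio> tags and their <source> children."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []
        self._in_audio = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "audio":
            self._in_audio = True
            if attr_map.get("src"):
                self.sources.append(attr_map["src"])
        elif tag == "source" and self._in_audio and attr_map.get("src"):
            # Only count <source> tags nested inside an <audio> element.
            self.sources.append(attr_map["src"])

    def handle_endtag(self, tag):
        if tag == "audio":
            self._in_audio = False


def find_audio_urls(html: str, page_url: str) -> list[str]:
    """Return absolute audio URLs, resolving relative ones against page_url."""
    parser = AudioSourceParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]
```

Resolving against the page URL matters because many sites reference audio with relative paths, which cannot be downloaded until they are made absolute.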

Testing script

  import asyncio
  import os
  from browser_use import Agent, BrowserSession
  from browser_use.llm import ChatOpenAI

  async def test_archive_audio():
      assert os.getenv('OPENAI_API_KEY'), "Set OPENAI_API_KEY"

      browser = BrowserSession()
      llm = ChatOpenAI(model='gpt-4o')

      agent = Agent(
          task="Go to https://archive.org/details/testmp3testfile and transcribe the audio",
          llm=llm,
          browser_session=browser,
          max_steps=10,
      )

      try:
          result = await agent.run()
          print(f"\n✅ Transcription result:\n{result}")
      finally:
          await browser.kill()

  asyncio.run(test_archive_audio())

Summary by cubic

Add interpret_audio to let agents transcribe and optionally summarize audio from web pages using OpenAI Whisper. This helps browser-use agents understand audio/video content they encounter.

New Features
- Extracts audio source from audio/video elements (via attributes, CDP, or JS). Works across iframes and shadow DOM, with optional element index.
- Downloads with redirects, resolves relative URLs, supports base64 data URLs, and returns clear errors for blob/streaming URLs.
- Transcribes with OpenAI Whisper; optional summary via the page-extraction LLM; cleans up temp files; returns structured results and memory notes.
- Docs updated to include interpret_audio in available tools.

Dependencies
- Added aiofiles>=24.1.0.
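The temp-file cleanup mentioned in the summary can be illustrated with a small synchronous sketch. It is an assumption-laden simplification: the PR adds `aiofiles`, which suggests asynchronous file I/O, and the `transcribe` callable here is a hypothetical stand-in for the Whisper request, not the PR's actual function.

```python
import os
import tempfile
from typing import Callable


def transcribe_bytes(audio: bytes, suffix: str, transcribe: Callable[[str], str]) -> str:
    """Write audio to a temp file, pass its path to `transcribe`, and always clean up.

    `transcribe` is a hypothetical callable that takes a file path and returns text.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio)
        return transcribe(path)
    finally:
        # Runs on success and on failure, so no audio file is left behind.
        os.remove(path)
```

Putting the removal in `finally` means a failed transcription request does not leak downloaded audio onto disk.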

Written for commit 0357cb7. Summary will update automatically on new commits.

CLAassistant · 2025-12-04T15:20:27Z

All committers have signed the CLA.

cubic-dev-ai

1 issue found across 4 files

Prompt for AI agents (all 1 issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="browser_use/tools/service.py">

<violation number="1" location="browser_use/tools/service.py:964">
P2: Data URLs (`data:`) are excluded from relative URL conversion but not handled explicitly. Attempting to download a data URL via httpx will fail. Consider adding similar handling for data URLs (either decode them directly or return an appropriate error).</violation>
</file>
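One way to address the violation above is to branch on the URL scheme before any HTTP download is attempted. The sketch below is an illustration using only the standard library; `load_inline_or_reject` is a hypothetical name, and the fix that actually landed in `browser_use/tools/service.py` may be structured differently.

```python
from __future__ import annotations

import base64
from urllib.parse import unquote_to_bytes, urlsplit


def load_inline_or_reject(url: str) -> tuple[str, bytes] | None:
    """Return (mime_type, payload) for data: URLs, raise for blob: URLs,
    and return None for URLs that should be downloaded over HTTP."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "blob":
        # blob: URLs reference in-page objects and have no fetchable address.
        raise ValueError("blob: URLs only exist inside the page and cannot be downloaded")
    if scheme != "data":
        return None

    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data URL: missing comma")
    mime_type = header.split(";")[0] or "text/plain"
    if header.lower().endswith(";base64"):
        return mime_type, base64.b64decode(payload)
    # Non-base64 data URLs carry percent-encoded bytes.
    return mime_type, unquote_to_bytes(payload)
```

A caller would download the URL normally only when this function returns `None`, and write the returned bytes straight to disk otherwise.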


ronantakizawa and others added 4 commits December 3, 2025 23:26

feat: add audio extraction feature

65903b6

Merge branch 'browser-use:main' into ronantakizawa/audioextraction

8009fb8

fix: remove tests

feacb5d

fix: fix implementation

8687d0b

cubic-dev-ai bot reviewed Dec 4, 2025


ronantakizawa added 3 commits December 4, 2025 08:37

fix: add cubic-dev fixes

007f470

feat: add proper testing for audio features

3550358

fix: fix failed tests

0357cb7
