🔊 Audio
DocArray supports many different modalities, including audio.
This section shows you how to load and handle audio data using DocArray.
You will also learn about DocArray's audio-specific types for representing your audio data, such as AudioUrl, AudioBytes, and AudioNdArray.
Note
This requires the pydub dependency. You can install all necessary dependencies via:
Additionally, you need to install ffmpeg (see more info here):
Load audio file
First, let's define a class that extends BaseDoc and has a url attribute of type AudioUrl, an optional tensor attribute of type AudioNdArray (one of DocArray's AudioTensor types), and an optional frame_rate attribute of type int.
Tip
Check out our predefined AudioDoc to get started and play around with our audio features.
Next, you can instantiate an object of that class with a local or remote URL:
from typing import Optional
from docarray import BaseDoc
from docarray.typing import AudioNdArray, AudioUrl
class MyAudio(BaseDoc):
    url: AudioUrl
    tensor: Optional[AudioNdArray] = None
    frame_rate: Optional[int] = None
doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
Loading the content of the audio file is as easy as calling .load() on the AudioUrl instance.
This returns a tuple of:
- An AudioNdArray representing the audio file content
- An integer representing the frame rate (the number of samples per second, per channel)
doc.tensor, doc.frame_rate = doc.url.load()
doc.summary()
Output
📄 MyAudio : 2015696 ...
╭──────────────────────┬───────────────────────────────────────────────────────╮
│ Attribute │ Value │
├──────────────────────┼───────────────────────────────────────────────────────┤
│ url: AudioUrl │ https://github.com/docarray/docarray/blob/main/tes │
│ │ ... (length: 90) │
│ tensor: AudioNdArray │ AudioNdArray of shape (30833,), dtype: float64 │
│ frame_rate: int │ 44100 │
╰──────────────────────┴───────────────────────────────────────────────────────╯
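The frame rate ties the tensor length to the duration of the clip. As a quick worked example using the values from the summary above, and assuming a mono signal (one sample per frame), the duration is the number of samples divided by the frame rate:

```python
num_samples = 30833  # AudioNdArray of shape (30833,) in the summary above
frame_rate = 44100   # frame_rate from the summary above, in frames per second
channels = 1         # assumption: mono audio, so one sample per frame

# duration in seconds = samples / (channels * frames per second)
duration_s = num_samples / (channels * frame_rate)
print(f"{duration_s:.3f} s")  # 0.699 s
```

For interleaved multi-channel audio the tensor would hold more samples per frame, which is why channels appears in the divisor.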
AudioTensor
DocArray offers several AudioTensor types to store your data, including AudioNdArray, AudioTorchTensor, and AudioTensorFlowTensor.
If you annotate a field with one of these types, a tensor assigned to it is cast to that type automatically:
from typing import Optional
from docarray import BaseDoc
from docarray.typing import AudioTensorFlowTensor, AudioTorchTensor, AudioUrl
# requires both torch and tensorflow to be installed
class MyAudio(BaseDoc):
    url: AudioUrl
    tf_tensor: Optional[AudioTensorFlowTensor] = None
    torch_tensor: Optional[AudioTorchTensor] = None
doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
# .load() returns an AudioNdArray; assigning it to a field casts it to that field's type
doc.tf_tensor, _ = doc.url.load()
doc.torch_tensor, _ = doc.url.load()
assert isinstance(doc.tf_tensor, AudioTensorFlowTensor)
assert isinstance(doc.torch_tensor, AudioTorchTensor)
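DocArray documents are built on pydantic models, and the conversion above is performed by field validation. The underlying idea, converting a value to the declared type whenever it is assigned, can be sketched in plain Python with a descriptor. This is only an illustration of the concept: CastOnAssign, ListTensor, TupleTensor and ToyAudio are hypothetical stand-ins, not DocArray classes.

```python
class CastOnAssign:
    """Descriptor that converts every assigned value to a fixed target type."""

    def __init__(self, target_type):
        self.target_type = target_type

    def __set_name__(self, owner, name):
        # Store the converted value under a private attribute name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Unassigned fields default to None, like the optional fields above
        return getattr(obj, self.private_name, None)

    def __set__(self, obj, value):
        converted = None if value is None else self.target_type(value)
        setattr(obj, self.private_name, converted)


class ListTensor(list):
    """Hypothetical tensor type backed by a Python list."""


class TupleTensor(tuple):
    """Hypothetical tensor type backed by an immutable tuple."""


class ToyAudio:
    list_tensor = CastOnAssign(ListTensor)
    tuple_tensor = CastOnAssign(TupleTensor)


samples = [0.0, 0.25, -0.25]  # plain list standing in for the loaded samples
doc = ToyAudio()
doc.list_tensor = samples
doc.tuple_tensor = samples
print(type(doc.list_tensor).__name__, type(doc.tuple_tensor).__name__)  # ListTensor TupleTensor
```

The same input ends up as two different types purely because of how each field was declared, which mirrors the tf_tensor and torch_tensor fields above.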
AudioBytes
Alternatively, you can load your AudioUrl instance into AudioBytes, and then load your AudioBytes instance into an AudioTensor of your choice:
from typing import Optional
from docarray import BaseDoc
from docarray.typing import AudioBytes, AudioTensor, AudioUrl
class MyAudio(BaseDoc):
    url: Optional[AudioUrl] = None
    bytes_: Optional[AudioBytes] = None
    tensor: Optional[AudioTensor] = None
doc = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
doc.bytes_ = doc.url.load_bytes()  # type(doc.bytes_) = AudioBytes
doc.tensor, _ = doc.bytes_.load()  # type(doc.tensor) = AudioNdArray
Conversely, you can also convert an AudioTensor back to AudioBytes:
from docarray.typing import AudioBytes
bytes_from_tensor = doc.tensor.to_bytes()
assert isinstance(bytes_from_tensor, AudioBytes)
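To illustrate what a tensor-to-bytes conversion can look like (an assumption for illustration, not necessarily DocArray's exact encoding), here is a minimal standard-library sketch of one common convention, 16-bit little-endian PCM, where each float sample in [-1.0, 1.0] is scaled to a signed 16-bit integer. The helpers floats_to_pcm16 and pcm16_to_floats are hypothetical:

```python
import array
import sys

MAX_INT16 = 32767  # largest value of a signed 16-bit sample


def floats_to_pcm16(samples):
    """Scale float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = array.array("h", (int(round(s * MAX_INT16)) for s in clipped))
    if sys.byteorder == "big":
        ints.byteswap()  # PCM data is conventionally little-endian
    return ints.tobytes()


def pcm16_to_floats(data):
    """Decode little-endian 16-bit PCM bytes back to floats in [-1.0, 1.0]."""
    ints = array.array("h")
    ints.frombytes(data)
    if sys.byteorder == "big":
        ints.byteswap()
    return [i / MAX_INT16 for i in ints]


samples = [0.0, 0.5, -0.5, 1.0, -1.0]
pcm = floats_to_pcm16(samples)
restored = pcm16_to_floats(pcm)
print(len(pcm))  # 10 (2 bytes per sample)
```

The round trip is lossy only up to the 16-bit quantization step, so each restored sample is within 1/32767 of the original.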
Save audio to file
You can save your AudioTensor to an audio file of any format as follows:
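Writing a float tensor to an uncompressed WAV file essentially means scaling the samples to integers and writing them as frames behind a header that records the channel count, sample width, and frame rate. A minimal standard-library sketch of that step, assuming mono float samples in [-1.0, 1.0] and 16-bit output; save_wav is a hypothetical helper, not part of DocArray:

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, frame_rate=44100):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(frame_rate)
        wav.writeframes(frames)


# 0.1 s of a 440 Hz sine wave at 44.1 kHz
frame_rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / frame_rate) for n in range(frame_rate // 10)]

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
save_wav(path, tone, frame_rate)

with wave.open(path, "rb") as wav:
    print(wav.getframerate(), wav.getnframes())  # 44100 4410
```

Compressed formats such as MP3 additionally need an encoder, which is where a tool like ffmpeg comes in.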
Play audio in a notebook
You can play your audio in a notebook from its URL or its tensor by calling .display() on either one.
Play from URL:
doc.url.display()
Play from tensor:
doc.tensor.display()
Getting started - Predefined AudioDoc
To get started and play around with your audio data, DocArray provides a predefined AudioDoc, which bundles the fields used above (url, tensor, bytes_ and frame_rate) together with an optional embedding:
from typing import Optional
from docarray import BaseDoc
from docarray.typing import AnyEmbedding, AudioBytes, AudioTensor, AudioUrl
class AudioDoc(BaseDoc):
    url: Optional[AudioUrl] = None
    tensor: Optional[AudioTensor] = None
    embedding: Optional[AnyEmbedding] = None
    bytes_: Optional[AudioBytes] = None
    frame_rate: Optional[int] = None
You can use this class directly or extend it to your preference:
from typing import Optional
from docarray.documents import AudioDoc
# extend AudioDoc with an additional field
class MyAudio(AudioDoc):
    name: Optional[str] = None
audio = MyAudio(
    url='https://github.com/docarray/docarray/blob/main/tests/toydata/hello.mp3?raw=true'
)
audio.name = 'My first audio doc!'
audio.tensor, audio.frame_rate = audio.url.load()