Audio Analysis
Author: Uman Sheikh
Before We Start:
Please note that we will be using librosa, a Python package commonly used to analyze
audio signals. At the time of writing these notes, librosa isn't supported by Python
3.8-3.11, so we will use Python 3.5-3.7.
Introduction:
Librosa is a Python library created for the analysis of music and audio. It focuses
on reading audio data and converting it into numerical arrays that can be processed
further. Its examples and documentation are also useful for learning how to approach
audio data science projects.
Librosa is primarily utilized for working with audio data, such as when creating music (using
LSTMs) or performing automatic speech recognition. It offers the components required to
build music information retrieval systems.
We will use Librosa to analyze an audio file. Installing the library first prepares
our environment for the analysis.
Installation:
We can install librosa with pip (or pip3 on Linux and Mac) by running: pip install librosa.
Note that, at the time of writing, librosa supports Python 3.5-3.7 because of its
dependency on the numba package.
Loading Audio File:
First import librosa, then call librosa.load("filename.wav"). It returns a tuple (x, sr),
where x is a NumPy array containing the audio samples and sr is the sampling rate of x.
Wave show of audio file:
Let's plot the waveform of the audio file that we have loaded. We will use
matplotlib.pyplot (imported as plt) together with librosa.display to draw it.
The resulting plot shows the amplitude of the signal over time. Its exact shape depends
on the audio file you provide.
Short-time Fourier transform (STFT):
The STFT represents a signal in the time-frequency domain by computing discrete Fourier
transforms (DFT) over short overlapping windows. The librosa.stft function returns a
complex-valued matrix in which each row corresponds to a frequency bin and each column
to a time frame.
After computing the complex matrix, we convert its magnitude into a dB-scaled spectrogram.
We then use specshow and colorbar to display the result.
The resulting plot shows the STFT magnitude as a function of time and frequency.
Chroma feature:
The chroma feature is a descriptor that summarizes the tonal content of a musical audio
signal by measuring the energy in each of the 12 pitch classes. Chroma features can
therefore be seen as a crucial prerequisite for high-level semantic analysis, such as
chord recognition or estimating harmonic similarity.
Conclusion:
Tabular, textual, or visual data are not always the focus of data science projects. You will
often work with less conventional data, such as audio. I've shown you a few audio analysis
techniques in these notes, but you can learn more by reading other papers, studies, and projects.
Uman Sheikh