Separate Vocals From A Track Using Python - DEV Community

Uploaded by

sehibiyaoblaise

17/03/2024 03:06 Separate vocals from a track using python - DEV Community

Vicente G. Reyes
Posted on Feb 13, 2023

Separate vocals from a track using python

#python #music #programming

For a while I had wanted to learn how to separate vocals from a track
programmatically instead of depending on a software-as-a-service product to do it.
This article shows how to separate the vocals of a song from the instruments using
my new favorite library, librosa. You can check out the Google Colab Notebook here.

The idea came up when I wanted to split a song into its individual tracks, so I went
to Product Hunt and discovered Melody ML. That discovery sparked the urge to learn ML
for music, which led me to the Python library librosa.

By the way, I ran out of RAM, which made my notebook explode.

https://dev.to/highcenburg/separate-vocals-from-a-track-using-python-4lb5 1/9

Icen Reyes (@icenreyes), 10:34 AM · Jan 31, 2023:
Something crashed! waaaaaaaaa

Install and import dependencies

pip install librosa matplotlib IPython

import librosa
from librosa import display
import numpy as np
import IPython.display as ipd
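import matplotlib as plt  # aliases the top-level package, so plotting calls go through plt.pyplot
import matplotlib.pyplot  # load the pyplot submodule so plt.pyplot.subplots() works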

Load and display the song.

I used My Last Serenade by KSE as I wondered how the growling or shouting parts
of the song would come out.

y, sr = librosa.load('My Last Serenade.wav')

ipd.Audio(data=y[90*sr:110*sr], rate=sr)

We slice a 20-second snippet from the chorus of the song (seconds 90 to 110) and play
it with ipd.Audio (tbh, this is a bit exhausting). A photo is shown below because I
couldn't find a way to upload audio here on DEV.
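
The slice above works because an audio array is indexed in samples, so a time in seconds has to be multiplied by the sample rate. A minimal sketch of that conversion, assuming a silent placeholder array instead of the real song and a hypothetical helper named slice_seconds:

```python
import numpy as np

def slice_seconds(y, sr, start_s, end_s):
    """Return the samples of y between start_s and end_s (both in seconds)."""
    return y[int(start_s * sr):int(end_s * sr)]

sr = 22050  # librosa.load resamples to 22050 Hz unless sr is overridden
y = np.zeros(120 * sr, dtype=np.float32)  # placeholder for two minutes of audio

snippet = slice_seconds(y, sr, 90, 110)
print(len(snippet) / sr)  # duration of the snippet in seconds
```

The same helper could be reused later when playing back the separated vocals from the same section.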


We separate a complex-valued spectrogram D into its magnitude (S) and phase (P)
components, convert the time stamps into frames, plot the data, then display the
full spectrogram of the data

S_full, phase = librosa.magphase(librosa.stft(y))
idx = slice(*librosa.time_to_frames([90, 110], sr=sr))
fig, ax = plt.pyplot.subplots()
img = display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                       y_axis='log', x_axis='time', sr=sr, ax=ax)
fig.colorbar(img, ax=ax)

Line by line explanation

S_full, phase = librosa.magphase(librosa.stft(y)) - the short-time Fourier transform
(STFT) represents the signal y in the time-frequency domain by computing discrete
Fourier transforms (DFT) over short overlapping windows; magphase then splits the
resulting complex spectrogram into its magnitude (S_full) and phase (phase)
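
To make the magnitude/phase split concrete, here is a minimal NumPy-only sketch. simple_stft is a hypothetical helper written for this illustration, not librosa's implementation (librosa.stft also centers and pads each frame), and the 440 Hz test tone is an assumption standing in for the song.

```python
import numpy as np

def simple_stft(y, n_fft=2048, hop_length=512):
    """Windowed DFT of overlapping frames; returns shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([
        y[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1).T

sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

D = simple_stft(y)                  # complex spectrogram
S_full = np.abs(D)                  # magnitude
phase = np.exp(1j * np.angle(D))    # unit-modulus phase

# Multiplying the two parts back together recovers the complex spectrogram.
print(D.shape, np.allclose(S_full * phase, D))
```

Keeping the phase around matters later: the masks are applied to the magnitude only, and the phase is needed to turn the result back into audio.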

idx = slice(*librosa.time_to_frames([90, 110], sr=sr)) - convert the 90 s and 110 s
time stamps into STFT frame indices using librosa's time_to_frames function, then
build a slice that selects that part of the song
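
As a rough illustration of what the time-to-frame conversion does, the sketch below reimplements it with NumPy. time_to_frames_sketch is a hypothetical stand-in, and it assumes the hop length of 512 samples that librosa.stft uses when n_fft is left at 2048.

```python
import numpy as np

def time_to_frames_sketch(times, sr=22050, hop_length=512):
    """Map times in seconds to STFT frame indices (seconds -> samples -> frames)."""
    samples = (np.asarray(times) * sr).astype(int)
    return samples // hop_length

start, stop = time_to_frames_sketch([90, 110])
idx = slice(start, stop)  # same role as slice(*...) in the article
print(start, stop, stop - start)
```

The star in slice(*...) simply unpacks the two frame indices into the start and stop arguments of slice.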

img = display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
y_axis='log', x_axis='time', sr=sr, ax=ax) - display the spectrogram of the 20-second
slice of the song. amplitude_to_db converts the magnitude spectrogram to a dB-scaled
spectrogram; ref=np.max makes the loudest bin the 0 dB reference, so every other
value is shown relative to the peak. specshow then plots the result with a
logarithmic frequency axis (y) and a time axis (x)
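
The dB conversion can be sketched in a few lines of NumPy. amplitude_to_db_sketch is a hypothetical simplification that always uses the maximum as the reference and assumes an 80 dB floor below the peak; the tiny input array is made up for the example.

```python
import numpy as np

def amplitude_to_db_sketch(S, top_db=80.0, amin=1e-5):
    """Convert magnitudes to dB relative to the largest value, floored at -top_db."""
    S = np.maximum(np.abs(S), amin)      # avoid log10(0)
    db = 20.0 * np.log10(S / S.max())    # the peak maps to 0 dB
    return np.maximum(db, db.max() - top_db)

S = np.array([[1.0, 0.5],
              [0.1, 0.0]])
db = amplitude_to_db_sketch(S)
print(np.round(db, 2))
```

Halving the amplitude costs about 6 dB, which is why quiet details that are invisible on a linear scale show up clearly in the dB-scaled plot.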

Below is the image of the spectrum:


Decomposing the spectrogram

S_filter = librosa.decompose.nn_filter(S_full, aggregate=np.median, metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
S_filter = np.minimum(S_full, S_filter)

Line by line explanation

S_filter = librosa.decompose.nn_filter(S_full, aggregate=np.median,
metric='cosine', width=int(librosa.time_to_frames(2, sr=sr))) - we replace each
frame with the median (aggregate=np.median) of its nearest neighbors, where frames
are compared using cosine similarity and neighbors must be at least 2 seconds apart
(width). Sounds that repeat across the track, mostly the instruments, survive the
median, while sparse, non-repeating sounds such as the vocals are suppressed

S_filter = np.minimum(S_full, S_filter) - take the element-wise minimum of S_full
and S_filter, so the filtered output is never larger than the input spectrum.
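
The two steps above can be sketched on a toy spectrogram. nn_median_filter is a hypothetical, brute-force simplification of the idea and not how librosa implements nn_filter; the values of k and width and the 3 x 12 input are assumptions chosen to keep the example small.

```python
import numpy as np

def nn_median_filter(S, k=3, width=2):
    """Replace each frame (column) by the median of its k most similar frames.

    Frames closer than `width` columns to the target are excluded, which
    mimics the "at least 2 seconds apart" constraint from the article.
    """
    norms = np.linalg.norm(S, axis=0, keepdims=True)
    unit = S / np.maximum(norms, 1e-10)
    sim = unit.T @ unit  # cosine similarity between all pairs of frames

    n_frames = S.shape[1]
    out = np.empty_like(S)
    for t in range(n_frames):
        scores = sim[t].copy()
        lo, hi = max(0, t - width + 1), min(n_frames, t + width)
        scores[lo:hi] = -np.inf            # skip the frame itself and close neighbors
        nearest = np.argsort(scores)[-k:]  # indices of the k most similar frames
        out[:, t] = np.median(S[:, nearest], axis=1)
    return out

# Repeating two-frame "instrument" pattern with one loud one-off event ("vocal").
pattern = np.array([[1.0, 0.2], [0.2, 1.0], [0.5, 0.5]])
S_full = np.tile(pattern, (1, 6))  # 3 frequency bins x 12 frames
S_full[2, 5] += 4.0

S_filter = nn_median_filter(S_full)
S_filter = np.minimum(S_full, S_filter)  # the filter output must not exceed the input
print(S_full[2, 5], S_filter[2, 5])
```

The one-off event disappears from S_filter because none of the similar frames contain it, which leaves it in the residual S_full - S_filter that the masks work with next.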

Display the background and foreground spectrum of the audio

margin_i, margin_v = 3, 11
power = 3

mask_i = librosa.util.softmask(S_filter, margin_i * (S_full - S_filter), power=power)

mask_v = librosa.util.softmask(S_full - S_filter, margin_v * S_filter, power=power)

S_foreground = mask_v * S_full

S_background = mask_i * S_full

Line by line explanation

margin_i, margin_v = 3, 11 - we use margins to reduce bleed between the vocal and
instrumental masks

power = 3 - the exponent used by softmask, which computes the soft mask in a
numerically stable way; larger values push the soft mask closer to a hard (binary)
mask

S_foreground = mask_v * S_full and S_background = mask_i * S_full - multiply the
masks with the input spectrum to separate the components
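
To see how power and the margins shape a soft mask, here is a small NumPy sketch of the usual soft-mask formula, X^p / (X^p + X_ref^p). softmask_sketch is a hypothetical helper, and the three-bin inputs are made-up values rather than data from the song.

```python
import numpy as np

def softmask_sketch(X, X_ref, power=1.0):
    """Soft mask X**p / (X**p + X_ref**p), scaled first to stay numerically stable."""
    Z = np.maximum(X, X_ref)
    Z = np.where(Z > 0, Z, 1.0)  # avoid dividing by zero where both inputs are 0
    num = (X / Z) ** power
    den = num + (X_ref / Z) ** power
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

S_full = np.array([1.0, 1.0, 1.0])
S_filter = np.array([0.9, 0.5, 0.1])  # estimated background magnitude
residual = S_full - S_filter          # what the filter removed

mask_p1 = softmask_sketch(residual, S_filter, power=1)
mask_p3 = softmask_sketch(residual, S_filter, power=3)
mask_margin = softmask_sketch(residual, 11 * S_filter, power=3)
print(np.round(mask_p1, 3), np.round(mask_p3, 3), np.round(mask_margin, 3))
```

A higher power makes the decision per bin sharper, while multiplying the reference by a margin means a bin only counts as vocal when the residual clearly dominates the background estimate.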

Plotting the full spectrum, background and foreground spectrum

fig, ax = plt.pyplot.subplots(nrows=3, sharex=True, sharey=True)
img = display.specshow(librosa.amplitude_to_db(S_full[:, idx], ref=np.max),
                       y_axis='log', x_axis='time', sr=sr, ax=ax[0])
ax[0].set(title='Full Spectrum')
ax[0].label_outer()

display.specshow(librosa.amplitude_to_db(S_background[:, idx], ref=np.max),
                 y_axis='log', x_axis='time', sr=sr, ax=ax[1])
ax[1].set(title='Background Spectrum')
ax[1].label_outer()

display.specshow(librosa.amplitude_to_db(S_foreground[:, idx], ref=np.max),
                 y_axis='log', x_axis='time', sr=sr, ax=ax[2])
ax[2].set(title='Foreground Spectrum')
ax[2].label_outer()

fig.colorbar(img, ax=ax)

Recover the foreground audio from the masked spectrogram and play back the audio

y_foreground = librosa.istft(S_foreground * phase)
ipd.Audio(data=y_foreground[90*sr:110*sr], rate=sr)

Line by line explanation

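y_foreground = librosa.istft(S_foreground * phase) - recombine the masked magnitude
with the original phase and apply the inverse short-time Fourier transform to get a
time-domain signal
ipd.Audio(data=y_foreground[90*sr:110*sr], rate=sr) - plays back the vocals from
the same 20-second section of the track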
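
The recombination step can be illustrated with a deliberately simplified STFT. This sketch assumes non-overlapping rectangular frames (so no overlap-add is needed, unlike librosa.istft) and a made-up mixture of two tones, with a hard mask that keeps only the low frequency bins.

```python
import numpy as np

n_fft, n_frames = 256, 8
t = np.arange(n_fft * n_frames)
low = np.sin(2 * np.pi * 8 * t / n_fft)          # tone centered on bin 8
high = 0.5 * np.sin(2 * np.pi * 64 * t / n_fft)  # tone centered on bin 64
y = low + high

# Forward transform: one DFT per frame, then split into magnitude and phase.
D = np.fft.rfft(y.reshape(n_frames, n_fft), axis=1).T
S_full = np.abs(D)
phase = np.exp(1j * np.angle(D))

# Mask the magnitude only, then reattach the original phase before inverting.
mask = np.zeros_like(S_full)
mask[:32, :] = 1.0
S_kept = mask * S_full
y_kept = np.fft.irfft((S_kept * phase).T, n=n_fft, axis=1).reshape(-1)

print(np.allclose(y_kept, low))
```

Because the mask only scales magnitudes, the result is audible only after the phase is multiplied back in, which is exactly what S_foreground * phase does before the inverse transform.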

Conclusion
This seemed easy at first, and while I was reading the documentation, but digging
into the code made me realize that the idea was a little more complex. What kept me
going was reading about nearest neighbors in one part of the documentation, which
made me realize that I will be getting my hands on machine learning in the future
with this library.