Digital Signal Processing
Assignment P29
Abhishek Bindra, Anuj Dubey, Daravathu Vennela
Nov 13, 2024
Task 1:
The first step involves loading the audio signal from a file named safety.wav, which
contains the word "safety" uttered by a male speaker. The signal was recorded in a
computer lab, so it includes noticeable background noise. By plotting the waveform in
the time domain, we can visually examine where the speech occurs, as well as the
presence of silence and noise.
• Imports: Used numpy for handling numerical data, matplotlib.pyplot for plot-
ting the waveform, and wave for reading the WAV audio file.
• Load Audio Data: Opened safety.wav in read-binary mode, retrieved the sample
rate (sampling frequency) and the number of samples (total frames), then converted
the audio frames to a NumPy array (audio_data) of 16-bit integers.
• Duration Calculation: Calculated the duration of the audio in seconds by divid-
ing the total number of samples by the sample rate.
• Time Axis Creation: Created a time array for x-axis values, spanning the dura-
tion of audio data based on the sample rate.
• Energy Calculation in Frames: Calculated energy over 100 ms frames, using a
frame size based on the sample rate. This allows for segmenting the signal to detect
voiced/unvoiced sections.
• Threshold for Voiced Detection: Set a threshold for detecting voiced frames,
based on the 60th percentile of calculated frame energies. Marked frames with
energy above this threshold as voiced.
• Plot Waveform with Voiced Segments: Configured plot size for clarity and
plotted time vs. audio data. Highlighted voiced segments in orange, added grid
lines, title, and axis labels.
Observations: The time-domain plot reveals distinct sections where speech occurs,
which are characterized by higher amplitudes and highlighted as voiced. Silent or back-
ground noise sections show lower amplitude values, with no highlighting. This provides
a foundation for further analysis and segmentation of the signal.
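The steps above can be sketched as follows. Since safety.wav itself is not bundled with this report, the sketch synthesizes a stand-in clip (quiet noise, a louder tone, quiet noise again) and writes it to an in-memory WAV; the clip contents are illustrative assumptions, while the loading, 100 ms frame-energy, and 60th-percentile thresholding steps follow the description above.

```python
import io
import wave
import numpy as np

# Synthesize a stand-in 16-bit mono clip (hypothetical data): 0.5 s of
# low-level noise, 0.5 s of a louder "voiced" tone, 0.5 s of noise again.
fs = 8000
rng = np.random.default_rng(0)
noise = rng.normal(0, 200, fs // 2).astype(np.int16)
tone = (8000 * np.sin(2 * np.pi * 150 * np.arange(fs // 2) / fs)).astype(np.int16)
samples = np.concatenate([noise, tone, noise])

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(fs)
    w.writeframes(samples.tobytes())
buf.seek(0)

# Load the audio as described: sample rate, total frames, int16 array.
with wave.open(buf, "rb") as w:
    sample_rate = w.getframerate()
    n_samples = w.getnframes()
    audio_data = np.frombuffer(w.readframes(n_samples), dtype=np.int16)

duration = n_samples / sample_rate          # total length in seconds
time = np.arange(n_samples) / sample_rate   # x-axis for the waveform plot

# Energy over 100 ms frames; a frame is voiced if its energy exceeds
# the 60th percentile of all frame energies.
frame_size = int(0.1 * sample_rate)
energies = np.array([np.sum(audio_data[i:i + frame_size].astype(np.int64) ** 2)
                     for i in range(0, n_samples, frame_size)])
threshold = np.percentile(energies, 60)
voiced = energies > threshold
```

Plotting (time vs. audio_data, with voiced frames shaded) is omitted here; with matplotlib available, each True entry in `voiced` would be highlighted over its 100 ms span.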
Task 2:
In this task, we load an audio signal from a file, process it, and visualize both its time-
domain waveform and frequency spectrum using a custom FFT implementation.
• Imports: Used numpy for numerical computations, matplotlib.pyplot for plot-
ting, wave for reading the WAV audio file, and struct for unpacking the audio data
into a numerical format.
• Load Audio Function:
– Opened the WAV file using wave.open().
– Extracted important parameters: number of frames, channels, sample width,
and frame rate.
– Read raw audio data and unpacked it using struct.unpack().
– Normalized the signal to scale the amplitude between -1 and 1.
– Generated a time array for the x-axis, based on the sample rate and number
of samples.
• FFT Function: Implemented a custom FFT using the divide-and-conquer method.
– Checked if the signal length is a power of 2; if not, padded the signal to the
next power of 2.
– Recursively computed FFT for even and odd indexed samples.
– Combined the results of the even and odd FFTs to compute the final spectrum.
• Plot Time-Domain Signal:
– Plotted the raw audio signal against time, showing the amplitude variations
in the time domain.
– Added labels, gridlines, and a legend to the plot.
• Compute and Plot Frequency Spectrum:
– Generated frequency bins based on the sample rate and number of samples.
– Applied the custom FFT function to calculate the frequency spectrum.
– Plotted the magnitude of the positive frequencies (first half of the spectrum)
on a logarithmic scale.
Observations: The time-domain plot shows the variations in amplitude of the raw
audio signal, allowing for visualization of the sound’s intensity over time. The frequency
spectrum, computed using the custom FFT, reveals the distribution of energy across
different frequencies, helping to identify prominent frequency components of the signal.
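The divide-and-conquer FFT described above can be sketched as a short recursive function; `fft_recursive` is an illustrative name, not the assignment's exact code, and the padding step mirrors the power-of-2 check in the text.

```python
import numpy as np

# Minimal radix-2 FFT: pad to a power of 2, split into even/odd halves,
# recurse, then combine with twiddle factors (the butterfly step).
def fft_recursive(x):
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n & (n - 1):                     # length is not a power of 2:
        padded = 1 << n.bit_length()    # zero-pad to the next power of 2
        x = np.concatenate([x, np.zeros(padded - n)])
        n = padded
    if n == 1:
        return x
    even = fft_recursive(x[0::2])       # FFT of even-indexed samples
    odd = fft_recursive(x[1::2])        # FFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

# A 5-cycle sine over 64 samples should peak at frequency bin 5.
sig = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
spectrum = fft_recursive(sig)
print(np.argmax(np.abs(spectrum[:32])))  # → 5
```

For plotting, only the first half of `np.abs(spectrum)` is used, since the spectrum of a real signal is symmetric.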
Task 3:
In this task, the speech signal is segmented into voiced, unvoiced, and silence portions
using two key features: Short-Term Energy (STE) and Zero-Crossing Rate (ZCR).
• Short-Term Energy (STE):
– Definition: STE represents the intensity of the signal within a short time
frame. It is calculated as the sum of the squared amplitudes of the signal
samples in each frame.
– Formula:
E = \sum_{i=n}^{n+N-1} x[i]^2
where x[i] is the amplitude of the signal at sample i, and N is the frame size.
– Usage:
∗ Voiced Segments: High energy due to louder speech portions.
∗ Unvoiced Segments: Moderate energy for fricatives and consonants.
∗ Silence: Low energy corresponding to background noise or quiet sections.
• Zero-Crossing Rate (ZCR):
– Definition: ZCR measures the rate at which the signal amplitude crosses
zero (changes sign) within a frame. It indicates the frequency of amplitude
variations.
– Formula:
\mathrm{ZCR} = \frac{1}{2N} \sum_{i=n}^{n+N-1} \left| \mathrm{sgn}(x[i]) - \mathrm{sgn}(x[i-1]) \right|
where:
\mathrm{sgn}(x[i]) = \begin{cases} 1 & \text{if } x[i] > 0, \\ -1 & \text{if } x[i] < 0, \\ 0 & \text{if } x[i] = 0. \end{cases}
– Usage:
∗ Voiced Segments: Low ZCR due to periodic nature of voiced sounds.
∗ Unvoiced Segments: High ZCR reflecting noise-like consonants.
• Segmentation Process:
– The signal is divided into non-overlapping frames of 20 ms (frame size = 0.02×
sampling rate).
– For each frame, STE and ZCR are computed.
– Frames are classified as:
∗ Voiced: E > 0.05 (high energy) and ZCR < 0.1 (low ZCR).
∗ Unvoiced: E > 0.05 (energy above the threshold) and ZCR ≥ 0.1 (high ZCR).
∗ Silence: E < 0.05 (low energy).
• Visualization:
– The time-domain waveform is plotted with regions marked for voiced, un-
voiced, and silence.
– Each segment type is highlighted with unique colors:
∗ Voiced: High energy and low ZCR.
∗ Unvoiced: Moderate energy and high ZCR.
∗ Silence: Low energy irrespective of ZCR.
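The segmentation rules above can be sketched as a single function. The 20 ms frames and the 0.05 energy and 0.1 ZCR thresholds follow the classification rules in the text; the function name `segment_signal`, the assumption of a normalized signal (amplitudes in [-1, 1]), and the synthetic test frames are illustrative.

```python
import numpy as np

# Classify each non-overlapping 20 ms frame by STE and ZCR, following the
# rules above: low energy -> silence; otherwise low ZCR -> voiced, high
# ZCR -> unvoiced.
def segment_signal(signal, fs, frame_ms=20, e_thr=0.05, zcr_thr=0.1):
    frame = int(frame_ms / 1000 * fs)
    labels = []
    for start in range(0, len(signal) - frame + 1, frame):
        x = signal[start:start + frame]
        ste = np.sum(x ** 2)                                  # short-term energy
        zcr = np.sum(np.abs(np.diff(np.sign(x)))) / (2 * frame)  # zero-crossing rate
        if ste < e_thr:
            labels.append("silence")
        elif zcr < zcr_thr:
            labels.append("voiced")
        else:
            labels.append("unvoiced")
    return labels

# Three synthetic 20 ms frames: a periodic tone, noise, and silence.
fs = 8000
t = np.arange(int(0.02 * fs)) / fs
rng = np.random.default_rng(1)
sig = np.concatenate([0.5 * np.sin(2 * np.pi * 100 * t),   # voiced-like
                      0.1 * rng.standard_normal(len(t)),   # unvoiced-like
                      np.zeros(len(t))])                   # silence
print(segment_signal(sig, fs))  # → ['voiced', 'unvoiced', 'silence']
```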
Task 4:
• Define Parameters and Segment Signal
– frame_size is set to represent 20 ms of audio data.
– energy_threshold and zcr_threshold are used as reference values for
distinguishing voiced, unvoiced, and silent frames.
– The segment_signal function is called to segment the audio based on the energy
and ZCR thresholds.
• Initialize Plot for Segmented Signal
– The time_frames array is created to map each frame to its corresponding time.
– A plot is created to display the time-domain waveform of the audio signal.
• Plotting the ZCR Distribution
– The compute_zcr function calculates the ZCR for each frame.
– The resulting plot shows the distribution of ZCR values across frames.
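A sketch of these helpers, assuming the 20 ms frames above; `compute_zcr` and `time_frames` follow the names in the text, while the 1 s, 100 Hz test tone is an illustrative stand-in for the speech signal.

```python
import numpy as np

# Per-frame ZCR array over non-overlapping frames, matching the ZCR
# formula from Task 3.
def compute_zcr(signal, frame_size):
    n_frames = len(signal) // frame_size
    zcr = np.empty(n_frames)
    for k in range(n_frames):
        x = signal[k * frame_size:(k + 1) * frame_size]
        zcr[k] = np.sum(np.abs(np.diff(np.sign(x)))) / (2 * frame_size)
    return zcr

fs = 8000
frame_size = int(0.02 * fs)                          # 20 ms frames
sig = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)   # 1 s, 100 Hz tone
zcr = compute_zcr(sig, frame_size)
time_frames = np.arange(len(zcr)) * frame_size / fs  # frame start times (s)
# A histogram of `zcr` gives the ZCR distribution plotted in this task;
# plotting `zcr` against `time_frames` overlays it on the waveform.
```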
Task 5:
The code designs and visualizes a low-pass FIR (Finite Impulse Response) filter using
the windowing method (Hamming window). Here are the details:
• Filter Design
– Cutoff Frequency: We have set this to 1 kHz, determining the highest fre-
quency the filter will allow to pass through with minimal attenuation.
– Nyquist Frequency: This is half the sampling rate of the input signal,
defining the highest frequency that can be captured without aliasing. We have
normalized the cutoff frequency against this Nyquist frequency.
– Filter Length (Number of Taps): We have chosen 101 taps, representing
the number of filter coefficients. Higher tap counts give sharper roll-off around
the cutoff frequency but increase computational complexity.
• Windowing Method
– Sinc Function: We have used a sinc function to calculate the ideal low-pass
filter’s impulse response, as it is well-suited to filter design and provides a
sharp frequency cut-off.
– Centering: We have set the filter’s center coefficient to ensure proper align-
ment of the cutoff frequency, which is critical for maintaining signal phase
accuracy.
– Hamming Window: We have multiplied the sinc function coefficients by
a Hamming window. This windowing smooths the edges, reducing spectral
leakage and side lobes in the filter’s frequency response.
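The design steps above can be sketched in a few lines. The 1 kHz cutoff and 101 taps follow the text; the 8 kHz sample rate is a demo assumption, and the final DC normalization is an added convenience, not stated in the text.

```python
import numpy as np

fs = 8000          # assumed sample rate for the demo
cutoff = 1000.0    # cutoff frequency in Hz (from the text)
num_taps = 101     # filter length (from the text)

nyquist = fs / 2
fc = cutoff / nyquist                         # cutoff normalized to Nyquist
n = np.arange(num_taps) - (num_taps - 1) / 2  # symmetric index, centered at 0
h = fc * np.sinc(fc * n)                      # ideal low-pass impulse response
h *= np.hamming(num_taps)                     # taper edges: lower side lobes,
                                              # less spectral leakage
h /= h.sum()                                  # normalize for unit gain at DC
```

Centering the index at (num_taps − 1)/2 makes the filter symmetric, which gives linear phase, the alignment property mentioned above.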
Task 6:
• apply_filter Function:
– The apply_filter function accepts two arguments: the input signal and
filter_coeffs (the FIR filter coefficients).
– The function uses np.convolve() to apply the FIR filter to the
signal. This operation convolves the signal with the filter’s impulse response,
smoothing the signal according to the filter coefficients.
– The mode='same' argument ensures that the length of the output signal matches
the length of the input signal; samples near the boundaries are computed against
implicit zero-padding, so some edge effects remain.
– The output of the convolution is the filtered signal, which represents the
original signal with the high-frequency components reduced or eliminated by
the low-pass filter.
• Apply FIR Filter:
– The FIR filter, designed with a cutoff frequency of 1 kHz, is applied to the
signal using the apply_filter function.
– The result is stored in the variable filtered_signal, which contains the signal
after filtering out high-frequency noise and undesired components.
• Time-Domain Plot:
– A plot is generated comparing the original signal and the filtered signal in
the time domain.
– The original signal is plotted with a reduced opacity (alpha=0.6), making
it less prominent to highlight the filtered signal.
– The filtered signal is plotted with higher opacity (alpha=0.8) to clearly
distinguish it from the original signal.
– This comparison visually demonstrates how the FIR filter smooths the signal,
especially by reducing high-frequency variations and noise.
• Frequency-Domain Plot:
– The FFT function is used to compute the frequency spectrum of both the
original and the filtered signals.
– The original spectrum displays the frequency content of the unfiltered signal,
indicating the magnitude of each frequency component.
– The filtered spectrum displays the frequency content of the signal after
the FIR filter has been applied. Since the filter is low-pass, we expect to
see reduced magnitude at higher frequencies, demonstrating that the filter
attenuates the high-frequency components.
– Both spectra are plotted against frequency (in Hz), and only the first half of
the spectrum is shown ([:len(freqs)//2]) because the frequency spectrum
of real-valued signals is symmetric.
– This frequency-domain comparison highlights the effectiveness of the filter in
attenuating frequencies above the cutoff frequency (1 kHz).
The noise reduction in this case is achieved by applying a low-pass FIR filter, which
attenuates high-frequency components of the signal while allowing lower frequencies to
pass. The filter is designed to have a cutoff at 1 kHz, effectively removing rapid fluctua-
tions and high-frequency noise. By convolving the signal with the filter coefficients, the
high-frequency components are smoothed, resulting in a cleaner signal. In the frequency
domain, this is evident as the filtered signal shows reduced magnitudes at higher frequen-
cies, confirming that the noise, typically found in these high-frequency ranges, has been
attenuated.
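The filtering and spectrum comparison can be sketched as follows. The filter is the 1 kHz windowed-sinc design from Task 5 (8 kHz sample rate assumed); the test signal, a 200 Hz "speech" tone plus a 3 kHz "noise" tone, is an illustrative stand-in for the recorded audio.

```python
import numpy as np

# apply_filter as described: convolve the signal with the FIR coefficients,
# keeping the output the same length as the input.
def apply_filter(signal, filter_coeffs):
    return np.convolve(signal, filter_coeffs, mode='same')

# 101-tap windowed-sinc low-pass, 1 kHz cutoff (as in Task 5).
fs = 8000
num_taps = 101
fc = 1000 / (fs / 2)
n = np.arange(num_taps) - (num_taps - 1) / 2
h = fc * np.sinc(fc * n) * np.hamming(num_taps)
h /= h.sum()

# Test signal: low-frequency component plus high-frequency "noise".
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
filtered_signal = apply_filter(signal, h)

# Spectra before and after; only the first half ([:len(freqs)//2]) would be
# plotted, since real-valued signals have a symmetric spectrum.
freqs = np.fft.fftfreq(len(signal), 1 / fs)
orig_spec = np.abs(np.fft.fft(signal))
filt_spec = np.abs(np.fft.fft(filtered_signal))
```

With 1 Hz bin spacing here, the 3 kHz peak in `filt_spec` should be strongly attenuated relative to `orig_spec`, while the 200 Hz peak passes nearly unchanged.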
Task 7:
• Compute and Plot the PSD: We compute the Power Spectral Density (PSD) for
each segment of the speech signal (voiced, unvoiced, and silence) using the FFT.
The PSD reveals how power is distributed across frequencies, helping to differentiate
between the segments:
– Voiced segments: These show prominent energy at lower frequencies, re-
flecting the periodic nature of speech (vocal cord vibrations).
– Unvoiced segments: These show a more broadband distribution of energy
across higher frequencies, typical of sounds like "s" or "sh".
– Silence segments: These have very little to no energy across the frequency
spectrum, representing periods of silence.
• Discussion of PSD Differences: The PSD plots of each segment highlight the
following differences:
– Voiced speech: Shows distinct peaks at lower frequencies (fundamental and
harmonics).
– Unvoiced speech: Exhibits more spread-out energy in the higher frequencies.
– Silence: Shows negligible energy, making it easy to distinguish from the oth-
ers.
This helps in classifying the segments based on their frequency characteristics.
• Effectiveness of the Low-Pass FIR Filter: The low-pass FIR filter is used to
remove high-frequency noise while preserving the speech content:
– Before filtering: The PSD reveals more power at higher frequencies, indi-
cating noise.
– After filtering: The high-frequency components are reduced, and the PSD
shows that the filter successfully preserves the speech content in the lower
frequencies.
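The per-segment PSD computation can be sketched with a simple periodogram. The Hann window and the |FFT|²/N scaling are one common convention, an assumption here rather than the assignment's exact code, and the voiced/unvoiced/silence frames are synthetic stand-ins for the segmented speech.

```python
import numpy as np

# Windowed periodogram: PSD estimate of one segment via the FFT.
def psd(segment, fs):
    n = len(segment)
    spectrum = np.fft.rfft(segment * np.hanning(n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    return freqs, (np.abs(spectrum) ** 2) / n

fs = 8000
n = 1024
rng = np.random.default_rng(2)
t = np.arange(n) / fs
# Voiced-like: fundamental at 150 Hz plus one harmonic; unvoiced-like:
# broadband noise; silence-like: near-zero noise floor.
voiced = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
unvoiced = 0.2 * rng.standard_normal(n)
silence = 0.001 * rng.standard_normal(n)

for name, seg in [("voiced", voiced), ("unvoiced", unvoiced), ("silence", silence)]:
    freqs, p = psd(seg, fs)
    print(name, f"peak at {freqs[np.argmax(p)]:.0f} Hz, total power {p.sum():.2e}")
```

As the discussion above predicts, the voiced PSD peaks at its low-frequency fundamental, the unvoiced PSD spreads across the band, and the silence PSD carries orders of magnitude less total power.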
Conclusion
The Colab notebook containing the implementation and further details of the work can
be accessed using the following link:
https://colab.research.google.com/drive/1IEY8ev10xfQZtZiRODDEqonEA6CPtP7V?usp=sharing