Search Results (10)

Search Parameters:
Keywords = dereverberation

13 pages, 3138 KiB  
Article
Iteratively Refined Multi-Channel Speech Separation
by Xu Zhang, Changchun Bao, Xue Yang and Jing Zhou
Appl. Sci. 2024, 14(14), 6375; https://doi.org/10.3390/app14146375 - 22 Jul 2024
Viewed by 386
Abstract
The combination of neural networks and beamforming has proven very effective in multi-channel speech separation, but its performance faces a challenge in complex environments. In this paper, an iteratively refined multi-channel speech separation method is proposed to meet this challenge. The proposed method is composed of initial separation and iterative separation. In the initial separation, a time–frequency domain dual-path recurrent neural network (TFDPRNN), minimum variance distortionless response (MVDR) beamformer, and post-separation are cascaded to obtain the first additional input in the iterative separation process. In iterative separation, the MVDR beamformer and post-separation are iteratively used, where the output of the MVDR beamformer is used as an additional input to the post-separation network and the final output comes from the post-separation module. This iteration of the beamformer and post-separation is fully employed for promoting their optimization, which ultimately improves the overall performance. Experiments on the spatialized version of the WSJ0-2mix corpus showed that our proposed method achieved a signal-to-distortion ratio (SDR) improvement of 24.17 dB, which was significantly better than the current popular methods. In addition, the method also achieved an SDR of 20.2 dB on joint separation and dereverberation tasks. These results indicate our method’s effectiveness and significance in the multi-channel speech separation field. Full article
(This article belongs to the Special Issue Advanced Technology in Speech and Acoustic Signal Processing)
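The MVDR beamformer used in both the initial and iterative separation stages admits a compact closed form, w = Rn^{-1} d / (d^H Rn^{-1} d). A minimal NumPy sketch, with a synthetic steering vector and noise covariance standing in for the quantities the paper estimates from network outputs:

```python
import numpy as np

def mvdr_weights(steering, noise_cov):
    """MVDR: w = Rn^{-1} d / (d^H Rn^{-1} d), minimizing output noise
    power while passing the target direction without distortion."""
    rn_inv_d = np.linalg.solve(noise_cov, steering)
    return rn_inv_d / (steering.conj() @ rn_inv_d)

rng = np.random.default_rng(0)
m = 4                                            # microphones
d = np.exp(1j * rng.uniform(0, np.pi, m))        # toy steering vector
a = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
rn = a @ a.conj().T + np.eye(m)                  # Hermitian PD noise covariance
w = mvdr_weights(d, rn)
print(abs(w.conj() @ d - 1.0))                   # distortionless: w^H d = 1
```

The distortionless constraint w^H d = 1 holds by construction, which is what makes the beamformer safe to cascade with a post-separation network.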
Figure 1: The overall structure of the initial separation.
Figure 2: The structure of iBeam-TFDPRNN.
Figure 3: Spectrographic comparison of the separated speech at different stages.
15 pages, 473 KiB  
Article
Crossband Filtering for Weighted Prediction Error-Based Speech Dereverberation
by Tomer Rosenbaum, Israel Cohen and Emil Winebrand
Appl. Sci. 2023, 13(17), 9537; https://doi.org/10.3390/app13179537 - 23 Aug 2023
Viewed by 999
Abstract
Weighted prediction error (WPE) is a linear prediction-based method extensively used to predict and attenuate the late reverberation component of an observed speech signal. This paper introduces an extended version of the WPE method to enhance the modeling accuracy in the time–frequency domain by incorporating crossband filters. Two approaches to extending the WPE while considering crossband filters are proposed and investigated. The first approach improves the model’s accuracy. However, it increases the computational complexity, while the second approach maintains the same computational complexity as the conventional WPE while still achieving improved accuracy and comparable performance to the first approach. To validate the effectiveness of the proposed methods, extensive simulations are conducted. The experimental results demonstrate that both methods outperform the conventional WPE regarding dereverberation performance. These findings highlight the potential of incorporating crossband filters in improving the accuracy and efficacy of the WPE method for dereverberation tasks. Full article
(This article belongs to the Section Acoustics and Vibrations)
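For context, conventional WPE (which both proposed crossband extensions build on) performs iteratively reweighted, delayed linear prediction per frequency bin. A single-channel sketch under simplifying assumptions (a toy FIR echo and fixed tap count, not the paper's crossband model):

```python
import numpy as np

def wpe_bin(x, taps=8, delay=3, iters=3, eps=1e-8):
    """Conventional single-channel WPE for one STFT frequency bin:
    iteratively reweighted delayed linear prediction of late reverb."""
    T = len(x)
    # delayed data matrix: row t holds x[t-delay], ..., x[t-delay-taps+1]
    X = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        idx = np.arange(T) - delay - k
        valid = idx >= 0
        X[valid, k] = x[idx[valid]]
    d = x.copy()
    for _ in range(iters):
        lam = np.maximum(np.abs(d) ** 2, eps)    # time-varying PSD weights
        Xw = X / lam[:, None]
        g = np.linalg.solve(X.conj().T @ Xw + eps * np.eye(taps),
                            Xw.conj().T @ x)     # weighted normal equations
        d = x - X @ g                            # dereverberated estimate
    return d

rng = np.random.default_rng(1)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
x = s.copy()
x[5:] += 0.6 * x[:-5]                            # add a delayed echo at lag 5
d = wpe_bin(x)
err_before = np.mean(np.abs(x - s) ** 2)
err_after = np.mean(np.abs(d - s) ** 2)
print(err_after < err_before)
```

Because the echo lag falls inside the prediction window (delay 3, taps 3 to 10), the estimate d moves markedly closer to the clean signal than the observation x.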
Figure 1: Performance of conventional WPE for different filter lengths: (a) FWSegSNR, optimal length 15; (b) CD, optimal length 14; (c) PESQ, optimal length 20.
Figure 2: Performance evaluation (Ext.) for L_bb = 15 (optimal FWSegSNR) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
Figure 3: Performance evaluation (Ext.) for L_bb = 14 (optimal CD) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
Figure 4: Performance evaluation (Ext.) for L_bb = 20 (optimal PESQ) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
Figure 5: Performance evaluation (Pres.) for L_bb = 15 (optimal FWSegSNR) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
Figure 6: Performance evaluation (Pres.) for L_bb = 14 (optimal CD) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
Figure 7: Performance evaluation (Pres.) for L_bb = 20 (optimal PESQ) in terms of (a) FWSegSNR, (b) CD, and (c) PESQ.
16 pages, 4394 KiB  
Article
Acoustic Feature Extraction Method of Rotating Machinery Based on the WPE-LCMV
by Peng Wu, Gongye Yu, Naiji Dong and Bo Ma
Machines 2022, 10(12), 1170; https://doi.org/10.3390/machines10121170 - 6 Dec 2022
Cited by 2 | Viewed by 1314
Abstract
Fault diagnosis plays an important role in the safe and stable operation of rotating machinery, which is conducive to industrial development and economic improvement. However, effective feature extraction of rotating machinery fault diagnosis is difficult in the complex sound field with characteristics of reverberation and multi-dimensional signals. Therefore, this paper proposes a novel acoustic feature extraction method of the rotating machinery based on the Weighted Prediction Error (WPE) integrating the Linear Constrained Minimum Variance (LCMV). The de-reverberation signal is obtained by inputting multi-channel signals into the WPE algorithm using an adaptive optimal parameters selection function with the sound field changes. Then, the incident angle going from the fault source to the center of the microphone array is calculated from the full-band sound field distribution, and the signal is de-noised and fused using the LCMV. Finally, the fault feature frequency is extracted from the fused signal envelope spectrum. The results of fault data analysis of the centrifugal pump test bench show that the Envelope Harmonic Noise Ratio (EHNR) is more than twice that of the original signal after the WPE-LCMV processing. Compared to the Recursive Least Squares and the Resonance Sparse Signal Decomposition (RLS-RSSD) and the parameter optimized Variational Mode Decomposition (VMD), the EHNR has a higher value for all types of faults after applying the WPE-LCMV processing. Furthermore, the proposed method can effectively extract the frequency of bearing faults. Full article
(This article belongs to the Section Machines Testing and Maintenance)
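The final step, reading fault frequencies off the envelope spectrum of the fused signal, can be sketched as follows; the carrier and modulation frequencies below are hypothetical stand-ins for a bearing resonance and its fault characteristic frequency:

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Envelope spectrum: magnitude FFT of the analytic-signal envelope.
    Bearing faults appear as peaks at the fault characteristic frequency."""
    n = len(x)
    # analytic signal via FFT (equivalent to scipy.signal.hilbert)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2
    if n % 2 == 0:
        h[n // 2] = 1
    env = np.abs(np.fft.ifft(X * h))         # instantaneous amplitude
    env -= env.mean()                        # drop the DC component
    spec = np.abs(np.fft.rfft(env)) / n
    freqs = np.fft.rfftfreq(n, 1 / fs)
    return freqs, spec

fs = 2000
t = np.arange(fs) / fs                       # 1 s of signal
fault, carrier = 37.0, 500.0                 # hypothetical fault/resonance freqs
x = (1 + 0.8 * np.cos(2 * np.pi * fault * t)) * np.sin(2 * np.pi * carrier * t)
freqs, spec = envelope_spectrum(x, fs)
peak = freqs[np.argmax(spec)]
print(peak)                                  # peak near the 37 Hz modulation
```

The envelope spectrum separates the slow amplitude modulation (the fault signature) from the fast carrier, which is why the EHNR of the fused signal is a meaningful quality measure.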
Figure 1: The principle of predicting the late reverberation using WPE.
Figure 2: Process of the fault feature extraction method based on the WPE-LCMV.
Figure 3: Layout of the centrifugal pump test bench, the rectangular microphone array and the data collector.
Figure 4: Optimal parameter selection results.
Figure 5: Time–frequency spectrum: (a) the first microphone signal; (b) its de-reverberated signal.
Figure 6: Sound field distribution of the centrifugal pump.
Figure 7: The waveform of (a) the first microphone original signal and (b) the fused signal.
Figure 8: The envelope spectrum of (a) the first microphone original signal and (b) after WPE-LCMV analysis.
Figure 9: Envelope spectrum after applying (a) the RLS-RSSD analysis and (b) the parameter-optimized VMD.
22 pages, 2743 KiB  
Article
Effective Dereverberation with a Lower Complexity at Presence of the Noise
by Fengqi Tan, Changchun Bao and Jing Zhou
Appl. Sci. 2022, 12(22), 11819; https://doi.org/10.3390/app122211819 - 21 Nov 2022
Cited by 4 | Viewed by 1499
Abstract
Adaptive beamforming and deconvolution techniques have shown effectiveness for reducing noise and reverberation. The minimum variance distortionless response (MVDR) beamformer is the most widely used for adaptive beamforming, whereas multichannel linear prediction (MCLP) is an excellent approach for the deconvolution. How to solve the problem where the noise and reverberation occur together is a challenging task. In this paper, the MVDR beamformer and MCLP are effectively combined for noise reduction and dereverberation. Especially, the MCLP coefficients are estimated by the Kalman filter and the MVDR filter based on the complex Gaussian mixture model (CGMM) is used to enhance the speech corrupted by the reverberation with the noise and to estimate the power spectral density (PSD) of the target speech required by the Kalman filter, respectively. The final enhanced speech is obtained by the Kalman filter. Furthermore, a complexity reduction method with respect to the Kalman filter is also proposed based on the Kronecker product. Compared to two advanced algorithms, the integrated sidelobe cancellation and linear prediction (ISCLP) method and the weighted prediction error (WPE) method, which are very effective for removing reverberation, the proposed algorithm shows better performance and lower complexity. Full article
(This article belongs to the Special Issue Advances in Speech and Language Processing)
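The complexity reduction rests on the standard Kronecker-product identity vec(AXB) = (B^T ⊗ A) vec(X), which lets one large structured filter be applied as two small matrix products. A quick numerical check of the identity (illustrative only, not the paper's Kalman filter):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

# vec(AXB) = (B^T kron A) vec(X): a long filter with Kronecker structure
# can be applied via two small products instead of one large one.
vec = lambda M: M.reshape(-1, order="F")     # column-major vectorization
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```

Exploiting this structure replaces an O((mn)^2) filter application with two products of size O(m^2 n + m n^2), which is the source of the reported complexity saving.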
Figure 1: Block diagram of the proposed dereverberation method.
Figure 2: Simulation scene of a room considering 20 source locations and 8 microphones [34].
Figure 3: Test results related to the filter order L between the proposed method (red line) and ISCLP (blue line).
Figure 4: Test results related to T60 on four initialization methods of the PSD of the desired signal, with real PSD (green line), [14,15] (black line), proposed (red line), ISCLP (blue line), and [34] (cyan line).
Figure 5: Spectrogram comparison in the case of SNR = 10 dB and T60 = 800 ms.
20 pages, 5795 KiB  
Article
A Multi-Source Separation Approach Based on DOA Cue and DNN
by Yu Zhang, Maoshen Jia, Xinyu Jia and Tun-Wen Pai
Appl. Sci. 2022, 12(12), 6224; https://doi.org/10.3390/app12126224 - 19 Jun 2022
Viewed by 1468
Abstract
Multiple sound source separation in a reverberant environment has become popular in recent years. To improve the quality of the separated signal in a reverberant environment, a separation method based on a DOA cue and a deep neural network (DNN) is proposed in this paper. Firstly, a pre-processing model based on non-negative matrix factorization (NMF) is utilized for recorded signal dereverberation, which makes source separation more efficient. Then, we propose a multi-source separation algorithm combining sparse and non-sparse component points recovery to obtain each sound source signal from the dereverberated signal. For sparse component points, the dominant sound source for each sparse component point is determined by a DOA cue. For non-sparse component points, a DNN is used to recover each sound source signal. Finally, the signals separated from the sparse and non-sparse component points are well matched by temporal correlation to obtain each sound source signal. Both objective and subjective evaluation results indicate that compared with the existing method, the proposed separation approach shows a better performance in the case of a high-reverberation environment. Full article
(This article belongs to the Special Issue Immersive 3D Audio: From Architecture to Automotive)
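As background on the NMF pre-processing stage, a generic NMF with Lee–Seung multiplicative updates can be sketched as below; this is a common starting point, and the paper's dereverberation model is more specialized than this plain Frobenius-norm factorization:

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2.
    Updates keep W and H nonnegative and never increase the error."""
    rng = np.random.default_rng(3)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(4)
V = rng.random((20, 30)) ** 2                # toy nonnegative spectrogram
W, H = nmf(V, rank=8)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)
```

The nonnegativity of W and H is what makes the factors interpretable as spectral templates and activations when V is a magnitude spectrogram.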
Figure 1: Schematic diagram of the proposed method for multi-source separation.
Figure 2: The TF spectrums of different signals: (a) the original speech signal; (b) the recorded signal with T60 = 300 ms.
Figure 3: Logarithmic spectrum of an RIR signal with T60 = 600 ms.
Figure 4: The TF spectrum of the dereverberated signal.
Figure 5: Schematic diagram for sparse and non-sparse component points. Single-color rectangular boxes represent sparse component points; multi-color boxes represent non-sparse component points.
Figure 6: The normalized statistical histogram of DOA estimates: (a) total TF points; (b) sparse component points.
Figure 7: Illustration of the network structure.
Figure 8: Average PESQ of the separated signals from sparse component points, non-sparse component points and the whole TF points.
Figure 9: Average PESQ of signals separated by the proposed method with or without dereverberation.
Figure 10: Average STOI of signals separated by the proposed method with or without dereverberation.
Figure 11: Average PESQ in different reverberant rooms.
Figure 12: Average SDR and SIR in different reverberant rooms: (a) average SDR; (b) average SIR.
Figure 13: MUSHRA test results under the ideal acoustic case with 95% confidence intervals.
Figure 14: MUSHRA test results under the reverberant case with 95% confidence intervals.
9 pages, 4738 KiB  
Communication
Deep Learning-Based Estimation of Reverberant Environment for Audio Data Augmentation
by Deokgyu Yun and Seung Ho Choi
Sensors 2022, 22(2), 592; https://doi.org/10.3390/s22020592 - 13 Jan 2022
Cited by 6 | Viewed by 2434
Abstract
This paper proposes an audio data augmentation method based on deep learning in order to improve the performance of dereverberation. Conventionally, audio data are augmented using a room impulse response, which is artificially generated by some methods, such as the image method. The proposed method estimates a reverberation environment model based on a deep neural network that is trained by using clean and recorded audio data as inputs and outputs, respectively. Then, a large amount of a real augmented database is constructed by using the trained reverberation model, and the dereverberation model is trained with the augmented database. The performance of the augmentation model was verified by a log spectral distance and mean square error between the real augmented data and the recorded data. In addition, according to dereverberation experiments, the proposed method showed improved performance compared with the conventional method. Full article
(This article belongs to the Special Issue Acoustic Event Detection and Sensing)
Figure 1: Example of acoustic transmission in a room.
Figure 2: Block diagram of (a) the conventional method and (b) the proposed method.
Figure 3: Block diagram of reverberant environment estimation using the CNN model.
Figure 4: The room impulse response of the experiments.
Figure 5: The room transfer function of the experiments.
Figure 6: Waveform examples of (a) the recorded signal, (b) the signal artificially generated by RIR, and (c) the signal generated by the proposed method.
Figure 7: Spectrogram examples of (a) the recorded signal, (b) the signal artificially generated by RIR, and (c) the signal generated by the proposed method.
Figure 8: Block diagram of the dereverberation model.
Figure 9: Spectrogram examples of the dereverberated signal by (a) the RIR method and (b) the proposed method.
11 pages, 990 KiB  
Communication
Late Reverberant Spectral Variance Estimation for Single-Channel Dereverberation Using Adaptive Parameter Estimator
by Zhaoqi Zhang, Xuelei Feng and Yong Shen
Appl. Sci. 2021, 11(17), 8054; https://doi.org/10.3390/app11178054 - 30 Aug 2021
Viewed by 1777
Abstract
The estimation of the late reverberant spectral variance (LRSV) is of paramount importance in most reverberation suppression algorithms. This letter proposes an improved single-channel LRSV estimator based on Habets LRSV estimator by using an adaptive parameter estimator. Instead of estimating the direct-to-reverberation ratio (DRR), the proposed LRSV estimator directly estimates the parameter κ in a generalized statistical model since the experimental results show that even the κ calculated using measured ground truth DRR may not be the optimal parameter for the LRSV estimator. Experimental results using synthetic reverberant signals demonstrate the superiority of the proposed estimator to conventional approaches. Full article
(This article belongs to the Special Issue Sound Field Control)
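A simplified LRSV estimator based on Polack's exponential-decay model, without the κ-parameterized generalization the paper builds on, can be sketched as:

```python
import numpy as np

def lrsv_exponential(psd_x, t60, fs, hop, delay_frames):
    """Simplified late reverberant spectral variance (LRSV) estimate:
    under an exponential-decay reverberation model, the late part is a
    delayed, attenuated copy of the observed PSD track for one bin.
    (Habets-style estimators refine this with a DRR-dependent kappa.)"""
    delta = 3.0 * np.log(10.0) / t60             # decay constant from T60
    atten = np.exp(-2.0 * delta * hop * delay_frames / fs)
    lam = np.zeros_like(psd_x)
    lam[delay_frames:] = atten * psd_x[:-delay_frames]
    return lam

fs, hop = 16000, 256                             # assumed STFT parameters
psd = np.ones(100)                               # flat toy PSD track
lam = lrsv_exponential(psd, t60=0.5, fs=fs, hop=hop, delay_frames=4)
print(lam[10])                                   # attenuation after the delay
```

The paper's point is that the decay parameter implied by the measured DRR is not necessarily the one that minimizes the estimation error, hence the adaptive estimation of κ directly.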
Figure 1: Plot of the signal used in the experiment with and without reverberation: (a) with reverberation; (b) without reverberation.
Figure 2: Plot of the average LSD as a function of κ for (a–f): AIR1–AIR6. The average LSD for the measured ground truth κ (fullband and subband), the proposed κ estimator and the conventional κ estimator are shown as reference lines for comparison.
Figure 3: Mean and standard deviation of the log error e(k, l) for different RSNR. The means are indicated by symbols (circle, cross, etc.), and the semi-variances by whisker bars.
Figure 4: Plots of the measured ground truth κ and the scanning-optimal κ for each AIR.
Figure 5: Plot of the scanning-optimal κ using different speech signals for (a–f): AIR1–AIR6, with the measured κ shown as a reference line for comparison.
20 pages, 5998 KiB  
Article
De-Noising Process in Room Impulse Response with Generalized Spectral Subtraction
by Min Chen and Chang-Myung Lee
Appl. Sci. 2021, 11(15), 6858; https://doi.org/10.3390/app11156858 - 26 Jul 2021
Cited by 3 | Viewed by 1833
Abstract
The generalized spectral subtraction algorithm (GBSS), which has extraordinary ability in background noise reduction, is historically one of the first approaches used for speech enhancement and dereverberation. However, the algorithm has not been applied to de-noise the room impulse response (RIR) to extend the reverberation decay range. The application of the GBSS algorithm in this study is stated as an optimization problem, that is, subtracting the noise level from the RIR while maintaining the signal quality. The optimization process conducted in the measurements of the RIRs with artificial noise and natural ambient noise aims to determine the optimal sets of factors to achieve the best noise reduction results regarding the largest dynamic range improvement. The optimal factors are set variables determined by the estimated SNRs of the RIRs filtered in the octave band. The acoustic parameters, the reverberation time (RT), and early decay time (EDT), and the dynamic range improvement of the energy decay curve were used as control measures and evaluation criteria to ensure the reliability of the algorithm. The de-noising results were compared with noise compensation methods. With the achieved optimal factors, the GBSS contributes to a significant effect in terms of dynamic range improvement and decreases the estimation errors in the RTs caused by noise levels. Full article
(This article belongs to the Section Acoustics and Vibrations)
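The core GBSS operation, power-domain over-subtraction with a spectral floor, can be sketched per frame; the α, β and noise estimate below are illustrative values, not the paper's optimized factors:

```python
import numpy as np

def gss_frame(mag_x, mag_n, alpha=4.0, beta=0.01, a=2.0):
    """Generalized spectral subtraction on one magnitude spectrum:
    subtract alpha * noise^a, floor the result at beta * noise^a,
    then return to the magnitude domain."""
    sub = mag_x ** a - alpha * mag_n ** a
    floor = beta * mag_n ** a
    return np.maximum(sub, floor) ** (1.0 / a)

mag_n = np.full(8, 0.1)                          # assumed noise magnitude estimate
mag_x = np.array([1.0, 0.9, 0.12, 0.11, 0.5, 0.1, 0.1, 0.8])
out = gss_frame(mag_x, mag_n)
print(out)
```

Bins well above the noise level pass nearly unchanged, while near-noise bins are clamped to the β floor; tuning α per estimated SNR is exactly the optimization the paper carries out on the RIR decay tail.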
Figure 1: Dynamic range improvement of the EDC between cross-point A at a noisy RIR and cross-point B at the RIR after the GBSS.
Figure 2: Comparison of the noisy and de-noised RIRs at 1 kHz, varying the noise spectral floor β for a fixed α0: (a) EDCs of the RIRs; (b) power spectrum of the RIRs obtained using the FFT; (c,d) RIRs in the time domain.
Figure 3: Comparison of the noisy and de-noised RIRs at 1 kHz, varying the noise over-subtraction α0 for a fixed β: (a,b) generated EDCs of the RIRs; (c,d) de-noised RIRs with α0 = 4 and 6 in the time domain.
Figure 4: De-noising effects at the knee with various α0 factors: (a) comparison of the dynamic range at the cross point; (b–d) comparison of the de-noised RIRs to the reference RIRs at the knee, with factors of (b) 3, (c) 4.25, and (d) 6 applied.
Figure 5: De-noising effects using the optimal set factors of the GBSS on the RIRs of the meeting room with different added noises filtered at the octave bands. The red dotted line shows the EDCs with pink noise; the green dotted line shows the EDCs with white noise.
Figure 6: Comparison of the RTs of the filtered RIRs at octave bands with pink noise and white noise at noise levels of −60 dB.
Figure 7: Results of the optimal α0 for different SNRs of three filtered RIRs in octave bands with two types of noise and various noise levels (−65 dB to −40 dB estimated in broadband RIRs): (a,b) optimal α0 for RIRs with noise levels of −60 dB and corresponding SNRs; (c) relationship of the optimal α0 and SNR; (d) dynamic range improvement.
Figure 8: Effects of the GBSS algorithm and the compensation method on the measured RIRs filtered in octave bands (250 Hz and 2 kHz): (a,b) comparison results for room A; (c,d) comparison results for room B.
Figure 9: Comparison of the GBSS algorithm and the compensation method: (a) reverberation time relative errors; (b) early decay time relative errors.
13 pages, 411 KiB  
Article
Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments
by Kyoungjin Noh and Joon-Hyuk Chang
Sensors 2020, 20(7), 1883; https://doi.org/10.3390/s20071883 - 28 Mar 2020
Cited by 12 | Viewed by 3554
Abstract
In this paper, we propose joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for the convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, the short-time Fourier transform (STFT) coefficients are calculated from multi-channel audio signals under the noisy and reverberant environments, which are then enhanced by the DNN-supported weighted prediction error (WPE) dereverberation with the estimated masks. Next, the STFT coefficients of the dereverberated multi-channel audio signals are conveyed to the DNN-supported minimum variance distortionless response (MVDR) beamformer in which DNN-supported MVDR beamforming is carried out with the source and noise masks estimated by the DNN. As a result, the single-channel enhanced STFT coefficients are shown at the output and tossed to the CRNN-based SED system, and then, the three modules are jointly trained by the single loss function designed for SED. Furthermore, to ease the difficulty of training a deep learning model for SED caused by the imbalance in the amount of data for each class, the focal loss is used as a loss function. Experimental results show that joint training of DNN-supported dereverberation and beamforming with the SED model under the supervision of focal loss significantly improves the performance under the noisy and reverberant environments. Full article
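The focal loss used to counter class imbalance down-weights well-classified examples by a factor (1 − p_t)^γ. A binary NumPy sketch (the paper applies it per event class within the SED loss):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al.): scales cross-entropy by
    (1 - p_t)^gamma so easy examples contribute little gradient."""
    p_t = np.where(y == 1, p, 1 - p)             # prob of the true class
    a_t = np.where(y == 1, alpha, 1 - alpha)     # class-balance weight
    return -(a_t * (1 - p_t) ** gamma
             * np.log(np.clip(p_t, 1e-12, 1.0))).mean()

y = np.array([1, 1, 0, 0])
easy = np.array([0.95, 0.9, 0.1, 0.05])          # confident, correct predictions
hard = np.array([0.3, 0.4, 0.7, 0.6])            # uncertain predictions
print(focal_loss(easy, y) < focal_loss(hard, y))
```

With γ = 0 and α = 0.5 this reduces to (half the) ordinary binary cross-entropy; raising γ shifts training effort toward the rare, hard events.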
Figure 1: Block diagram of the proposed system (WPE: weighted prediction error; MVDR: minimum variance distortionless response; SED: sound event detection).
Figure 2: Overview of the SED model: (a) convolutional block of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3 baseline [18]; (b) CRNN-based SED model; (c) proposed convolutional block (LMFB: log-scale Mel filter bank).
15 pages, 2096 KiB  
Article
A Novel Scheme for Single-Channel Speech Dereverberation
by Nikolaos Kilis and Nikolaos Mitianoudis
Acoustics 2019, 1(3), 711-725; https://doi.org/10.3390/acoustics1030042 - 5 Sep 2019
Cited by 8 | Viewed by 3909
Abstract
This paper presents a novel scheme for speech dereverberation. The core of our method is a two-stage single-channel speech enhancement scheme. Degraded speech obtains a sparser representation of the linear prediction residual in the first stage of our proposed scheme by applying orthogonal matching pursuit on overcomplete bases, trained by the K-SVD algorithm. Our method includes an estimation of reverberation and mixing time from a recorded hand clap or a simulated room impulse response, which are used to create a time-domain envelope. Late reverberation is suppressed at the second stage by estimating its energy from the previous envelope and removed with spectral subtraction. Further speech enhancement is applied on minimizing the background noise, based on optimal smoothing and minimum statistics. Experimental results indicate favorable quality, compared to two state-of-the-art methods, especially in real reverberant environments with increased reverberation and background noise. Full article
Figure 1: The proposed dereverberation system (LP: linear prediction; OMP: orthogonal matching pursuit; RIR: room impulse response).
Figure 2: The effect of sparsification on the reverberant speech residual.
Figure 3: An example of mixing time estimation using the normalized kurtosis method and a hand clap for a frame size of 11.6 ms. The estimated mixing time and the theoretical mixing time are very close.
Figure 4: Mixing time estimation for different frame sizes.
Figure 5: RIR envelope env(t) estimation for a sample room. Envelope 1 uses the analytic signal via the Hilbert transform, while Envelope 2 uses a simple low-pass filtering scheme.
Figure 6: Time-domain and frequency-domain comparison between the original speech data, the reverberated speech in the "classroom" and the dereverberated output from the three approaches: (c) the Wu and Wang (WW) method; (d) Spendred; (e) the proposed method.