Circuits, Systems, and Signal Processing (2024) 43:2891–2908

https://doi.org/10.1007/s00034-023-02549-2

Optimized Sigmoid Functions for Speech Presence Probability and Gain Function in Speech Enhancement

Hai Huyen Dam1 · Sven Nordholm1 · Pei Chee Yong2 · Siow Yong Low3

Received: 23 November 2022 / Revised: 18 October 2023 / Accepted: 21 October 2023 /

Published online: 22 January 2024
© The Author(s) 2024

Abstract
Speech presence probability (SPP) and gain functions such as Wiener filter or MMSE
estimators require an estimate of the a-priori signal-to-noise ratio (SNR). However,
the estimation of the a-priori SNR is computationally involved and sensitive to noise
variations. This paper proposes to approximate the SPP and the overall gain function
of a speech enhancement system by using sigmoid functions, reducing the need to
estimate the a-priori SNR. By applying an approximation via the sigmoid functions
it is shown that only the a-posteriori estimate of SNR is needed, resulting in a low
complexity system. The sigmoid function is designed with an optimization algorithm
to optimize its parameters with respect to speech quality measures. The optimization
algorithm is based on the idea that the solution obtained for a given problem should
move towards the best solution and avoid the worst solution. The proposed algorithm
requires minimal control parameters and does not require any algorithm-specific
parameters. Simulation results show that the proposed sigmoid functions achieve good

Sven Nordholm, Pei Chee Yong and Siow Yong Low have contributed equally to this work.

Hai Huyen Dam

H.Dam@curtin.edu.au
Sven Nordholm
S.Nordholm@curtin.edu.au
Pei Chee Yong
peichee.yong@nuheara.com
Siow Yong Low
sy.low@soton.ac.uk

1 School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University,

Kent Street, Bentley, Perth, WA 6102, Australia
2 Nuheara, Perth, WA, Australia
3 Connected Intelligence Research Group (CIRG), University of Southampton Malaysia,
79100 Iskandar Puteri, Johor, Malaysia

results in terms of speech quality measures when compared with existing methods
while providing significantly lower complexity for implementation.

Keywords Single channel speech enhancement · A-priori SNR estimation · Decision directed approach · Optimization

1 Introduction

The prevalence of smart devices in our daily lives has pushed for an unprecedented
demand on audio communication systems. As such, the need for a seamless speech
communication system on such devices especially in noisy environments is highly
sought after. An effective way to enhance noisy speech is via single channel speech
enhancement techniques [1, 6, 7]. From the ideas of spectral subtraction by Boll [1],
more optimal methods were developed that optimize MMSE and log MMSE errors [6,
7]. Those methods have highlighted the two main tasks associated with single channel
processing which are noise suppression and speech preservation. However, it is a
challenge to achieve both tasks optimally as suppression and distortion are conflicting
measures, which results in a natural trade-off [13, 14, 19, 20]. For instance, if the
noise estimator makes an erroneous estimation in the noise statistics, it will cause a
mismatch in the noise suppression function. This in turn generates annoying musical
artefacts, which reduce the overall perceptual quality of the enhanced speech [1, 12,
18]. An efficient way to combat the musical noise problem is to improve the noise
spectrum estimation [3, 8, 12]. By using a soft voice activity detector idea based on
the speech presence probability (SPP), significant improvement of the noise spectrum
estimation was achieved. Yong et al. [22] further improved upon those results by using
a modified sigmoid function which incorporates an a-priori SNR estimate to reduce
the latency of the real-time SNR estimation. The modified decision directed approach
[23] overcomes the one-frame delay problem when estimating the a-priori SNR by
matching the estimated clean speech spectrum with the a-priori SNR as opposed to
the previous frame. The reduction in SNR estimation’s latency results in greater noise
suppression and generates less musical noise.
While [22, 23] outlined a means to improve the SNR estimation, the method still
employed the a-priori SNR estimate which was computationally complex and often
gave large variations in the estimate for non-stationary background noise. Enzner [4]
addressed the a-priori SNR problem by using a Bayesian Marginalization technique,
but this required extensive pre-training. In addition, it required the estimation of the
global a-priori SNR of the speech data for each SNR. The result is a look-up
table that can be related to the a-posteriori SNR. Enzner did not devise a way to address
the noise estimation problem.
In this paper, we propose to overcome the aforementioned problems by using the
modified sigmoid function to approximate both the speech presence probability (SPP)
and the speech enhancement gain function, the latter illustrated through the Wiener filter.
The benefit is twofold. First, by applying an approximation via the sigmoid functions
it is shown that only the a-posteriori estimate of SNR is needed, resulting in a lower
complexity system which does not require the a-priori SNR estimation. Secondly, since

only the posteriori information is needed, the proposed method can directly measure
the variations in non-stationary noise scenarios, thereby reducing its sensitivity to
large variations typically observed in non-stationary noise. The sigmoid function is
designed with four parameters to be optimized. This paper further employs an efficient
optimization algorithm, which optimizes the parameters of the sigmoid function with
respect to speech quality measures. By incorporating speech quality measures in the
optimization, the set of optimized parameters yield the best possible perceptually
enhanced and intelligible speech. The optimization algorithm is based on the idea that
the solution obtained for a given problem should move towards the best solution and
avoid the worst solution. The proposed algorithm requires minimal control parameters
and does not require any algorithm specific parameters.
Simulation results compare the performance in terms of several speech quality
measures, namely the perceptual evaluation of speech quality (PESQ) measure [17],
the short-time objective intelligibility (STOI) measure [21] and the log-likelihood
ratio (LLR) [15] for (i) the decision directed method, (ii) the modified decision directed
method, and (iii) the system with the sigmoid functions for both the gain function and the
speech presence probability. The proposed method is tested on some of the common types of
noise, namely the babble, factory, pink and white noise. The set of sigmoid function
coefficients is optimized at 0 dB SNR and the system is tested for various SNRs.
The results demonstrate that the proposed sigmoid functions achieve better results in
terms of PESQ, STOI and LLR than the existing decision directed and modified
decision directed methods, at lower complexity. In addition, a
trade-off between PESQ, STOI and LLR performance can be achieved between the
two proposed optimized sigmoid gain functions.
The paper is organized as follows: The system model and the gain function are
discussed in Sect. 2. The a-priori SNR estimation and the speech presence probability
are investigated in Sect. 3. The proposed system with the sigmoid function model for
both the SPP and the gain function is given in Sect. 4. The optimization procedure is
given in Sect. 5. Simulation results are given in Sect. 6, and finally, the conclusions are
in Sect. 7.

2 System Model and the Gain Function

The goal of a speech enhancement scheme is to estimate the enhanced speech signal
x̂(n), given a noisy signal y(n) = x(n) + v(n), where x(n) and v(n) denote the clean
speech signal and the noise, respectively. By applying the short-time Fourier transform
(STFT) to the time data, the STFT of the noisy signal is given as

Y (k, m) = X (k, m) + V (k, m) (1)

where X (k, m) and V (k, m) denote the STFT of the clean speech signal x (n) and
the uncorrelated additive noise v (n), respectively [6, 7]. Here, k is the frequency bin
index and m is the frame index. The estimated clean speech spectrum X̂ (k, m) is then
obtained as

X̂ (k, m) = G(k, m)Y (k, m) (2)

where G(k, m) is a spectral gain function. Our objective is to obtain an efficient and
low complexity method to estimate the gain function G(k, m). In the following, we
discuss the methods to estimate G(k, m).
The gain function G(k, m) is often derived from MMSE or Log-MMSE optimization
criteria [6], [7], which require the estimation of the a-priori SNR. One popular
MMSE method results in the Wiener filter [19], where the gain function can be
computed as

$$G_{WF}(k, m) = \frac{\xi(k, m)}{1 + \xi(k, m)} \tag{3}$$

and ξ(k, m) is the a-priori SNR, obtained as

$$\xi(k, m) = \frac{\lambda_x(k, m)}{\lambda_v(k, m)}. \tag{4}$$

Here, λx (k, m) and λv (k, m) represent the clean speech power spectral density and the
noise power spectral density, respectively, which are unknown in practice and hence
required to be estimated.
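
As a minimal numeric sketch of Eqs. (3) and (4), assuming the two power spectral densities are already available (in practice they must be estimated), the Wiener gain for one time-frequency bin could be computed as follows. The function names are illustrative and do not come from the paper.

```python
def a_priori_snr(lambda_x, lambda_v):
    """Eq. (4): ratio of clean-speech PSD to noise PSD (linear scale)."""
    if lambda_v <= 0.0:
        raise ValueError("noise PSD must be positive")
    return lambda_x / lambda_v


def wiener_gain(xi):
    """Eq. (3): Wiener gain as a function of the a-priori SNR xi."""
    return xi / (1.0 + xi)


# Example: equal speech and noise PSDs, i.e. an a-priori SNR of 0 dB.
xi_0db = a_priori_snr(lambda_x=1.0, lambda_v=1.0)
gain_0db = wiener_gain(xi_0db)
```

The gain lies between 0 and 1 and approaches 1 as the a-priori SNR grows, which is why an error in the estimate of ξ(k, m) translates directly into over- or under-suppression.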
The gain function derived using the Log-MMSE criterion also requires the estimation
of the a-priori SNR [24]. In [2], [24], the sigmoid function was investigated as
the function of the a-priori SNR to model the gain function G(k, m). However, the
estimation of the a-priori SNR is often computationally complex [2]. In the following,
we will discuss the estimation of the a-priori SNR and the speech presence probability
that is used to estimate the noise power spectral density in (4).

3 A-priori SNR Estimation and the Speech Presence Probability

In [5], the a-priori SNR is estimated using the decision directed (DD) method,

$$\hat{\xi}_{DD}(k, m) = \max\left( \beta \frac{|\hat{X}(k, m-1)|^2}{\hat{\lambda}_v(k, m)} + (1 - \beta) P[\gamma(k, m) - 1],\ \epsilon_0 \right) \tag{5}$$

where X̂(k, m − 1) and λ̂v(k, m) denote the estimated clean speech spectrum and the
estimated noise PSD, respectively. In addition, the parameter β denotes the smoothing
factor, P[·] denotes the half-wave rectification and ε0 is the SNR floor. Here, γ(k, m)
is the a-posteriori SNR obtained as

$$\gamma(k, m) = \frac{|Y(k, m)|^2}{\lambda_v(k, m)}.$$

The modified decision directed (MDD) method was developed in [24] for the
estimation of the a-priori SNR to further improve the speech quality of the DD method.
The main difference between the MDD and DD methods is that the MDD estimate
applies the gain function G(k, m − 1) of the previous frame to the current noisy
spectrum Y(k, m) in place of X̂(k, m − 1),

$$\hat{\xi}_{MDD}(k, m) = \max\left( \beta \frac{|G(k, m-1) Y(k, m)|^2}{\hat{\lambda}_v(k, m)} + (1 - \beta) P[\gamma(k, m) - 1],\ \epsilon_0 \right). \tag{6}$$
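
The two estimators in Eqs. (5) and (6) differ only in the first term, which the following sketch makes explicit for a single time-frequency bin. The default values of `beta` and `floor`, and all function names, are assumptions chosen for illustration; the paper does not state them here.

```python
def half_wave(value):
    """P[.]: half-wave rectification."""
    return max(value, 0.0)


def xi_dd(prev_clean_mag2, noisy_mag2, noise_psd, beta=0.98, floor=1e-3):
    """Eq. (5): decision directed a-priori SNR estimate for one bin.

    prev_clean_mag2 is |X_hat(k, m-1)|^2 and noisy_mag2 is |Y(k, m)|^2.
    beta and floor are assumed values, not taken from the paper.
    """
    gamma = noisy_mag2 / noise_psd  # a-posteriori SNR
    estimate = (beta * prev_clean_mag2 / noise_psd
                + (1.0 - beta) * half_wave(gamma - 1.0))
    return max(estimate, floor)


def xi_mdd(prev_gain, noisy_mag2, noise_psd, beta=0.98, floor=1e-3):
    """Eq. (6): modified DD, using |G(k, m-1) Y(k, m)|^2 in the first term."""
    matched_mag2 = (prev_gain ** 2) * noisy_mag2
    return xi_dd(matched_mag2, noisy_mag2, noise_psd, beta, floor)


# Example bin: noise PSD 1.0, |Y|^2 = 5.0, previous clean estimate |X_hat|^2 = 4.0.
example_dd = xi_dd(4.0, 5.0, 1.0)
example_mdd = xi_mdd(0.5, 5.0, 1.0)
```

Both functions still need the noise PSD and a recursion over frames, which is the cost the sigmoid approach aims to avoid.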

In addition, the estimations in (5) and (6) require the estimation of the a-posteriori
SNR and the noise power spectral density λv (k, m). One common method of estimating
λv (k, m) is applying a temporal recursive smoothing to the noisy observation using
the speech presence probability (SPP) p(k, m) [3], [11],

$$\lambda_v(k, m) = p(k, m)\,\lambda_v(k, m-1) + (1 - p(k, m))\,|Y(k, m)|^2. \tag{7}$$

Assuming both X (k, m) and V (k, m) have Gaussian distributions, then the SPP is
given by [3],

$$p(k, m) = \left[ 1 + (1 + \xi(k, m))\, Q \exp\left( -\gamma(k, m) \frac{\xi(k, m)}{1 + \xi(k, m)} \right) \right]^{-1} \tag{8}$$

where $Q = P(H_0)/P(H_1)$ is the ratio between P(H0), the a-priori probability for speech
absence, and P(H1), the a-priori probability for speech presence.
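
A small sketch of Eqs. (7) and (8) for one bin is given below, under the assumption that ξ, γ and Q are supplied as linear-scale numbers. The function names are illustrative only.

```python
import math


def spp(gamma, xi, q_ratio):
    """Eq. (8): speech presence probability for Gaussian speech and noise."""
    exponent = -gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + (1.0 + xi) * q_ratio * math.exp(exponent))


def update_noise_psd(prev_noise_psd, noisy_mag2, p):
    """Eq. (7): recursive noise PSD update weighted by the SPP p."""
    return p * prev_noise_psd + (1.0 - p) * noisy_mag2


# With a large a-posteriori SNR the SPP is close to one, so the
# previous noise estimate is essentially kept unchanged.
p_high = spp(gamma=50.0, xi=1.0, q_ratio=1.0)
noise_after = update_noise_psd(prev_noise_psd=1.0, noisy_mag2=50.0, p=p_high)
```

When p is close to zero the update instead follows the noisy observation |Y(k, m)|², which is the soft voice activity behaviour described above.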
It can be seen that the speech presence probability p(k, m) yields a value that is
close to one when γ (k, m) is sufficiently large and is small otherwise. In between zero
and one, a soft transition for SPP is desired. As such, a sigmoid function is employed
for the SPP in Eq. (8) [23] as a function of the estimated a-posteriori SNR γ (k, m)
and fixed coefficients

$$p_{sig}(k, m) = \frac{1}{1 + e^{-c_{sig}\left(\gamma(k, m) - d_{sig}\right)}} \tag{9}$$

where csig and dsig indicate, respectively, the slope and the mean of the sigmoid
function, given by

$$c_{sig} = \frac{\xi_{H_1}}{1 + \xi_{H_1}}, \qquad d_{sig} = \frac{1 + \xi_{H_1}}{\xi_{H_1}} \log\left( Q \left(1 + \xi_{H_1}\right) \right). \tag{10}$$

The value ξH1 denotes the a-priori SNR when speech is present.
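
The following sketch checks numerically that the sigmoid in Eq. (9), with the coefficients of Eq. (10), reproduces Eq. (8) when the a-priori SNR is held at the fixed value ξH1. The chosen ξH1 (15 dB) and Q are assumed values for illustration, and the function names are not from the paper.

```python
import math


def sigmoid_spp_params(xi_h1, q_ratio):
    """Eq. (10): slope and mean of the sigmoid SPP."""
    c_sig = xi_h1 / (1.0 + xi_h1)
    d_sig = (1.0 + xi_h1) / xi_h1 * math.log(q_ratio * (1.0 + xi_h1))
    return c_sig, d_sig


def p_sig(gamma, c_sig, d_sig):
    """Eq. (9): sigmoid SPP as a function of the a-posteriori SNR."""
    return 1.0 / (1.0 + math.exp(-c_sig * (gamma - d_sig)))


def spp_fixed_xi(gamma, xi_h1, q_ratio):
    """Eq. (8) evaluated with the a-priori SNR fixed to xi_h1."""
    exponent = -gamma * xi_h1 / (1.0 + xi_h1)
    return 1.0 / (1.0 + (1.0 + xi_h1) * q_ratio * math.exp(exponent))


XI_H1 = 10.0 ** (15.0 / 10.0)  # assumed: 15 dB, linear scale
Q_RATIO = 1.0                  # assumed: equal prior probabilities
C_SIG, D_SIG = sigmoid_spp_params(XI_H1, Q_RATIO)
max_error = max(
    abs(p_sig(g, C_SIG, D_SIG) - spp_fixed_xi(g, XI_H1, Q_RATIO))
    for g in (0.0, 1.0, 5.0, 20.0)
)
```

The agreement follows from writing (1 + ξ)Q as exp(log(Q(1 + ξ))) and collecting the exponent into the form −c(γ − d).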
The estimations of the a-priori SNR ξ(k, m) and the speech presence probability
in (5), (6), (8) are computationally expensive and can be sensitive to large variations in
the noise estimate. From (8), it is evident that if only the a-posteriori SNR γ (k, m)
is used, it is easier to control the variations in the noise power. Thus, we propose to
model G (k, m) estimation based on the a-posteriori SNR γ (k, m) for each frequency
bin k and time instance m. This will result in a lower complexity estimator as only the
noise and noisy speech are required to be estimated. In addition, a general sigmoid

function is proposed for the SPP in (9) and the sigmoid function coefficient will be
optimized to improve the performance.

4 The Proposed Gain Function and Speech Presence Probability

In this section, we propose to approximate the gain function and the speech presence
probability as general sigmoid functions of the a-posteriori SNR γ (k, m). The gain
function for each frequency bin k and instant time m can be obtained as

$$G_{SIG}(k, m) = \max\left( \frac{2}{1 + e^{-a\left(\gamma(k, m) - b\right)}} - 1,\ 0 \right) \tag{11}$$

where a and b are some constants. In addition, the speech presence probability can
also be modelled as a general sigmoid function

$$p_{SIG}(k, m) = \frac{1}{1 + e^{-c\left(\gamma(k, m) - d\right)}} \tag{12}$$

where c and d are constant parameters, which can be optimized. This results in
lower complexity for the estimation, as the a-posteriori SNR γ(k, m) is much easier to
estimate.
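
A minimal sketch of the two proposed functions, Eqs. (11) and (12), for a single bin is shown below. Only the a-posteriori SNR γ (linear scale) and the four coefficients are needed; the function names and the dB helper are illustrative additions.

```python
import math


def gain_sig(gamma, a, b):
    """Eq. (11): sigmoid gain, clipped below at zero."""
    return max(2.0 / (1.0 + math.exp(-a * (gamma - b))) - 1.0, 0.0)


def spp_sig(gamma, c, d):
    """Eq. (12): sigmoid speech presence probability."""
    return 1.0 / (1.0 + math.exp(-c * (gamma - d)))


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


# Coefficients x = [a b c d] = [1 1 1 1], evaluated at 0 dB and 10 dB.
gain_at_0db = gain_sig(db_to_linear(0.0), a=1.0, b=1.0)
gain_at_10db = gain_sig(db_to_linear(10.0), a=1.0, b=1.0)
spp_at_0db = spp_sig(db_to_linear(0.0), c=1.0, d=1.0)
```

Because 2/(1 + e^(−z)) − 1 equals tanh(z/2), the gain is zero for γ ≤ b and rises smoothly towards one above it, so b acts as the SNR threshold below which a bin is fully suppressed.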
It has been reported in [23] that if the SPP estimate p(k, m) is used directly in
Eq. (7), then the noise estimate becomes noisier due to large variations in p(k, m),
which modulate the noise estimate. One way to reduce this variability is to smooth
γ(k, m) or p(k, m). However, the smoothing results in extra delay, which reduces the
noise tracking capability. Here, we quantize the approximated SPP pappr(k, m) into four
different regions, i.e.,
$$p = \begin{cases} \text{noise only presence},\ P_1, & p_{appr} \le p_1 \\ \text{likely speech presence},\ P_2, & p_1 < p_{appr} \le p_2 \\ \text{more likely speech presence},\ P_3, & p_2 < p_{appr} \le p_3 \\ \text{most likely speech presence},\ P_4, & p_{appr} > p_3 \end{cases} \tag{13}$$

where 0 < p1 < p2 < p3 ≤ 1 are different values of the sigmoid function that
correspond to an instantaneous estimate of the SPP. These quantized values are mapped
to different averaging (smoothing) constants. For the region where speech is less likely
to be present, i.e. when γ ≈ 1 (this means 0 dB), the averaging for the noise
estimation should be fast. The result is an evenly smoothed estimate compared to the
original noise PSD estimate when γ is small, which reduces the likelihood of noise
being overestimated and underestimated locally. For the regions where speech is either
more likely or most likely to be present, the soft transitions of pappr might not be sufficient
for the noise PSD estimate to change from using the previous noise PSD estimates
to tracking the current noisy observations and vice versa. Accordingly, to avoid those
pitfalls, quantized decisions are imposed on pappr to realize an improved posterior
SPP estimate.
Fig. 1 Approximation of the Wiener filter using Eq. (11) with a = 1 and b = 1 [plot: curves "SPP (Cohen)" and "Sigmoid Approximation with a=1, b=1"; y-axis: Speech Presence Probability; x-axis: A-Posteriori SNR [dB]]

Fig. 2 Approximation of the speech presence probability q = 0.5 in Eq. (8) using Eq. (12) with a = 1 and b = 1 [plot: curves "Wiener Filter" and "Sigmoid Approximation with a=1, b=1"; y-axis: Gain function; x-axis: A-Posteriori SNR [dB]]

Now we have replaced the a-priori SNR with the a-posteriori SNR through our
approximation. How well the approximation works for the Wiener filter (3) and the SPP in
Eq. (8) is shown in an example, by choosing the coefficients x = [a b c d] = [1 1 1 1]
and Q = 0.5, see Figs. 1 and 2. It can be seen that the approximation is very close
for the SPP and relatively close for the Wiener filter approximation with the sigmoid
approximation being slightly more aggressive for a = 1.
However, the main benefit is that we can optimize these coefficients based on data,
which generalizes them to more flexible data-based functions. Hence, we investigate
how to optimize the unknown coefficient vector x = [a b c d]. It is proposed
that the optimization is made with respect to the maximum achievable speech quality
measures, as that will naturally provide the best objectively evaluated enhanced speech.
In general, the speech quality assessment can be classified in terms of subjective and
objective measures. Subjective evaluation involves listening tests with human
listeners, while objective evaluation measures the numerical distance between the
reference signal and the processed signal. One established method of evaluating the
enhanced signal is the perceptual evaluation of speech quality (PESQ). PESQ is an
automatic computation algorithm that replaces human subjects in the evaluation of the
mean opinion score (MOS). The PESQ model considers how humans perceive speech
and it has been widely used in the evaluation of speech quality. Another popular
measure is the short-time objective intelligibility (STOI) measure, which highly correlates
with the intelligibility of speech. By optimizing with respect to both PESQ and STOI,
the parameters are optimized to improve both the overall speech quality and the
speech intelligibility. Thus, a multi-objective optimization problem can be formulated
with PESQ and STOI as the objective measures,

$$\max_{x}\ f(x) = \mathrm{PESQ}(x) + \alpha\, \mathrm{STOI}(x) \quad \text{subject to}\quad x_l \le x \le x_u \tag{14}$$

where xl and xu are the lower and the upper bounds for the coefficient vector x,
respectively, and α is the weighting constant. Different values of α result in different
Pareto-optimal solutions, allowing a trade-off between the two
objective measures.
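
The weighted objective in (14) can be sketched as below. The callables `pesq_fn` and `stoi_fn` are hypothetical stand-ins for routines that run the enhancement system with coefficients x and score the output; the paper does not specify such an interface, and the constant toy metrics are used only to show the arithmetic.

```python
def objective(x, pesq_fn, stoi_fn, alpha):
    """Eq. (14): f(x) = PESQ(x) + alpha * STOI(x)."""
    return pesq_fn(x) + alpha * stoi_fn(x)


def within_bounds(x, lower, upper):
    """Box constraint of Eq. (14): lower <= x <= upper, element-wise."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


# Toy stand-ins: constant scores, independent of x (illustration only).
def toy_pesq(x):
    return 2.0


def toy_stoi(x):
    return 0.7


x_example = [1.0, 1.0, 1.0, 1.0]
f_alpha0 = objective(x_example, toy_pesq, toy_stoi, alpha=0.0)
f_alpha5 = objective(x_example, toy_pesq, toy_stoi, alpha=5.0)
```

Since PESQ and STOI live on different scales, α also compensates for the smaller numeric range of STOI, which is consistent with the comparatively large values of α used later.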

5 Optimization Procedure

In this paper, the Jaya method [16] with a modified stopping criterion is employed to
obtain the optimal solution to the optimization problem (14). At any iteration k, we
have N candidate solutions. Let the best candidate obtain the best value of
f(x) and the worst candidate obtain the worst value of f(x),

$$x_{best,k} = \arg\max_{i} f(x_{k,i}), \qquad x_{worst,k} = \arg\min_{i} f(x_{k,i}). \tag{15}$$

The coefficient vectors of the (k + 1)th iteration are given as

$$x_{k+1,i} = x_{k,i} + r_{1,k,i}\left(x_{best,k} - |x_{k,i}|\right) - r_{2,k,i}\left(x_{worst,k} - |x_{k,i}|\right) \tag{16}$$

where r1,k,i and r2,k,i are random numbers in the range [0, 1]. The term weighted by
r1,k,i in Eq. (16) indicates the tendency for the solution to move closer to the best
solution, while the term weighted by r2,k,i indicates the tendency to avoid the worst
solution. xk+1,i is accepted if it gives a better objective value. The algorithm stops
if the difference in the optimal objective
function between the two consecutive iterations is small. The steps for the optimization
algorithm are summarized in Procedure 1.

Procedure 1: Optimization algorithm

• Step 1: Initialize the coefficient vectors x0,i, 1 ≤ i ≤ N, for the 0th iteration. Set
k = 0.
• Step 2: Calculate the objective function f(xk,i). Obtain the best and the worst
solutions xbest,k and xworst,k as in (15).
• Step 3: Obtain the new set of coefficient vectors for the (k + 1)th iteration as in (16).
For all values 1 ≤ i ≤ N, if f(xk+1,i) < f(xk,i), then set xk+1,i = xk,i.
Otherwise, xk+1,i is kept as computed.
• Step 4: The algorithm terminates if there is no improvement in the maximum
objective function or the maximum number of iterations is reached. Otherwise, set
k := k + 1 and return to Step 2.
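
The steps of Procedure 1 can be sketched as follows. This is an illustrative implementation under several assumptions that are not stated in the paper: the update uses the standard Jaya form from [16] (attraction to the best, repulsion from the worst), candidates are clipped to the bounds, and "no improvement" is interpreted as `patience` consecutive iterations whose best value improves by at most `tol`. The toy objective stands in for the PESQ/STOI objective.

```python
import random


def jaya_maximize(f, lower, upper, n_candidates=10, max_iter=300,
                  tol=1e-9, patience=25, seed=0):
    """Maximize f over the box [lower, upper] with a Jaya-style search."""
    rng = random.Random(seed)
    dim = len(lower)
    # Step 1: random initial candidates inside the bounds.
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
           for _ in range(n_candidates)]
    scores = [f(x) for x in pop]
    best_score, stall = max(scores), 0
    for _ in range(max_iter):
        # Step 2: best and worst candidates of the current iteration.
        best = pop[scores.index(max(scores))]
        worst = pop[scores.index(min(scores))]
        for i in range(n_candidates):
            x = pop[i]
            trial = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                step = (r1 * (best[j] - abs(x[j]))
                        - r2 * (worst[j] - abs(x[j])))
                # Clipping to the bounds is an assumption of this sketch.
                trial.append(min(max(x[j] + step, lower[j]), upper[j]))
            # Step 3: revert to the old candidate if the trial is worse.
            trial_score = f(trial)
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
        # Step 4: stop after `patience` iterations without improvement.
        new_best = max(scores)
        stall = stall + 1 if new_best - best_score <= tol else 0
        best_score = new_best
        if stall >= patience:
            break
    return pop[scores.index(best_score)], best_score


def toy_objective(x):
    """Concave toy objective with its maximum (zero) at (1, 2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2)


x_opt, f_opt = jaya_maximize(toy_objective, lower=[0.0, 0.0], upper=[5.0, 5.0])
```

Apart from the population size and the stopping settings, the search has no tuning parameters, which matches the description of the method as requiring no algorithm-specific parameters.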

6 Experimental Results

For the objective evaluation, the noisy speech corpus NOIZEUS with 30 IEEE speech
sequences was employed [9, 10]. The database was chosen as it was developed to
facilitate algorithm comparison. More information about NOIZEUS
can be found in [9]. The speech was corrupted with babble, factory, pink and
white noise for a wide range of SNRs. All the results are generated with K = 256
frequency bins and a sampling frequency of fs = 8000 Hz. A square-root Hanning
window was used with 50% overlap. Simulations are evaluated with

$$p = \begin{cases} P_1, & p_{appr} \le 0.55 \\ P_2, & 0.55 < p_{appr} \le 0.7 \\ P_3, & 0.7 < p_{appr} \le 0.8 \\ P_4, & p_{appr} > 0.8 \end{cases}$$

where $P_i = \exp\left(-2.2R/(t_i f_s)\right)$ indicates the exponential smoothing constant, with
i = 1, 2, 3, 4. Here, R indicates the STFT frame rate and ti denotes the averaging time
constant, with t1 < t2 < t3 < t4. This means that the averaging time is mapped to the
speech presence probability but the averaging times and thresholds can be modified.
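
The mapping from the approximated SPP to a smoothing constant can be sketched as below, using the thresholds 0.55, 0.7 and 0.8 given above. The hop size `HOP` and the four time constants are hypothetical values chosen only for illustration, since the paper does not list them here, and the smoothing constant is assumed to take the form exp(−2.2R/(ti fs)).

```python
import math


def smoothing_constant(hop, time_constant_s, fs):
    """Assumed form P_i = exp(-2.2 * R / (t_i * f_s))."""
    return math.exp(-2.2 * hop / (time_constant_s * fs))


def quantize_spp(p_appr, constants, thresholds=(0.55, 0.7, 0.8)):
    """Map the approximated SPP to one of the four constants P1..P4."""
    p1, p2, p3 = thresholds
    if p_appr <= p1:
        return constants[0]
    if p_appr <= p2:
        return constants[1]
    if p_appr <= p3:
        return constants[2]
    return constants[3]


FS = 8000                              # sampling frequency in Hz (from the text)
HOP = 128                              # hypothetical frame advance in samples
TIME_CONSTANTS = (0.05, 0.2, 1.0, 5.0)  # hypothetical t1 < t2 < t3 < t4, seconds
CONSTANTS = tuple(smoothing_constant(HOP, t, FS) for t in TIME_CONSTANTS)

# Noise PSD update of Eq. (7) with the quantized constant in place of p.
p_quantized = quantize_spp(0.62, CONSTANTS)
noise_psd = p_quantized * 1.0 + (1.0 - p_quantized) * 2.0
```

A short time constant gives a small Pi, so the noise estimate follows the observation quickly in noise-only regions, while a long time constant gives Pi close to one and effectively freezes the estimate when speech is most likely present.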
To evaluate the performance of the proposed sigmoid gain function and proposed
SPP, the problem (14) is optimized for the different types of noise, namely the babble,
factory, pink and white noise, with a signal-to-noise ratio of 0 dB. For each type of noise,
the optimal set of coefficients is then tested for different levels of SNR. The SNR level is
increased from −5 dB to 10 dB. The results are compared with those obtained from the
decision directed method and the modified decision directed method. As mentioned
earlier, the proposed method has significantly lower complexity than both the decision
directed and modified decision directed methods as it does not require the estimation
of the a-priori SNR.

Table 1 PESQ, STOI and LLR performance for different SNR with babble noise and K = 256

SNR     Methods                    PESQ    STOI    LLR
−5 dB   Decision direct            1.4871  0.5321  1.3951
−5 dB   Modified decision direct   1.4987  0.5427  1.3803
−5 dB   Optimized with α = 0       1.7029  0.5896  1.1551
0 dB    Decision direct            1.8497  0.6533  1.2074
0 dB    Modified decision direct   1.8589  0.6711  1.1919
0 dB    Optimized with α = 0       2.0042  0.7051  0.9884
5 dB    Decision direct            2.2566  0.7675  0.9981
5 dB    Modified decision direct   2.2496  0.7881  0.8824
5 dB    Optimized with α = 0       2.3815  0.8033  0.8106
10 dB   Decision direct            2.6527  0.8567  0.7964
10 dB   Modified decision direct   2.6461  0.8760  0.7451
10 dB   Optimized with α = 0       2.7111  0.8705  0.6401

6.1 Performance Comparison Between the Proposed Method, the Decision Directed Method and the Modified Decision Directed Method

Table 1 shows the PESQ, STOI and LLR results for different speech enhancement
methods: (i) the decision directed; (ii) the modified decision directed [22] and (iii)
the result with the optimized gain function GSIG and the weighting constant α = 0.
The coefficients for the gain function GSIG are optimized with SNR = 0 dB and the
results are tested for different SNR levels and babble noise. It can be seen from the table
that the modified decision directed method generally improves the PESQ, STOI and LLR
results over the decision directed method. In addition, the optimized sigmoid gain function
together with the sigmoid SPP improves the PESQ, STOI and LLR values further over the
modified decision directed method in most cases. For example, at −5 dB SNR level, the
optimized method with gain function GSIG improves PESQ by 0.2158 over the decision
directed method and by 0.2042 over the modified decision directed method. For the STOI
measure, the optimized method improves by 0.0575 and 0.0469, respectively,
over the decision directed and the modified decision directed methods. For the LLR
measure, the optimized method is 0.2400 and 0.2252 lower than the decision
directed and the modified decision directed methods, which means that the optimized
method performs better than the other two methods, since a lower LLR is better. For other
SNRs, the optimized method with sigmoid gain function GSIG also improves PESQ,
STOI and LLR over the decision directed and modified decision directed methods in most cases.
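
The differences quoted for the −5 dB case can be recomputed directly from the corresponding rows of Table 1, as in the short sketch below. The dictionary keys and the helper name are illustrative; the numbers are copied from the table.

```python
# -5 dB rows of Table 1 (babble noise): (PESQ, STOI, LLR)
TABLE1_MINUS5DB = {
    "decision_directed": (1.4871, 0.5321, 1.3951),
    "modified_decision_directed": (1.4987, 0.5427, 1.3803),
    "optimized_alpha0": (1.7029, 0.5896, 1.1551),
}


def improvements(baseline, proposed):
    """Return (PESQ gain, STOI gain, LLR reduction), rounded to 4 decimals.

    Higher PESQ and STOI are better, lower LLR is better, so positive
    entries all mean that the proposed method is better.
    """
    return (
        round(proposed[0] - baseline[0], 4),
        round(proposed[1] - baseline[1], 4),
        round(baseline[2] - proposed[2], 4),
    )


optimized = TABLE1_MINUS5DB["optimized_alpha0"]
vs_dd = improvements(TABLE1_MINUS5DB["decision_directed"], optimized)
vs_mdd = improvements(TABLE1_MINUS5DB["modified_decision_directed"], optimized)
```

PESQ, STOI and LLR are dimensionless scores, so these differences are plain score differences rather than values in dB.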
Tables 2, 3 and 4 show the results for the factory noise, pink noise and white noise for
different SNR and different gain function methods. It can be seen that the optimized
gain function GSIG generally improves PESQ, STOI and LLR over
the results obtained using the decision directed and the modified decision directed
methods. For example, with SNR = −5 dB and white noise, the optimized method with
the gain function GSIG improves PESQ by 0.2528 and 0.1643, respectively,
over the decision directed method and the modified decision directed method. For the

Table 2 PESQ, STOI and LLR performance for different SNR with factory noise and K = 256

SNR     Methods                    PESQ    STOI    LLR
−5 dB   Decision direct            1.6775  0.5527  1.1615
−5 dB   Modified decision direct   1.6861  0.5574  1.1726
−5 dB   Optimized with α = 0       1.7343  0.5814  1.1501
0 dB    Decision direct            2.0850  0.6597  1.0122
0 dB    Modified decision direct   2.0967  0.6764  1.0108
0 dB    Optimized with α = 0       2.1412  0.7061  0.9627
5 dB    Decision direct            2.4651  0.7644  0.8594
5 dB    Modified decision direct   2.4886  0.7888  0.8485
5 dB    Optimized with α = 0       2.5202  0.8085  0.7902
10 dB   Decision direct            2.8375  0.8547  0.7180
10 dB   Modified decision direct   2.8672  0.8761  0.6902
10 dB   Optimized with α = 0       2.8684  0.8872  0.6098

Table 3 PESQ, STOI and LLR performance for different SNR with pink noise and K = 256

SNR     Methods                    PESQ    STOI    LLR
−5 dB   Decision direct            1.7107  0.6013  1.1380
−5 dB   Modified decision direct   1.7685  0.6087  1.1351
−5 dB   Optimized with α = 0       1.8725  0.6335  1.1447
0 dB    Decision direct            2.1305  0.6940  0.9879
0 dB    Modified decision direct   2.1777  0.7125  0.9775
0 dB    Optimized with α = 0       2.2801  0.7512  0.9528
5 dB    Decision direct            2.5036  0.8156  0.8251
5 dB    Modified decision direct   2.5585  0.8156  0.8251
5 dB    Optimized with α = 0       2.6354  0.8451  0.7749
10 dB   Decision direct            2.8600  0.8662  0.7415
10 dB   Modified decision direct   2.9167  0.8894  0.7058
10 dB   Optimized with α = 0       2.9394  0.9033  0.6544

STOI measure, the optimized method improves by 0.0272 and 0.0204, respectively,
over the decision directed and the modified decision directed methods. For the LLR
measure, the optimized method improves by approximately 0.09 over the decision directed
and the modified decision directed methods. For all the cases, the optimization algorithm
converges quickly, requiring only a few iterations.
Figures 3, 4 and 5 show PESQ, STOI and LLR values for different speech enhancement
methods with the babble noise and different SNRs. It can be seen that the proposed
method with the gain function GSIG improves the results over the decision directed and
the modified decision directed methods.

Table 4 PESQ, STOI and LLR performance for different SNR with white noise and K = 256

SNR     Methods                    PESQ    STOI    LLR
−5 dB   Decision direct            1.4704  0.6044  1.4908
−5 dB   Modified decision direct   1.5589  0.6112  1.4858
−5 dB   Optimized with α = 0       1.7232  0.6316  1.3973
0 dB    Decision direct            1.9910  0.6969  1.2862
0 dB    Modified decision direct   2.0552  0.7128  1.2789
0 dB    Optimized with α = 0       2.1253  0.7331  1.2166
5 dB    Decision direct            2.3724  0.7833  1.1388
5 dB    Modified decision direct   2.4240  0.8016  1.1295
5 dB    Optimized with α = 0       2.4772  0.8166  1.0483
10 dB   Decision direct            2.6890  0.8524  1.0142
10 dB   Modified decision direct   2.7428  0.8704  0.9869
10 dB   Optimized with α = 0       2.7832  0.8821  0.8670

Fig. 3 PESQ for different speech enhancement methods with babble noise and different SNR for α = 0 [plot: PESQ versus SNR; curves: Decision directed method, Modified decision directed, Optimized with gain function GSIG]

6.2 Trade-Off Investigation Between Perceptual Measures PESQ, STOI and LLR for Different Weighting Constants α and Different SNR

We now investigate the Pareto trade-off for different weighting factors α on the perceptual
measures PESQ, STOI and LLR. Table 5 shows the trade-off between PESQ, STOI
and LLR values for different weighting constants α and the babble noise. The SNR
level increases from −5 dB to 10 dB and the weighting constant α increases from 0 to
15. It can be seen from the table that there is a trade-off between the PESQ and STOI
values. The PESQ values generally decrease when α increases while the STOI values increase.
This is to be expected as the weighting provides an engineering choice between quality

Fig. 4 STOI for different speech enhancement methods with babble noise and different SNR for α = 0 [plot: STOI versus SNR; curves: Decision directed method, Modified decision directed, Optimized with gain function GSIG]

Fig. 5 LLR for different speech enhancement methods with babble noise and different SNR for α = 0 [plot: LLR versus SNR; curves: Decision directed method, Modified decision directed, Optimized with gain function GSIG,1]

and intelligibility through the PESQ and STOI measures, respectively. The LLR values
are approximately the same for all the cases with the babble noise. When compared
to the decision directed and modified decision directed performance in Table 1, the
optimized sigmoid gain function has better PESQ, STOI and LLR performance than
the decision directed method and the modified decision directed method.
Tables 6, 7 and 8 show the PESQ, STOI and LLR results for different SNRs and
different weighting constants α with factory noise, pink noise and white noise,
respectively. Similar to the case with the babble noise, when α increases, the PESQ value
generally decreases while the STOI value increases. It can be seen that the weighting α provides
a trade-off between PESQ and STOI in the objective measures [see Eq. (14)]. As α

Table 5 Trade-off between PESQ, STOI and LLR for different weighting constant α with different SNR and babble noise

SNR     Weighting constant α   PESQ    STOI    LLR
−5 dB   Weighting α = 0        1.7029  0.5896  1.1551
−5 dB   Weighting α = 5        1.6758  0.5921  1.1681
−5 dB   Weighting α = 10       1.6438  0.5943  1.1693
−5 dB   Weighting α = 15       1.6282  0.5957  1.1646
0 dB    Weighting α = 0        2.0042  0.7051  0.9884
0 dB    Weighting α = 5        2.0390  0.7101  0.9940
0 dB    Weighting α = 10       2.0073  0.7138  0.9886
0 dB    Weighting α = 15       1.9901  0.7154  0.9828
5 dB    Weighting α = 0        2.3815  0.8033  0.8106
5 dB    Weighting α = 5        2.3865  0.8108  0.8025
5 dB    Weighting α = 10       2.3631  0.8167  0.7894
5 dB    Weighting α = 15       2.3471  0.8187  0.7822
10 dB   Weighting α = 0        2.7111  0.8705  0.6401
10 dB   Weighting α = 5        2.7424  0.8807  0.6193
10 dB   Weighting α = 10       2.7278  0.8882  0.5999
10 dB   Weighting α = 15       2.7127  0.8906  0.5909

The sigmoid coefficients are optimized with SNR = 0 dB and then the performance is tested for different SNR

Table 6 Trade-off between PESQ, STOI and LLR for different weighting function α with different SNR
and factory noise

SNR      Weighting constant α   PESQ     STOI     LLR
−5 dB    0                      1.7343   0.5814   1.1501
−5 dB    5                      1.7273   0.5915   1.1496
−5 dB    10                     1.7203   0.5946   1.1383
−5 dB    15                     1.6872   0.5959   1.1515
0 dB     0                      2.1412   0.7061   0.9627
0 dB     5                      2.1173   0.7162   0.9534
0 dB     10                     2.1039   0.7185   0.9393
0 dB     15                     2.0739   0.7213   0.9492
5 dB     0                      2.5202   0.8085   0.7902
5 dB     5                      2.5111   0.8176   0.7660
5 dB     10                     2.4981   0.8199   0.7489
5 dB     15                     2.4606   0.8237   0.7497
10 dB    0                      2.8684   0.8872   0.6098
10 dB    5                      2.8604   0.8790   0.6589
10 dB    10                     2.8605   0.8892   0.5872
10 dB    15                     2.8251   0.8937   0.5819

The sigmoid coefficients are optimized with SNR=0 and then the performance is tested for different SNR

Table 7 Trade-off between PESQ, STOI and LLR for different weighting function α with different SNR
and pink noise

SNR      Weighting constant α   PESQ     STOI     LLR
−5 dB    0                      1.8725   0.6335   1.1447
−5 dB    5                      1.8292   0.6385   1.1520
−5 dB    10                     1.8085   0.6436   1.1512
−5 dB    15                     1.7845   0.6445   1.1525
0 dB     0                      2.2801   0.7512   0.9528
0 dB     5                      2.2614   0.7600   0.9418
0 dB     10                     2.2449   0.7622   0.9395
0 dB     15                     2.2170   0.7648   0.9367
10 dB    0                      2.6354   0.8451   0.7749
10 dB    5                      2.6430   0.8521   0.7598
10 dB    10                     2.6304   0.8531   0.7524
10 dB    15                     2.6066   0.8563   0.7437
15 dB    0                      2.9394   0.9033   0.6544
15 dB    5                      2.9632   0.9101   0.6226
15 dB    10                     2.9677   0.9107   0.6075
15 dB    15                     2.9553   0.9136   0.5924

The sigmoid coefficients are optimized with SNR=0 and then the performance is tested for different SNR

Table 8 Trade-off between PESQ, STOI and LLR for different weighting function α with different SNR
and white noise

SNR      Weighting constant α   PESQ     STOI     LLR
−5 dB    0                      1.7232   0.6316   1.3973
−5 dB    5                      1.6893   0.6389   1.4100
−5 dB    10                     1.6396   0.6459   1.4321
−5 dB    15                     1.6178   0.6488   1.4370
0 dB     0                      2.1253   0.7331   1.2166
0 dB     5                      2.1086   0.7382   1.2154
0 dB     10                     2.0544   0.7448   1.2221
0 dB     15                     2.0127   0.7477   1.2295
5 dB     0                      2.4772   0.8166   1.0483
5 dB     5                      2.4754   0.8200   1.0387
5 dB     10                     2.4370   0.8255   1.0181
5 dB     15                     2.3927   0.8282   1.0180
10 dB    0                      2.7832   0.8821   0.8670
10 dB    5                      2.7902   0.8837   0.8587
10 dB    10                     2.7780   0.8884   0.8239
10 dB    15                     2.7382   0.8910   0.8122

The sigmoid coefficients are optimized with SNR=0 and then the performance is tested for different SNR

Fig. 6 Wiener filter and optimized sigmoid function for the gain function with different types of noise. The
sigmoid functions are optimized from data at 0 dB (curves: Wiener filter; SP with babble noise, factory noise, pink noise and white noise; x-axis: a-posteriori SNR [dB]; y-axis: gain function)

increases, more weight is placed on STOI than on PESQ, which results in a higher
value of STOI. The constant α therefore gives the user the flexibility to trade one
performance measure off against the other. The increase of LLR in tandem with α
shows that LLR is more correlated to STOI, which is related to the measure of speech
intelligibility.
In addition, the LLR values improve slightly with a higher value of α. Similar to
the babble noise case, the optimized sigmoid gain function achieves a good trade-off
in performance when compared with the decision-directed and modified
decision-directed methods. Moreover, the proposed gain function has a lower
complexity than existing methods as it does not require the estimation of the
a-priori SNR.
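The trade-off controlled by α can be read directly from the tabulated results. The following minimal sketch copies the four babble-noise rows at 0 dB SNR from Table 5 and reports which α gives the best value of each measure. The helper name `best_alpha` and the dictionary layout are illustrative assumptions, not part of the paper. Higher is better for PESQ and STOI, lower is better for LLR.

```python
# Rows copied from Table 5 (babble noise, 0 dB SNR): alpha -> (PESQ, STOI, LLR)
babble_0db = {
    0: (2.0042, 0.7051, 0.9884),
    5: (2.0390, 0.7101, 0.9940),
    10: (2.0073, 0.7138, 0.9886),
    15: (1.9901, 0.7154, 0.9828),
}


def best_alpha(rows, column, higher_is_better=True):
    """Return the alpha whose entry in the given column is best (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda alpha: rows[alpha][column])


best_pesq = best_alpha(babble_0db, 0)                         # PESQ: higher is better
best_stoi = best_alpha(babble_0db, 1)                         # STOI: higher is better
best_llr = best_alpha(babble_0db, 2, higher_is_better=False)  # LLR: lower is better

print(best_pesq, best_stoi, best_llr)  # prints: 5 15 15
```

For this row group, PESQ peaks at a small α while STOI is highest at the largest α, which is the trade-off described in the text.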

6.3 Approximation of the Gain Function and the Speech Presence Probability Using
the Sigmoid Function for Different Types of Noise

Figures 6 and 7 show the optimal sigmoid functions for the gain function and the speech
presence probability for different a-posteriori SNR. The sigmoid functions for
the gain function and the speech presence probability are optimized together from data at
0 dB SNR for different types of noise, namely babble, factory, pink and
white noise. It can be seen from the figures that the optimized sigmoid functions for
the gain function and the speech presence probability follow the shape
of the Wiener filter in Eq. (11) and the speech presence probability in Eq. (8), respectively.
In addition, the sigmoid functions for the factory and babble noises are slightly more
aggressive than the sigmoid functions for the white and pink noises. The sigmoid functions are
then tested for different SNR levels from −5 dB to 10 dB. It can be seen in Sects. 6.1
and 6.2 that the sigmoid models achieve good results for all the cases with a lower
computational complexity as they do not require the estimation of the a-priori SNR.
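To illustrate why a sigmoid in the a-posteriori SNR can stand in for a Wiener-type gain, the sketch below evaluates both on the SNR grid used for the figure axes. It rests on assumptions that are not taken from the paper: a plain logistic function with hypothetical `slope` and `midpoint_db` values (not the optimized coefficients), and a Wiener gain ξ/(1 + ξ) driven by the simple estimate ξ = max(γ − 1, 0), where γ is the a-posteriori SNR.

```python
import math


def sigmoid_gain(gamma_db, slope=0.4, midpoint_db=3.0):
    """Logistic gain as a function of the a-posteriori SNR in dB.

    slope and midpoint_db are hypothetical values, not the paper's optimized ones.
    """
    return 1.0 / (1.0 + math.exp(-slope * (gamma_db - midpoint_db)))


def wiener_gain_ml(gamma_db):
    """Wiener gain xi / (1 + xi), assuming the estimate xi = max(gamma - 1, 0)."""
    gamma = 10.0 ** (gamma_db / 10.0)   # dB -> linear power ratio
    xi = max(gamma - 1.0, 0.0)          # assumed a-priori SNR estimate
    return xi / (1.0 + xi)


grid_db = list(range(-30, 31, 10))
sigmoid_values = [sigmoid_gain(g) for g in grid_db]
wiener_values = [wiener_gain_ml(g) for g in grid_db]

for g, s, w in zip(grid_db, sigmoid_values, wiener_values):
    print(f"{g:4d} dB  sigmoid={s:.3f}  wiener={w:.3f}")
```

Both curves rise monotonically from near 0 to near 1. The sigmoid needs only the a-posteriori SNR and two coefficients per evaluation, which is consistent with the complexity argument made in the text.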

Fig. 7 Optimized sigmoid function for speech presence probability with different type of noise. The sigmoid
functions are optimized from data at 0 dB (curves: SPP (Cohen); SP with babble noise, factory noise, pink noise and white noise; x-axis: a-posteriori SNR [dB]; y-axis: speech presence probability)

7 Conclusions

This paper proposes the use of sigmoid functions for both the speech presence
probability (SPP) and the overall gain function of a speech enhancement system as a
means to achieve a low-complexity and efficient implementation. The former serves to
improve the SNR estimation and the latter provides an overall perceptually smooth gain
function. The advantage of the proposed system is that it avoids the estimation of the
a-priori SNR, resulting in an improved noise estimate. An efficient optimization
algorithm is employed to optimize the parameters of the sigmoid functions with respect
to the speech quality measures. The optimization algorithm is based on the idea that
the solution obtained for a given problem should move towards the best solution and
avoid the worst solution. The presented algorithm requires minimal control parameters
and does not require any algorithm-specific parameters. Simulation results show that
the proposed sigmoid functions achieve improved performance at low complexity when
compared with existing methods.
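The "move towards the best solution and avoid the worst solution" idea appears to correspond to the Jaya update rule of Rao (reference 16). The sketch below is a generic illustration of that rule on a toy objective, not the authors' implementation. The function name `jaya_maximize`, the population size, the iteration count and the toy objective are all assumptions. In the paper's setting, the objective would instead be the weighted speech quality measure evaluated on enhanced speech.

```python
import random


def jaya_maximize(f, bounds, pop_size=20, iterations=100, seed=0):
    """Maximize f over box bounds with a Jaya-style update (illustrative sketch)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    history = [max(scores)]
    for _ in range(iterations):
        best = pop[scores.index(max(scores))]
        worst = pop[scores.index(min(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Step towards the best solution and away from the worst one.
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                candidate.append(min(max(v, lo), hi))  # clip to the bounds
            score = f(candidate)
            if score > scores[i]:                      # keep only improvements
                pop[i], scores[i] = candidate, score
        history.append(max(scores))
    best_index = scores.index(max(scores))
    return pop[best_index], scores[best_index], history


def toy_objective(x):
    """Hypothetical stand-in objective with its maximum at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


solution, value, history = jaya_maximize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(solution, value)
```

Apart from the population size and the number of iterations, the update has no tuning parameters, which matches the remark about the absence of algorithm-specific parameters.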
Funding Open Access funding enabled and organized by CAUL and its Member Institutions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. S. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech
Signal Process. 27, 113–120 (1979)
2. K.Y. Chan, S. Nordholm, S.Y. Low, P.C. Yong, K.F.C. Yiu, A hybrid descent method for optimal
sigmoid filter design. IEEE Signal Process. Lett. 21(4), 478–482 (2014)
3. I. Cohen, Noise spectrum estimation in adverse environments: Improved minima controlled recursive
averaging. IEEE Trans. Speech Audio Process. 11(5), 466–475 (2003)
4. G. Enzner, P. Thune, Bayesian MMSE filtering of noisy speech by SNR marginalization with global
PSD priors. IEEE/ACM Trans. Audio Speech Lang. Process. 26(12), 2289–2304 (2018)
5. Y. Ephraim, D. Malah, Speech enhancement using a minimum-mean square error short-time spectral
amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)
6. Y. Ephraim, D. Malah, Speech enhancement using a minimum-mean square error short-time spectral
amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)
7. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral ampli-
tude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)
8. T. Gerkmann, R.C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity
and low tracking Delay. IEEE Trans. Audio Speech Language Process. 20(4), 1383–1393 (2012)
9. Y. Hu, P. Loizou, Subjective evaluation and comparison of speech enhancement algorithms. Speech
Commun. 49, 588–601 (2007)
10. P. Loizou, Speech Enhancement Theory and Practice (CRC Press, Boca Raton, FL, 2007)
11. S. Y. Low, An insight into the rise time of exponential smoothing for speech enhancement methods,
in IEEE International Conference Signal Image Process Applications, pp. 30–33 (2021)
12. R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics.
IEEE Trans. Speech Audio Process. 9, 504–512 (2001)
13. L. Nahma, P.C. Yong, H.H. Dam, S. Nordholm, An adaptive a-priori SNR estimator for perceptual
speech enhancement. EURASIP J. Audio Speech Music Process. 1, 1 (2019)
14. K. Paliwal, B. Schwerin, K. Wo, Speech enhancement using a minimum mean-square error short-time
spectral modulation magnitude estimator. Speech Commun. 54(2), 282–305 (2012)
15. S. Quackenbush, T. Barnwell, M. Clements, Objective Measures of Speech Quality (Prentice Hall,
Englewood Cliffs, 1988)
16. R.V. Rao, Jaya: a simple and new optimization algorithm for solving constrained and unconstrained
optimization problems. Int. J. Eng. Comput. 7, 19–34 (2016)
17. A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a
new method for speech quality assessment of telephone networks and codec. IEEE Int. Conf. Acoust.
Speech Signal Process. 2, 749–752 (2001)
18. T. Rohdenburg, V. Hohmann, B. Kollmeier, Objective perceptual quality measures for the evaluation
of noise reduction schemes, in 9th International Workshop on Acoustic Echo and Noise Control, pp.
169–172 (2005)
19. P. Scalart, Speech enhancement based on a-priori signal to noise estimation, in IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP’96), 629–632 (1996)
20. M.K. Singh, S.Y. Low, S. Nordholm, Z. Zang, Bayesian noise estimation in the modulation domain.
Speech Commun. 96, 81–92 (2018)
21. C.H. Taal, R.C. Hendriks, R. Heusdens, J. Jensen, An algorithm for intelligibility prediction of time
frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011)
22. P.C. Yong, S. Nordholm, H.H. Dam, Optimization and evaluation of sigmoid function with a priori
SNR estimate. Speech Commun. 55(2), 358–376 (2012)
23. P. C. Yong, S. Nordholm, H. H. Dam, Noise estimation based on soft decisions and conditional smooth-
ing for speech enhancement, in International Workshop on Acoustic Signal Enhancement (2012)
24. P.C. Yong, S. Nordholm, H.H. Dam, Optimization and evaluation of sigmoid function with a priori
SNR estimate for real-time speech enhancement. Speech Commun. 55(2), 358–376 (2013)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
