MOS prediction network for non-intrusive speech quality assessment in online conferencing
Liu et al., 2022
- Document ID
- 10236130779183135732
- Author
- Liu W
- Xie C
- Publication year
- 2022
- Publication venue
- Proc. Interspeech 2022
Snippet
Speech quality is a major indicator of the quality of service that describes the performance of a speech communication network. Intrusive speech quality assessment generally requires a clean reference speech for evaluation, which is not available in applications such as online …
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Supervisory, monitoring, management, i.e. operation, administration, maintenance or testing arrangements
- H04M3/2236—Quality of speech transmission monitoring
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/18—Comparators
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Reddy et al. | DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors | |
| Manocha et al. | A differentiable perceptual audio metric learned from just noticeable differences | |
| Reddy et al. | A scalable noisy speech dataset and online subjective test framework | |
| Mittag et al. | NISQA: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets | |
| Mittag et al. | Non-intrusive speech quality assessment for super-wideband speech communication networks | |
| Serrà et al. | SESQA: semi-supervised learning for speech quality assessment | |
| Fu et al. | MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech | |
| Malfait et al. | P.563—The ITU-T standard for single-ended speech quality assessment | |
| Liu et al. | X-SEPFORMER: End-to-end speaker extraction network with explicit optimization on speaker confusion | |
| BR112021012308A2 (en) | EQUIPMENT AND METHOD FOR SOURCE SEPARATION USING A SOUND QUALITY ESTIMATE AND CONTROL | |
| Yu et al. | MetricNet: Towards improved modeling for non-intrusive speech quality assessment | |
| Kawanaka et al. | Stable training of DNN for speech enhancement based on perceptually-motivated black-box cost function | |
| Subakan et al. | REAL-M: Towards speech separation on real mixtures | |
| Ristea et al. | ICASSP 2024 speech signal improvement challenge | |
| Dubey et al. | Non-intrusive speech quality assessment using several combinations of auditory features | |
| Zhang et al. | An end-to-end non-intrusive model for subjective and objective real-world speech assessment using a multi-task framework | |
| Cutler et al. | ICASSP 2023 speech signal improvement challenge | |
| Diener et al. | PLCMOS--a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms | |
| Mittag et al. | Full-reference speech quality estimation with attentional siamese neural networks | |
| Manocha et al. | Audio similarity is unreliable as a proxy for audio quality | |
| Rosenbaum et al. | Differentiable mean opinion score regularization for perceptual speech enhancement | |
| Manocha et al. | SQAPP: No-reference speech quality assessment via pairwise preference | |
| Hajal et al. | MOSRA: Joint mean opinion score and room acoustics speech quality assessment | |
| Mumtaz et al. | Nonintrusive perceptual audio quality assessment for user-generated content using deep learning | |
| Liu et al. | MOS prediction network for non-intrusive speech quality assessment in online conferencing |