CN107342074A - Speech and sound recognition method - Google Patents
Speech and sound recognition method
- Publication number
- CN107342074A CN107342074A CN201610273827.9A CN201610273827A CN107342074A CN 107342074 A CN107342074 A CN 107342074A CN 201610273827 A CN201610273827 A CN 201610273827A CN 107342074 A CN107342074 A CN 107342074A
- Authority
- CN
- China
- Prior art keywords
- sound
- array
- voice
- identified
- loudness
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Abstract
The present invention provides a method for speech recognition. The method is characterized in that low-loudness sounds are ignored and, when the distance between a sound to be identified and clean speech is calculated, the result is capped at the loudness of the clean speech. The method therefore gives good recognition results in noisy environments and for short words or syllables.
Description
Technical field
The invention belongs to the fields of speech recognition and voice recognition, and in particular relates to a method for recognizing speech and sound.
Background art
Speech recognition is an important component of artificial intelligence and has wide applications, but current speech recognition performs poorly in noisy environments. The article "An Objective Measure for Predicting Subjective Quality of Speech Coders" (IEEE Journal on Selected Areas in Communications, Vol. 10, No. 5, June 1992; hereinafter "document 1") describes a method for comparing the difference between two speech signals, but when applied to speech recognition this method performs very poorly. In addition, the method requires the two signals to be fully aligned, whereas in practice speech may begin and end at any time, so alignment in advance is hardly possible. The present invention proposes a solution intended to address these problems.
Summary of the invention
A method for speech recognition: clean speech A is converted into a two-dimensional array F representing the loudness of the clean speech A on the Bark scale, and a sound G to be identified is converted into a two-dimensional array H representing the loudness of the sound G on the Bark scale, characterized in that: in the array F and the array H, the low-loudness elements of the array F, and the elements of the array H corresponding to those low-loudness elements of F, are ignored.
A method for speech recognition: clean speech A2 is converted into a two-dimensional array F2 representing the loudness of A2 on the Bark scale, and a sound G2 to be identified is converted into a two-dimensional array H2 representing the loudness of G2 on the Bark scale, characterized in that: when the distance between an element F2[x][y] of the array F2 and the corresponding element H2[x][y] of the array H2 is calculated, the result is capped at the value of the element F2[x][y].
Preferably, if the sound G3 to be identified differs in length from the clean speech A3, then to determine whether G3 contains A3: a segment of sound G4 with the same length as A3 is extracted frame by frame from G3, and the sound G4 is then compared with the clean speech A3.
Preferably, the clean speech A and the clean speech A2 are multiplied by a scale factor before being compared with the sound G to be identified and the sound G2 to be identified, respectively.
Compared with the prior art, the advantage of the invention is good recognition in noisy environments and for short words or syllables.
Embodiments
Embodiment 1:
In speech, and in sound more broadly, power is not distributed uniformly across frequency, and the distribution changes over time. It is precisely this frequency distribution, and its changes, that allow people to tell different sounds apart. Suppose a 200 Hz sinusoid and a 2000 Hz sinusoid of constant intensity sound simultaneously, and the loudness of the 200 Hz sinusoid is twice that of the 2000 Hz one. In this case a human can easily recognize that the mixture contains a 2000 Hz sound. But if the method and formulas of document 1 are applied directly to recognition, the computed distance between the mixture and the 2000 Hz tone will be large, so the 2000 Hz sound cannot be identified. If, however, a person first hears the 2000 Hz pure sine tone, they can observe that its loudness at 200 Hz and at all other frequencies is zero, ignore the 200 Hz component, consider only the 2000 Hz component, and thus still recognize the 2000 Hz sound.
In addition, in a noisy environment a sound of very small loudness is easily disturbed, so when performing speech recognition in noise it is necessary to ignore the components of the clean speech whose loudness is too small.
Now suppose there is a recorded speech sample, for example the syllable "bei" (北, "north") of "Beijing" (hereinafter A). A lasts 0.5 seconds at a sampling rate of 8000 Hz, so it contains 4000 samples. First, A is divided into multiple overlapping or non-overlapping frames, and each frame is windowed with a window function (for example a Hamming, Hanning, or sine window). This application recommends an overlap of 8x or more and a sine window. For example, if each frame is 50 milliseconds with 8x overlap, the 1st frame of the speech consists of samples 1 to 400 of A, the 2nd frame of samples 51 to 450, the 3rd frame of samples 101 to 500, and so on. Each frame is then windowed with the sine window. A is thus converted into a two-dimensional array E whose elements are E[n][m], where n runs from 1 to the total number of frames of A and m runs from 1 to 400, 400 being the number of samples per frame. Below, E[x] denotes the row from E[x][1] to E[x][400].
Each row of the array E is converted, using the method of document 1, into the loudness produced at each Bark (critical band) of the human ear; the array E is thus converted into an array F whose elements are F[n][m], where n runs from 1 to the total number of frames of A and m runs from 1 to 24, 24 being the number of Barks of the human ear. A row of F represents the loudness, computed by the method of document 1 from one row of E (that is, one frame of A), at the 24 Barks of the ear. Other subdivisions are also feasible; for example, splitting each Bark in two to give 48 bands can yield better recognition. Now suppose that at some later moment the speech A is played again and, owing to noise, becomes G. In the same way, G is converted by the method of document 1 into an array H whose elements are H[n][m], with n from 1 to the total number of frames and m from 1 to 24; a row of H represents the loudness at the 24 Barks computed by the method of document 1. To determine whether H contains the speech A, let the array P = abs(H - F), where abs is the absolute-value function: each element of P equals the corresponding element of H minus the corresponding element of F, taken in absolute value.
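Assuming the per-frame Bark-loudness arrays F and H have already been obtained (the loudness computation itself follows document 1 and is not reproduced here), forming P = abs(H - F) can be sketched as:

```python
def loudness_difference(F, H):
    """Element-wise |H - F| for two equal-sized 2-D loudness arrays.

    F and H are lists of rows, one row per frame, with one loudness
    value per Bark band (24 values per row in the description).
    """
    return [[abs(h - f) for f, h in zip(f_row, h_row)]
            for f_row, h_row in zip(F, H)]
```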
To recognize in a noisy environment, the elements of F whose loudness is too small must be ignored, because in noise these elements are very easily disturbed and become almost unusable. As the threshold for "too small", this application recommends 1/4 to 1/2 of the maximum Bark loudness value in the clean speech. For the human ear, 1/4 of the loudness corresponds to only about 1/100 of the acoustic power; even 1/2 of the loudness corresponds to only about 1/10 of the power. So although such components are not quiet in the clean speech, their actual acoustic power is very small and they are highly susceptible to interference. In a quiet environment they still help recognition, but in a noisy environment they become unusable. Specifically, let mf be the maximum element value in the array F; then each element of F is checked, and if F[x][y] < (mf/4), both P[x][y] = 0 and F[x][y] = 0 are set, so that these elements have no further influence on the result in subsequent calculations; in other words, they are ignored.
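The zeroing of low-loudness elements can be sketched as follows (using the mf/4 threshold recommended above; the function name is a choice of this sketch):

```python
def ignore_quiet(F, P, ratio=0.25):
    """Zero every element of F below ratio * max(F), along with the
    matching element of P, so neither influences later calculations."""
    mf = max(max(row) for row in F)   # maximum loudness in the clean speech
    threshold = mf * ratio            # description recommends 1/4 to 1/2
    for x, row in enumerate(F):
        for y, value in enumerate(row):
            if value < threshold:
                F[x][y] = 0.0
                P[x][y] = 0.0
    return F, P
```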
Second, when calculating whether the sound to be identified contains some speech, the computed distance should never exceed the clean-speech loudness at the corresponding Bark. That is, each element P[x][y] of the array P is checked, and if P[x][y] > F[x][y], then P[x][y] = F[x][y] is set. For example, if the computed P[2][5] equals 0.8 and F[2][5] equals 0.5, then P[2][5] is set to 0.5.
Then the sum of all elements of the array F is computed, giving sf, and the sum of all elements of the array P, giving sp. Let d = sp/sf. If d is at most some small value, for example 0.2, the speech A is considered to have been found in the sound G. Note that finding the speech A in the sound G does not exclude the possibility that G also contains other sounds or speech, such as other voices speaking at the same time or background music.
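Capping each distance at the clean-speech loudness and computing d = sp/sf can be sketched as:

```python
def match_score(F, P):
    """Return d = sum(min(P, F)) / sum(F); a small d (e.g. <= 0.2)
    means the clean speech was found in the sound to be identified."""
    sp = 0.0
    sf = 0.0
    for f_row, p_row in zip(F, P):
        for f, p in zip(f_row, p_row):
            sp += min(p, f)   # the distance never exceeds the clean loudness
            sf += f
    return sp / sf
```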
Embodiment 2:
Embodiment 1 already judges well, but some problems remain. For example, suppose the clean speech is 0.5 seconds long and the sound to be identified is 10 seconds long; the speech inside it may start at any moment within those 10 seconds, whereas Embodiment 1 assumes that, before comparison, the clean speech and the sound to be identified have the same length and that the speech occupies exactly the same position in both. The solution is to compare frame by frame. For example, suppose the sampling rate of both the sound to be identified and the clean speech is 8000 Hz, the frame length is 50 milliseconds, and 8x overlap is used; the frame step is then 8000/(1000/50)/8 = 50 samples. If the clean speech A is 0.5 seconds long, it has 4000 samples. First, samples 1 to 4000 of the sound to be identified are taken and the method of Embodiment 1 is used to judge whether they contain A; then the window advances by one step, i.e. samples 51 to 4050 of the sound to be identified are compared with the clean speech; then the 3rd position, the 4th position, and so on. A speech occurrence may, however, be detected repeatedly; for example the 4000-sample windows starting at the 2nd and 3rd positions may both be found to contain the speech A. Therefore, if the same clean speech is detected at positions that are too close together, for example only 1 or 2 frame steps apart, the repeated detections must be deleted.
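The frame-by-frame scan, with suppression of detections only 1-2 frame steps apart, can be sketched as follows (here score_fn stands for the whole Embodiment-1 comparison of one window against the clean speech; it is an assumption of this sketch, not part of the patent):

```python
def find_occurrences(score_fn, clean_len, signal, step=50, d_max=0.2):
    """Slide a clean_len-sample window over `signal` in steps of `step`
    samples; record starts whose score is at most d_max, suppressing
    detections within two frame steps of the previous detection."""
    hits = []
    for start in range(0, len(signal) - clean_len + 1, step):
        d = score_fn(signal[start:start + clean_len])
        if d <= d_max:
            if hits and start - hits[-1] <= 2 * step:
                continue  # repeated detection of the same occurrence
            hits.append(start)
    return hits
```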
Furthermore, because of recording conditions and similar factors, the clean speech may appear quieter or louder within the sound to be identified. The loudness of the clean speech should therefore be repeatedly multiplied or divided by a small factor, for example 1.05, and compared with the sound to be identified each time, until the loudness of the clean speech differs from that of the sound to be identified by too much, for example by more than a factor of 10, at which point the sound to be identified is unlikely to contain this clean speech.
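The loudness-scaling loop can be sketched as follows (the factor 1.05 and the 10x cutoff follow the description; score_fn is again the assumed comparison against the sound to be identified):

```python
def best_scaled_score(clean_loudness, score_fn, factor=1.05, max_ratio=10.0):
    """Compare the clean loudness at scales 1, 1.05, 1.05**2, ... and
    their reciprocals, up to a 10x difference, keeping the best score."""
    best = float('inf')
    scale = 1.0
    while scale <= max_ratio:
        for s in (scale, 1.0 / scale):
            scaled = [[v * s for v in row] for row in clean_loudness]
            best = min(best, score_fn(scaled))
        scale *= factor
    return best
```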
In this application, "speech" and "sound" are almost always interchangeable. The embodiments described above are merely preferred embodiments of the invention; ordinary variations and substitutions made by those skilled in the art within the technical scope of the present invention shall all fall within the scope of the invention.
Claims (4)
1. A method for speech recognition, wherein clean speech A is converted into a two-dimensional array F representing the loudness of the clean speech A on the Bark scale, and a sound G to be identified is converted into a two-dimensional array H representing the loudness of the sound G to be identified on the Bark scale, characterized in that: in the array F and the array H, the low-loudness elements of the array F, and the elements of the array H corresponding to the low-loudness elements of the array F, are ignored.
2. A method for speech recognition, wherein clean speech A2 is converted into a two-dimensional array F2 representing the loudness of the clean speech A2 on the Bark scale, and a sound G2 to be identified is converted into a two-dimensional array H2 representing the loudness of the sound G2 to be identified on the Bark scale, characterized in that: when the distance between an element F2[x][y] of the array F2 and the corresponding element H2[x][y] of the array H2 is calculated, the result of the calculation is capped at the value of the element F2[x][y].
3. The method for speech recognition according to claim 1 and/or claim 2, wherein, if the sound G3 to be identified differs in length from the clean speech A3, then to calculate whether the sound G3 to be identified contains the clean speech A3, characterized in that: a segment of sound G4 with the same length as the clean speech A3 is extracted frame by frame from the sound G3 to be identified, and the sound G4 is then compared with the clean speech A3.
4. The method for speech recognition according to claim 1 and/or claim 2, characterized in that: the clean speech A and the clean speech A2 are multiplied by a scale factor and then compared with the sound G to be identified and the sound G2 to be identified, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610273827.9A CN107342074B (en) | 2016-04-29 | 2016-04-29 | Speech and sound recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610273827.9A CN107342074B (en) | 2016-04-29 | 2016-04-29 | Speech and sound recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107342074A true CN107342074A (en) | 2017-11-10 |
CN107342074B CN107342074B (en) | 2024-03-15 |
Family
ID=60221815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610273827.9A Active CN107342074B (en) | 2016-04-29 | 2016-04-29 | Speech and sound recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107342074B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5864794A (en) * | 1994-03-18 | 1999-01-26 | Mitsubishi Denki Kabushiki Kaisha | Signal encoding and decoding system using auditory parameters and bark spectrum |
US20020062211A1 (en) * | 2000-10-13 | 2002-05-23 | Li Qi P. | Easily tunable auditory-based speech signal feature extraction method and apparatus for use in automatic speech recognition |
WO2003036621A1 (en) * | 2001-10-22 | 2003-05-01 | Motorola, Inc., A Corporation Of The State Of Delaware | Method and apparatus for enhancing loudness of an audio signal |
JP2004029215A (en) * | 2002-06-24 | 2004-01-29 | Auto Network Gijutsu Kenkyusho:Kk | Speech recognition accuracy evaluation method for speech recognition device |
CN1655230A (en) * | 2005-01-18 | 2005-08-17 | 中国电子科技集团公司第三十研究所 | Noise masking threshold algorithm based Barker spectrum distortion measuring method in objective assessment of sound quality |
CN102376306A (en) * | 2010-08-04 | 2012-03-14 | 华为技术有限公司 | Method and device for acquiring level of speech frame |
US20120233164A1 (en) * | 2008-09-05 | 2012-09-13 | Sourcetone, Llc | Music classification system and method |
CN103903612A (en) * | 2014-03-26 | 2014-07-02 | 浙江工业大学 | Method for performing real-time digital speech recognition |
- 2016-04-29: CN application CN201610273827.9A, granted as CN107342074B, status Active
Non-Patent Citations (3)
Title |
---|
K.K. Chu et al.: "Perceptually non-uniform spectral compression for noisy speech recognition", 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 10 April 2003, pages 404-407 |
Song Fangfang et al.: "Research on the scoring mechanism of a spoken-English self-study system based on speech recognition technology", Computer Knowledge and Technology (《电脑知识与技术》), vol. 5, no. 07, 5 March 2009, page 1728 |
Yuan Xiugan et al.: Human-Machine Engineering (《人机工程》), Beihang University Press, 31 August 2002, page 131 |
Also Published As
Publication number | Publication date |
---|---|
CN107342074B (en) | 2024-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102118411B1 (en) | Systems and methods for source signal separation | |
US7620546B2 (en) | Isolating speech signals utilizing neural networks | |
Nakatani et al. | Robust and accurate fundamental frequency estimation based on dominant harmonic components | |
Zhang et al. | Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison–female voices | |
CN108604452A (en) | Voice signal intensifier | |
CA2264773A1 (en) | Speech processing system | |
US20190180758A1 (en) | Voice processing apparatus, voice processing method, and non-transitory computer-readable storage medium for storing program | |
CN108597505A (en) | Audio recognition method, device and terminal device | |
Enzinger et al. | Empirical test of the performance of an acoustic-phonetic approach to forensic voice comparison under conditions similar to those of a real case | |
CN112992190B (en) | Audio signal processing method and device, electronic equipment and storage medium | |
US20080120100A1 (en) | Method For Detecting Target Sound, Method For Detecting Delay Time In Signal Input, And Sound Signal Processor | |
JP5443547B2 (en) | Signal processing device | |
Rahaman et al. | Performance analysis of isolated speech recognition technique using MFCC and cross-correlation | |
CN113674723B (en) | Audio processing method, computer equipment and readable storage medium | |
Tchorz et al. | Estimation of the signal-to-noise ratio with amplitude modulation spectrograms | |
Rao et al. | Robust speaker recognition on mobile devices | |
CN107342074A (en) | The recognition methods invention of voice and sound | |
JP3916834B2 (en) | Extraction method of fundamental period or fundamental frequency of periodic waveform with added noise | |
CN109272996A (en) | A kind of noise-reduction method and system | |
Vestman et al. | Time-varying autoregressions for speaker verification in reverberant conditions | |
Dai et al. | 2D Psychoacoustic modeling of equivalent masking for automatic speech recognition | |
Lee et al. | Speech Enhancement Using Phase‐Dependent A Priori SNR Estimator in Log‐Mel Spectral Domain | |
JP2014202777A (en) | Generation device and generation method and program for masker sound signal | |
Moritz et al. | Amplitude modulation filters as feature sets for robust ASR: constant absolute or relative bandwidth? | |
Kuo et al. | Auditory-based robust speech recognition system for ambient assisted living in smart home |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
DD01 | Delivery of document by public notice | ||
Addressee: Wang Rong Document name: Notification of Patent Invention Entering into Substantive Examination Stage |
|
GR01 | Patent grant | ||