
CN107342074A - Method for recognizing speech and sound - Google Patents

Method for recognizing speech and sound

Info

Publication number
CN107342074A
CN107342074A (application CN201610273827.9A)
Authority
CN
China
Prior art keywords
sound
array
voice
identified
loudness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610273827.9A
Other languages
Chinese (zh)
Other versions
CN107342074B (en)
Inventor
Wang Rong (王荣)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610273827.9A priority Critical patent/CN107342074B/en
Publication of CN107342074A publication Critical patent/CN107342074A/en
Application granted granted Critical
Publication of CN107342074B publication Critical patent/CN107342074B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The present invention proposes a method for realizing speech recognition. The method is characterized in that sounds of low loudness are ignored, and that when the distance between a sound to be identified and pure speech is calculated, the result is capped at the loudness of the pure speech. The method therefore gives good recognition results in noisy environments and for words or characters of short duration.

Description

Method for recognizing speech and sound
Technical field
The invention belongs to the fields of speech recognition and sound recognition, and in particular relates to a method for recognizing speech and sound.
Background technology
Speech recognition is an important component of artificial intelligence and has a wide range of uses, but the recognition ability of current speech recognition systems in noisy environments is poor. The article "An Objective Measure for Predicting Subjective Quality of Speech Coders" (IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 10, NO. 5, JUNE 1992; hereinafter called document 1) describes a method for comparing the difference between two speech signals, but when used for speech recognition the method performs very poorly. In addition, the method requires the two signals to be fully aligned, whereas in practice speech can begin and end at any time, so aligning them in advance is almost impossible. The present invention therefore proposes a solution intended to address these problems.
Summary of the invention
A method for realizing speech recognition, in which pure speech A is converted into a two-dimensional array F representing the loudness of the pure speech A on the Bark scale, and a sound G to be identified is converted into a two-dimensional array H representing the loudness of the sound G on the Bark scale, characterized in that:
In the array F and the array H, the elements of the array F with low loudness, and the elements of the array H corresponding to those low-loudness elements of the array F, are ignored.
A method for realizing speech recognition, in which pure speech A2 is converted into a two-dimensional array F2 representing the loudness of the pure speech A2 on the Bark scale, and a sound G2 to be identified is converted into a two-dimensional array H2 representing the loudness of the sound G2 on the Bark scale, characterized in that:
When the distance between an element F2[x][y] of the array F2 and the corresponding element H2[x][y] of the array H2 is calculated, the result of the calculation is capped so that it does not exceed the value of the element F2[x][y].
Preferably, if the sound G3 to be identified differs in length from the pure speech A3, then in order to calculate whether the sound G3 contains the pure speech A3:
A segment of sound G4 with the same length as the pure speech A3 is extracted frame by frame from the sound G3, and the sound G4 is then compared with the pure speech A3.
Preferably, the pure speech A and the pure speech A2 are multiplied by a scale factor before being compared with the sound G to be identified and the sound G2 to be identified.
Compared with the prior art, the advantage of the invention is that it gives good recognition results in noisy environments and for words or characters of short duration.
Embodiment
Embodiment 1:
In speech, and in broadband sounds generally, power is not distributed equally across frequencies, and the distribution of power over frequency changes over time. It is exactly this frequency distribution, and its changes, that allow people to tell different sounds apart. Suppose a 200 Hz sinusoid and a 2000 Hz sinusoid of constant intensity sound at the same time, and the loudness of the 200 Hz sinusoid is twice that of the 2000 Hz sinusoid. In this case a human can easily recognize that the sound contains a 2000 Hz component. But if the method and formulas of document 1 are applied directly to recognition and the distance between the two sounds is calculated, the mixture will be judged far away from the 2000 Hz tone, so the 2000 Hz sound cannot be identified. If, however, a person first hears the 2000 Hz pure sine tone, he knows that its loudness at 200 Hz and at other frequencies is zero; he can therefore ignore the 200 Hz component, consider only the 2000 Hz component, and still recognize the 2000 Hz sound.
In addition, in a noisy environment, sounds whose loudness is too small are easily disturbed. Therefore, when performing speech recognition in a noisy environment, the sounds in the pure speech whose loudness is too small need to be ignored.
Now suppose there is a recorded piece of speech, for example the word "north" (北) in "Beijing" (hereinafter called A). A is 0.5 seconds long at a sample rate of 8000 Hz, so it contains 4000 samples. First, A is divided into multiple overlapping or non-overlapping frames, and each frame is windowed using a window function (for example a Hamming, Hanning or sin window). This application recommends overlap sampling of 8x or more, and windowing with a sin window function. For example, suppose each frame is 50 milliseconds long with 8x overlap; then the 1st frame consists of samples 1 to 400 of A, the 2nd frame of samples 51 to 450, the 3rd frame of samples 101 to 500, and so on. Each frame is then windowed with the sin window function. A is thereby converted into a two-dimensional array E whose elements are E[n][m], where n runs from 1 to the total number of frames of A and m runs from 1 to 400, 400 being the number of samples per frame. Below, E[x] denotes a row of E, that is, the elements E[x][1] to E[x][400].
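The framing and windowing described above can be sketched as follows. This is a minimal illustration: the application does not give the exact formula of the sin window, so the form sin(π(k + 0.5)/N) used here is an assumption, as is the helper name.

```python
import numpy as np

def frame_signal(samples, frame_len=400, step=50):
    """Split a 1-D signal into overlapping frames and apply a sin window.

    frame_len=400 corresponds to 50 ms at 8000 Hz, and step=50 gives the
    8x overlap recommended above (frame 1 = samples 1..400, frame 2 =
    samples 51..450, frame 3 = samples 101..500, ...).
    """
    window = np.sin(np.pi * (np.arange(frame_len) + 0.5) / frame_len)
    n_frames = (len(samples) - frame_len) // step + 1
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = samples[i * step : i * step + frame_len] * window
    return frames

# 0.5 s at 8000 Hz -> 4000 samples -> (4000 - 400)//50 + 1 = 73 frames
E = frame_signal(np.random.randn(4000))
```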
Each row of array E is then converted, by the method of document 1, into the loudness produced at each Bark (critical band) of the human ear, so that array E becomes an array F with elements F[n][m], where n runs from 1 to the total number of frames of A and m runs from 1 to 24, 24 being the number of Barks of the human ear. A row of F thus represents the loudness, computed by the method of document 1, that one frame of A produces at the 24 Barks of the ear. Other divisions are also feasible; for example, splitting each Bark in two, giving 48 bands, can yield better recognition. Now suppose that at another moment the speech A is played again, and that owing to noise A becomes G. G is likewise converted by the method of document 1 into an array H with elements H[n][m], where n runs from 1 to the total number of frames and m from 1 to 24; a row of H represents the loudness, computed by the method of document 1, produced at the 24 Barks of the ear. To identify whether H contains the speech A, let P = abs(H - F), where abs is the absolute-value function: that is, each element of array P equals the corresponding element of H minus the corresponding element of F, taken in absolute value.
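The conversion of E into the Bark array F can be sketched roughly as follows. This is a simplified stand-in for the loudness computation of document 1, not a reproduction of it: it only sums FFT power inside the 24 classic critical bands, and the band edges, function name, and use of raw power instead of a loudness model are assumptions made for illustration.

```python
import numpy as np

# Classic Bark critical-band edges in Hz (Zwicker); bands lying above
# the 4000 Hz Nyquist frequency of 8 kHz audio simply stay empty.
BARK_EDGES = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                       1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700,
                       4400, 5300, 6400, 7700, 9500, 12000, 15500])

def frames_to_bark(E, sample_rate=8000):
    """Map each windowed frame to per-Bark-band power: E -> F."""
    n_frames, frame_len = E.shape
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    power = np.abs(np.fft.rfft(E, axis=1)) ** 2
    F = np.zeros((n_frames, 24))
    for b in range(24):
        band = (freqs >= BARK_EDGES[b]) & (freqs < BARK_EDGES[b + 1])
        F[:, b] = power[:, band].sum(axis=1)
    return F
```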
In order to recognize speech in a noisy environment, the elements of F whose loudness is too small must be ignored, because in a noisy environment these elements are easily disturbed and become almost unusable. As the threshold for "too small", this application recommends 1/4 to 1/2 of the maximum loudness value on any Bark in the pure speech. For the human ear, 1/4 of the loudness corresponds to only about 1/100 of the acoustic power, and even 1/2 of the loudness corresponds to only about 1/10 of the power. So although the loudness of these elements in the pure speech is not small, their actual acoustic power is very small, and they are therefore highly susceptible to interference. In a quiet environment these sounds still contribute to recognition, but in a noisy environment they become unusable. Specifically, suppose the maximum element value in array F is mf. Each element of F is checked, and if F[x][y] < mf/4, then P[x][y] = 0 and F[x][y] = 0 are set, so that these elements no longer have any influence on the result in the subsequent calculation; in other words, they are ignored.
Secondly, when calculating whether the sound to be identified contains a given piece of speech, the calculated distance should never exceed the loudness on the corresponding Bark of the pure speech. That is, each element P[x][y] of array P is checked, and if P[x][y] > F[x][y], then P[x][y] = F[x][y] is set. For example, if the calculated P[2][5] equals 0.8 and F[2][5] equals 0.5, then P[2][5] is set to 0.5.
Afterwards, the sum of all elements of array F is computed, giving sf, and the sum of all elements of array P is computed, giving sp. Let d = sp/sf. If d is less than or equal to some small value, for example 0.2, the speech A is considered to have been found in the sound G. Note that finding the speech A in the sound G does not exclude the possibility that G also contains other sounds or speech, for example other voices speaking at the same time or background music.
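The whole comparison of embodiment 1 (zeroing the quiet elements of F and the matching elements of H, capping |H - F| element-wise at F, and forming d = sp/sf) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def match_score(F, H, ignore_ratio=0.25):
    """Distance d between pure-speech array F and candidate array H.

    ignore_ratio=0.25 is the mf/4 threshold recommended above: elements
    of F below it, and the matching elements of H, are zeroed. Then
    P = |H - F| is capped element-wise at F and d = sum(P)/sum(F).
    A small d (the text suggests d <= 0.2) counts as a match.
    """
    F, H = F.copy(), H.copy()
    quiet = F < ignore_ratio * F.max()
    F[quiet] = 0.0
    H[quiet] = 0.0
    P = np.minimum(np.abs(H - F), F)
    return P.sum() / F.sum()
```

For identical arrays d = 0, while for a candidate with no energy at all the cap makes d = 1, the worst possible score.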
Embodiment 2:
Embodiment 1 already gives a good judgement, but some problems remain to be solved. For example, suppose the pure speech is 0.5 seconds long while the sound to be identified is 10 seconds long, and the speech within it may start at any moment of those 10 seconds. Embodiment 1 assumes that, before comparison, the pure speech and the sound to be identified have the same length, and that the speech occupies exactly the same position in both. The solution is to compare frame by frame. For example, suppose the sample rate of both the sound to be identified and the pure speech is 8000 Hz, with a 50 ms frame length and 8x overlap sampling, so that the frame step is 8000/(1000/50)/8 = 50 samples. If the pure speech A is 0.5 seconds long, it has 4000 samples. First, samples 1 to 4000 of the sound to be identified are taken and the method of embodiment 1 is used to judge whether they contain A. Then the 2nd frame, that is, one step further, compares samples 51 to 4050 of the sound to be identified with the pure speech; then the 3rd frame, the 4th frame, and so on. However, the same speech may then be detected repeatedly; for example, the 4000 samples starting at the 2nd frame and at the 3rd frame may both be identified as the speech A. Therefore, if the same pure speech is identified at positions that are too close together, for example only 1 or 2 frames apart, these repeated detections need to be deleted.
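The frame-by-frame search, including the deletion of repeated detections at nearly identical positions, can be sketched as follows. The per-window distance repeats the embodiment-1 calculation; min_gap and the function names are illustrative choices, not values from the application.

```python
import numpy as np

def match_score(F, H, ignore_ratio=0.25):
    # Per-window distance of embodiment 1: ignore quiet elements of F,
    # cap |H - F| element-wise at F, return sum(P)/sum(F).
    F, H = F.copy(), H.copy()
    quiet = F < ignore_ratio * F.max()
    F[quiet] = 0.0
    H[quiet] = 0.0
    return np.minimum(np.abs(H - F), F).sum() / F.sum()

def find_occurrences(F_pure, H_long, threshold=0.2, min_gap=3):
    """Slide the pure-speech array over a longer candidate array one
    frame at a time; report frame offsets where d <= threshold, and
    drop hits within min_gap frames of the previous one (the repeated
    detections the text warns about)."""
    n = F_pure.shape[0]
    hits = []
    for start in range(H_long.shape[0] - n + 1):
        d = match_score(F_pure, H_long[start:start + n])
        if d <= threshold and (not hits or start - hits[-1] >= min_gap):
            hits.append(start)
    return hits
```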
Furthermore, owing to recording conditions and similar causes, the pure speech may appear softer or louder in the sound to be identified. The loudness of the pure speech therefore also needs to be repeatedly multiplied or divided by a small factor, such as 1.05, and compared again with the sound to be identified, until the loudness of the pure speech differs too much from that of the sound to be identified, for example by more than a factor of 10, at which point the sound to be identified is unlikely to contain this pure speech.
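The loudness rescaling can be sketched as follows, with the 1.05 step and the 10x cutoff taken from the text above; the helper name, the loop structure, and the reuse of the embodiment-1 distance are illustrative.

```python
import numpy as np

def match_any_gain(F_pure, H, factor=1.05, max_ratio=10.0, threshold=0.2):
    """Retry the comparison with the pure speech made louder and softer.

    The gain is stepped by factor (1.05) until pure speech and candidate
    differ by more than max_ratio (10x); if no scaled copy matches, the
    candidate is taken not to contain the pure speech.
    """
    def score(F, H):
        # Embodiment-1 distance: ignore quiet elements of F, cap |H - F|.
        F, H = F.copy(), H.copy()
        quiet = F < 0.25 * F.max()
        F[quiet] = 0.0
        H[quiet] = 0.0
        return np.minimum(np.abs(H - F), F).sum() / F.sum()

    gain = 1.0
    while gain <= max_ratio:
        if (score(F_pure * gain, H) <= threshold
                or score(F_pure / gain, H) <= threshold):
            return True
        gain *= factor
    return False
```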
In this application, "speech" and "sound" can almost always be used interchangeably. The embodiments described above are merely preferred embodiments of the invention; the usual variations and substitutions made by those skilled in the art within the technical scheme of the present invention should all fall within the scope of the present invention.

Claims (4)

1. A method for realizing speech recognition, in which pure speech A is converted into a two-dimensional array F representing the loudness of the pure speech A on the Bark scale, and a sound G to be identified is converted into a two-dimensional array H representing the loudness of the sound G on the Bark scale, characterized in that:
In the array F and the array H, the elements of the array F with low loudness, and the elements of the array H corresponding to those low-loudness elements of the array F, are ignored.
2. A method for realizing speech recognition, in which pure speech A2 is converted into a two-dimensional array F2 representing the loudness of the pure speech A2 on the Bark scale, and a sound G2 to be identified is converted into a two-dimensional array H2 representing the loudness of the sound G2 on the Bark scale, characterized in that:
When the distance between an element F2[x][y] of the array F2 and the corresponding element H2[x][y] of the array H2 is calculated, the result of the calculation is capped so that it does not exceed the value of the element F2[x][y].
3. The method for realizing speech recognition according to claim 1 and/or claim 2, in which, if the sound G3 to be identified differs in length from the pure speech A3, then in order to calculate whether the sound G3 contains the pure speech A3, characterized in that:
A segment of sound G4 with the same length as the pure speech A3 is extracted frame by frame from the sound G3, and the sound G4 is then compared with the pure speech A3.
4. The method for realizing speech recognition according to claim 1 and/or claim 2, characterized in that:
The pure speech A and the pure speech A2 are multiplied by a scale factor before being compared with the sound G to be identified and the sound G2 to be identified.
CN201610273827.9A 2016-04-29 2016-04-29 Speech and sound recognition method Active CN107342074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610273827.9A CN107342074B (en) 2016-04-29 2016-04-29 Speech and sound recognition method


Publications (2)

Publication Number Publication Date
CN107342074A (en) 2017-11-10
CN107342074B CN107342074B (en) 2024-03-15

Family

ID=60221815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610273827.9A Active CN107342074B (en) 2016-04-29 2016-04-29 Speech and sound recognition method

Country Status (1)

Country Link
CN (1) CN107342074B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864794A (en) * 1994-03-18 1999-01-26 Mitsubishi Denki Kabushiki Kaisha Signal encoding and decoding system using auditory parameters and bark spectrum
US20020062211A1 (en) * 2000-10-13 2002-05-23 Li Qi P. Easily tunable auditory-based speech signal feature extraction method and apparatus for use in automatic speech recognition
WO2003036621A1 (en) * 2001-10-22 2003-05-01 Motorola, Inc., A Corporation Of The State Of Delaware Method and apparatus for enhancing loudness of an audio signal
JP2004029215A (en) * 2002-06-24 2004-01-29 Auto Network Gijutsu Kenkyusho:Kk Speech recognition accuracy evaluation method for speech recognition device
CN1655230A (en) * 2005-01-18 2005-08-17 中国电子科技集团公司第三十研究所 Noise masking threshold algorithm based Barker spectrum distortion measuring method in objective assessment of sound quality
CN102376306A (en) * 2010-08-04 2012-03-14 华为技术有限公司 Method and device for acquiring level of speech frame
US20120233164A1 (en) * 2008-09-05 2012-09-13 Sourcetone, Llc Music classification system and method
CN103903612A (en) * 2014-03-26 2014-07-02 浙江工业大学 Method for performing real-time digital speech recognition


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
K.K. CHU et al.: "Perceptually non-uniform spectral compression for noisy speech recognition", 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 10 April 2003 (2003-04-10), pages 404-407 *
SONG Fangfang et al.: "Research on the scoring mechanism of a spoken-English self-study system based on speech recognition technology", Computer Knowledge and Technology (《电脑知识与技术》), vol. 5, no. 07, 5 March 2009 (2009-03-05), page 1728 *
YUAN Xiugan et al.: "Ergonomics" (《人机工程》), 31 August 2002, Beihang University Press, page 131 *


Similar Documents

Publication Publication Date Title
KR102118411B1 (en) Systems and methods for source signal separation
US7620546B2 (en) Isolating speech signals utilizing neural networks
Nakatani et al. Robust and accurate fundamental frequency estimation based on dominant harmonic components
Zhang et al. Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison–female voices
CN108604452A (en) Voice signal intensifier
CA2264773A1 (en) Speech processing system
US20190180758A1 (en) Voice processing apparatus, voice processing method, and non-transitory computer-readable storage medium for storing program
CN108597505A (en) Audio recognition method, device and terminal device
Enzinger et al. Empirical test of the performance of an acoustic-phonetic approach to forensic voice comparison under conditions similar to those of a real case
CN112992190B (en) Audio signal processing method and device, electronic equipment and storage medium
US20080120100A1 (en) Method For Detecting Target Sound, Method For Detecting Delay Time In Signal Input, And Sound Signal Processor
JP5443547B2 (en) Signal processing device
Rahaman et al. Performance analysis of isolated speech recognition technique using MFCC and cross-correlation
CN113674723B (en) Audio processing method, computer equipment and readable storage medium
Tchorz et al. Estimation of the signal-to-noise ratio with amplitude modulation spectrograms
Rao et al. Robust speaker recognition on mobile devices
CN107342074A (en) The recognition methods invention of voice and sound
JP3916834B2 (en) Extraction method of fundamental period or fundamental frequency of periodic waveform with added noise
CN109272996A (en) A kind of noise-reduction method and system
Vestman et al. Time-varying autoregressions for speaker verification in reverberant conditions
Dai et al. 2D Psychoacoustic modeling of equivalent masking for automatic speech recognition
Lee et al. Speech Enhancement Using Phase‐Dependent A Priori SNR Estimator in Log‐Mel Spectral Domain
JP2014202777A (en) Generation device and generation method and program for masker sound signal
Moritz et al. Amplitude modulation filters as feature sets for robust ASR: constant absolute or relative bandwidth?
Kuo et al. Auditory-based robust speech recognition system for ambient assisted living in smart home

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
DD01 Delivery of document by public notice

Addressee: Wang Rong

Document name: Notification of Patent Invention Entering into Substantive Examination Stage

GR01 Patent grant