CN109346099A

CN109346099A - An iterative denoising method and chip based on speech recognition

Info

Publication number: CN109346099A
Application number: CN201811512492.7A
Authority: CN
Inventors: 许登科
Original assignee: Zhuhai Amicro Semiconductor Co Ltd
Current assignee: Zhuhai Amicro Semiconductor Co Ltd
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2019-02-15
Anticipated expiration: 2038-12-11
Also published as: CN109346099B

Abstract

The present invention discloses an iterative denoising method and chip based on speech recognition. The method comprises: Step 1: determining a target voice signal and its target confidence value; Step 2: selecting, from a noise database, noise data that matches the target confidence value, and subjecting the noise data and the unmarked voiced frames in the target voice signal to pre-denoising processing; Step 3: judging whether the pre-denoising result is greater than a predetermined threshold; if so, proceeding to Step 4, otherwise proceeding to Step 5; Step 4: marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal; Step 5: judging whether the absolute difference between the confidence value of the pre-denoising result and the target confidence value is less than a confidence threshold; if so, marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal; otherwise, adjusting the target confidence value and returning to Step 2.

Description

An iterative denoising method and chip based on speech recognition

Technical field

The invention belongs to the field of robotics, and more particularly relates to an iterative denoising method and chip based on speech recognition.

Background art

Existing mobile robots (for example, mechatronic devices such as the sweeping, window-cleaning and floor-mopping robots among cleaning robots) generate noise while they work. Voice pickup devices currently on the market can pick up the speech a user utters, but in doing so they usually also pick up the noise produced by the robot during operation. The picked-up signal is therefore mixed with a large amount of noise and speech recognition accuracy is low, which seriously impairs the robot's recognition of external (valid) speech signals and its ability to interpret speech and make logical decisions from it (for example, executing the corresponding path planning).

In the prior art, front-end denoising of speech signals generally selects suitable speech signals and suppresses unwanted ones according to a classification of the speech signal. However, such classification methods are complex: the noise reduction is incomplete, recognition efficiency is low, and some speech frames are always left unprocessed, which degrades speech recognition.

Summary of the invention

To overcome the above technical shortcomings, the present invention proposes the following technical solution:

An iterative denoising method based on speech recognition, comprising: Step 1: determining a target voice signal from the voice signals acquired by a microphone array, and correspondingly obtaining a target confidence value; Step 2: selecting, from a noise database, noise data that matches the target confidence value, and subjecting the noise data and the unmarked voiced frames in the target voice signal to pre-denoising processing, so as to obtain a pre-denoising result corresponding to the noise data; Step 3: judging whether the pre-denoising result is greater than a predetermined threshold; if so, proceeding to Step 4, otherwise proceeding to Step 5; Step 4: marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal; Step 5: judging whether the absolute difference between the confidence value of the pre-denoising result and the target confidence value is less than a confidence threshold; if so, marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal; otherwise, adjusting the target confidence value and returning to Step 2; wherein the target voice signal contains voiced frames associated with a control instruction. Because the pre-denoising result is judged twice, the iterative denoising method processes every voiced frame of the target voice signal, which helps make the denoising complete and improves the accuracy of speech denoising.

Further, Step 1 specifically comprises: identifying, by a speech engine, the voiced frames of the voice signal acquired by the microphone array; when the signal-to-noise ratio (SNR) of a voiced frame is greater than a predetermined SNR threshold, determining the voice signal corresponding to that voiced frame as the target voice signal; and then extracting the target confidence value of the corresponding target voice signal from the voiced frame, wherein each voiced frame carries a confidence value and an SNR value obtained from speech recognition. Screening the target voice signal with the predetermined SNR threshold focuses recognition on specific voice signals and improves the accuracy of speech recognition in noisy environments.
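Step 1 can be sketched as a filter over the frames reported by the speech engine. This is a minimal illustration, not the patented implementation: the `VoicedFrame` record and its field names are hypothetical, and summarising the target confidence value as the mean confidence of the selected frames is an assumption, since the text does not say how per-frame values are combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoicedFrame:
    """Hypothetical record for one voiced frame reported by a speech engine."""
    index: int
    confidence: float  # recognition confidence reported for this frame
    snr_db: float      # signal-to-noise ratio of this frame, in dB


def select_target_frames(frames: List[VoicedFrame], snr_threshold_db: float) -> List[VoicedFrame]:
    """Keep only the voiced frames whose SNR exceeds the predetermined SNR threshold."""
    return [f for f in frames if f.snr_db > snr_threshold_db]


def target_confidence(target: List[VoicedFrame]) -> Optional[float]:
    """Assumption: the target confidence value is the mean confidence of the target frames."""
    if not target:
        return None  # no frame passed the SNR screen
    return sum(f.confidence for f in target) / len(target)
```

Frames that fail the SNR screen are simply dropped here; the text does not say whether they are discarded or retried later.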

Further, in Step 2, selecting from the noise database the noise data that matches the target confidence value specifically comprises: judging whether the noise database contains a piece of preset noise data whose confidence value differs from the target confidence value by an absolute amount less than a predetermined noise threshold; if so, determining that preset noise data as the noise data matching the target confidence value. Choosing the preset noise signal according to the real-time match between the confidence value of the noise signal and the noise database improves the accuracy of the denoising operation.
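A minimal sketch of the matching rule in Step 2, assuming the noise database can be modelled as a mapping from a hypothetical entry name to that entry's confidence value. The text only requires that some entry lies within the predetermined noise threshold; picking the closest such entry when several qualify is an added assumption.

```python
from typing import Dict, Optional


def match_noise(noise_db: Dict[str, float], target_conf: float, noise_threshold: float) -> Optional[str]:
    """Return the name of the preset noise entry whose confidence is within
    noise_threshold of target_conf (closest one wins), or None if none qualifies."""
    best_name, best_diff = None, None
    for name, conf in noise_db.items():
        diff = abs(conf - target_conf)
        # Keep the entry only if it passes the threshold and beats the current best.
        if diff < noise_threshold and (best_diff is None or diff < best_diff):
            best_name, best_diff = name, diff
    return best_name
```

Returning `None` models the case where no preset noise data matches; the text does not specify what happens then.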

Further, the pre-denoising processing specifically comprises: first inverting the phase of the noise data to obtain a phase-inverted noise signal; then mixing and superimposing the phase-inverted noise signal with the target voice signal to obtain the pre-denoising result corresponding to the noise data. This pre-denoising method is simple and efficient.
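Phase inversion followed by superposition is arithmetically the same as subtracting the noise template sample by sample. The sketch below assumes the noise data is a time-aligned template of the same length as the frame; the function names are hypothetical.

```python
from typing import List, Sequence


def invert_phase(noise: Sequence[float]) -> List[float]:
    """Phase inversion: negate every sample (a 180-degree shift)."""
    return [-x for x in noise]


def predenoise(frame: Sequence[float], noise: Sequence[float]) -> List[float]:
    """Superimpose the phase-inverted noise template on the frame."""
    if len(frame) != len(noise):
        raise ValueError("frame and noise template must have the same length")
    inverted = invert_phase(noise)
    # Adding the inverted template equals frame - noise, sample by sample.
    return [s + n for s, n in zip(frame, inverted)]
```

The cancellation is exact only when the stored template matches the actual noise in amplitude and timing; any misalignment leaves a residual, which is why the result still has to be evaluated in the later steps.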

Further, adjusting the target confidence value comprises: raising or lowering the current target confidence value according to the difference between the confidence value of the unmarked voiced frames in the target voice signal and the current target confidence value. This facilitates the subsequent judgment and screening of the unmarked voiced frames in the target voice signal and improves the accuracy of the iterative process.
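A minimal sketch of the adjustment rule, assuming the unmarked frames' confidence values are summarised by their mean and that the target moves a fixed fraction (`rate`, a hypothetical parameter) of the gap toward that mean. The text says only that the value is raised or lowered accordingly, so the step size is an assumption.

```python
from statistics import mean
from typing import Sequence


def adjust_target_confidence(current: float, unmarked_confs: Sequence[float], rate: float = 0.5) -> float:
    """Raise the target if unmarked frames are more confident than it, lower it otherwise."""
    if not unmarked_confs:
        return current  # nothing left to denoise; keep the value unchanged
    gap = mean(unmarked_confs) - current  # positive -> raise, negative -> lower
    return current + rate * gap
```

A `rate` of 1.0 jumps straight to the unmarked frames' mean; smaller values move more cautiously but may need more iterations.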

A chip, used for storing the program code corresponding to the iterative denoising method. By selectively denoising the voiced frames of the target voice signal, setting thresholds intelligently and marking the voiced frames that have already been denoised, the influence of noise on speech recognition is suppressed and the denoising is more thorough. Denoising is performed as soon as the target voice signal is obtained, which improves recognition accuracy. Moreover, the confidence value is adjusted flexibly according to the real-time match between the noise signal and the noise database, which improves denoising efficiency and thus further improves recognition efficiency.

Compared with the prior art, the technical solution of the present invention, during denoising preprocessing after the target voice signal is obtained, selectively denoises the voiced frames of the target voice signal according to the real-time match between the confidence value of the noise signal and the confidence values in the noise database. By adjusting the confidence value flexibly and denoising with the matching data in the noise database, it improves the denoising rate of the target voice signal.

Description of the drawings

Fig. 1 is a flowchart of an iterative denoising method based on speech recognition according to an embodiment of the present invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, an embodiment of the present invention provides an iterative denoising method based on speech recognition. As one embodiment of the iterative denoising method, it comprises:

Step S101: acquire, from the microphone array, the voice signal arriving from a particular direction, and determine the target voice signal by analyzing the information domain of a database pre-stored in the speech engine. This achieves directional voice pickup and reduces interference from external noise. Then proceed to step S102. The target voice signal includes the voice data of a control command spoken by the user or input by a machine. Correspondingly, a target confidence value is obtained from the target voice signal. In this embodiment, the target confidence value refers to the degree to which the mobile robot trusts the authenticity of a specific voice signal: it is a numerical value indicating how credible the preliminary recognition result is, and the correctness of the recognition result is judged against a confidence threshold before the result is presented, so as to reduce misjudgment. For example, if the target voice signal spoken by the user is "return to the dock and charge", the target confidence value returned during recognition of that voice data includes a sentence confidence N.

Optionally, the speech engine identifies the voiced frames of the voice signal acquired by the microphone array. With a suitable speech feature detection algorithm, the target voice signal in the microphone array can be made to contain voiced frames associated with a control instruction, so that the target voice signal can be converted into multiple speech frames associated with the user's utterance. The speech frames may include voiced frames and silent frames, and this conversion can be performed by various known techniques. When the SNR of a voiced frame is greater than the predetermined SNR threshold, the voice signal corresponding to that voiced frame is determined as the target voice signal, and the target confidence value of the corresponding target voice signal is then extracted from the voiced frame, wherein each voiced frame carries a confidence value and an SNR value obtained from speech recognition.
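The text leaves the voiced/silent split to known techniques. One common choice, swapped in here purely for illustration, is short-time energy thresholding over fixed-length, non-overlapping frames. The frame length and energy threshold are hypothetical parameters, and a production speech engine would typically use a more robust detector.

```python
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Cut the signal into non-overlapping frames; a trailing partial frame is dropped."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame: Sequence[float]) -> float:
    """Short-time energy, taken here as the mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def label_frames(samples: Sequence[float], frame_len: int,
                 energy_threshold: float) -> List[Tuple[List[float], bool]]:
    """Pair each frame with True (voiced) or False (silent)."""
    return [(f, frame_energy(f) > energy_threshold)
            for f in split_frames(samples, frame_len)]
```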

It should be noted that the noise energy level contained in a voiced frame can be measured by the SNR. The SNR is the ratio of the power of the voice data to the power of the noise data and is usually expressed in decibels; in general, a higher SNR indicates lower noise power, and vice versa. The noise energy level reflects the magnitude of the noise energy in the user's voice data. Together, the SNR and the noise energy level indicate how large the noise is.
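The SNR definition above corresponds to SNR_dB = 10 log10(P_speech / P_noise), where P is mean power. A minimal sketch, assuming separate speech and noise sample sequences are available (in practice the noise power would have to be estimated):

```python
import math
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    p_noise = mean_power(noise)
    if p_noise == 0:
        return math.inf  # noise-free input: the SNR is unbounded
    return 10.0 * math.log10(mean_power(speech) / p_noise)
```

For example, speech whose amplitude is 10 times the noise amplitude has a power ratio of 100, i.e. an SNR of 20 dB.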

Step S102: select, from a pre-configured noise database, noise data that matches the target confidence value, then proceed to step S103. Specifically, according to the noise data present in the voiced frames of the target voice signal, a piece of preset noise data is searched for in the pre-configured noise database, and it is then judged whether the absolute difference between the target confidence value and the confidence value of that preset noise data is less than the predetermined noise threshold; if so, that preset noise data is determined as the noise data matching the target confidence value. In this embodiment, because the noise generated in the robot's working area is relatively stable, the pre-configured noise database does not need to be updated with real-time noise data before the difference comparison is made. Compared with the prior art, this reduces the software load of speech recognition and leaves the corresponding software resources free for subsequent denoising.

Preferably, the target voice signal can be compared with all the noise data in the pre-configured noise database to obtain a set of voice similarity values, and the predetermined noise threshold is then determined from a weighted average of those similarity values. In addition, multiple noise databases can be used, and the result with the highest recognition rate among them selected as the final matching result, which improves the recognition rate of the robot's operating noise.
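The weighted-average rule and the multi-database selection can be sketched as follows. The patent does not say how the voice similarity values or their weights are computed; the cosine similarity and the caller-supplied weights below are assumptions, and `best_database` assumes a recognition rate has already been measured for each database.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def noise_threshold(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the similarity values, used as the predetermined noise threshold."""
    if not similarities or len(similarities) != len(weights):
        raise ValueError("need one weight per similarity value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(similarities, weights)) / total


def best_database(recognition_rates: Dict[str, float]) -> str:
    """Pick the noise database whose matching result has the highest recognition rate."""
    return max(recognition_rates, key=recognition_rates.get)
```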

Step S103: subject the noise data and the unmarked voiced frames in the target voice signal to pre-denoising processing to obtain a pre-denoising result corresponding to the noise data. Specifically, the pre-denoising processing comprises: first inverting the phase of the noise data to obtain a phase-inverted noise signal; then mixing and superimposing the phase-inverted noise signal with the target voice signal to obtain the pre-denoising result corresponding to the noise data, thereby cancelling the noise signal in the target voice signal and obtaining the pre-denoised voice information.

Step S104: judge whether the pre-denoising result is greater than a predetermined threshold; if so, proceed to step S105, otherwise proceed to step S106. The predetermined threshold is pre-stored and can be used to measure the degree of distortion of the voice signal. If the pre-denoising result is greater than the predetermined threshold, unwanted noise has been removed from the voiced frame of the target voice signal, i.e. the influence of that noise on the recognition result has been eliminated. If the pre-denoising result is less than the predetermined threshold, the denoising must be adjusted further so that every voiced frame of the received target voice signal is processed; this makes the denoising more thorough and in turn improves the integrity of the target voice signal and its recognition accuracy.

Step S105: based on the judgment that the pre-denoising result is greater than the predetermined threshold, mark the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal. The remaining unmarked voiced frames of the target voice signal may not satisfy the condition that their pre-denoising result exceeds the predetermined threshold; they wait for denoising in the subsequent steps, after which all frames are converted together into a voice control command for controlling the mobile robot.

Step S106: judge whether the absolute difference between the confidence value of the pre-denoising result and the target confidence value is less than a confidence threshold; if so, proceed to step S107, otherwise proceed to step S108. Given that the pre-denoising result is below the predetermined threshold, the confidence value of the pre-denoising result is a numerical measure of how credible the recognition result of the pre-denoised target voice signal is, and the confidence threshold can serve as an evaluation index of the correct recognition rate of the target voice signal after it has been disturbed by noise. Judging whether this absolute difference is less than the confidence threshold allows the noise in the remaining unmarked voiced frames of the target voice signal to be processed further, improving the completeness and accuracy of denoising the target voice signal.
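Steps S104 to S108 reduce to a three-way decision per frame. This sketch assumes the pre-denoising result can be summarised by a single scalar score (the text compares the result with a threshold without defining the metric); the enum and parameter names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    MARK_BY_SCORE = "mark: score above the predetermined threshold (S105)"
    MARK_BY_CONFIDENCE = "mark: confidence close to the target (S107)"
    ADJUST_AND_RETRY = "adjust the target confidence and return to S102 (S108)"


def decide(score: float, result_conf: float, target_conf: float,
           score_threshold: float, conf_threshold: float) -> Action:
    """First judgment on the score, second judgment on the confidence distance."""
    if score > score_threshold:
        return Action.MARK_BY_SCORE
    if abs(result_conf - target_conf) < conf_threshold:
        return Action.MARK_BY_CONFIDENCE
    return Action.ADJUST_AND_RETRY
```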

Step S107: mark the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal, thereby handling the unmarked voiced frames left over from the screening in step S105 and improving the accuracy of speech recognition. If unmarked voiced frames nevertheless remain in the target voice signal, the noise data matched to the current target confidence value has had little pre-denoising effect on them and the error is large.

Step S108: raise or lower the current target confidence value according to the magnitude of the difference between the confidence value of the unmarked voiced frames in the target voice signal and the current target confidence value. In this embodiment, when the confidence value of the unmarked voiced frames is greater than the current target confidence value, the current target confidence value is raised accordingly; otherwise it is lowered accordingly. The method then returns to step S102 to select noise data matching the adjusted target confidence value for further denoising. In effect, this is a correction of the current target confidence value, after which the voiced frames to be marked as denoised are re-judged with the corrected parameter; through repeated iterations, the loop continues until every voiced frame of the target voice signal has been denoised. Adjusting the confidence value flexibly according to the real-time match between the noise signal and the noise database improves denoising efficiency. The denoised voiced frames of the target voice signal are then converted into a voice control command for controlling the mobile robot. Because the target voice signal contains periodic components, execution of the denoising method follows a regular, periodic iteration pattern, which avoids correcting the target confidence value at random, speeds up the judgment of the target voice signal, and thus improves denoising efficiency.
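Putting the steps together, a minimal end-to-end sketch of the loop follows. It is an illustration under stated assumptions, not the patented implementation: the noise database maps a hypothetical name to a (confidence, template) pair, `score_fn` and `conf_fn` stand in for the unspecified quality metric and recognizer confidence, all pending frames are processed in one batch per iteration, and `max_iters` is an added safeguard so that the loop always terminates.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


@dataclass
class Frame:
    samples: List[float]  # raw samples of one voiced frame
    confidence: float     # per-frame confidence from the speech engine (assumed given)
    marked: bool = False  # True once the frame counts as denoised


NoiseDB = Dict[str, Tuple[float, List[float]]]  # name -> (confidence, noise template)


def iterative_denoise(frames: List[Frame], noise_db: NoiseDB, target_conf: float,
                      score_fn: Callable[[List[float]], float],
                      conf_fn: Callable[[List[float]], float],
                      score_threshold: float, conf_threshold: float,
                      noise_threshold: float, rate: float = 1.0,
                      max_iters: int = 20) -> float:
    """Mark frames in place and return the final target confidence value."""
    for _ in range(max_iters):
        pending = [f for f in frames if not f.marked]
        if not pending:
            break  # every voiced frame has been denoised
        # Step 2: preset noise entries within noise_threshold of the target;
        # taking the closest one is an assumed tie-break.
        matches = sorted(
            (abs(conf - target_conf), name)
            for name, (conf, _) in noise_db.items()
            if abs(conf - target_conf) < noise_threshold
        )
        if matches:
            template = noise_db[matches[0][1]][1]
            for f in pending:
                # Phase-invert the template and superimpose it on the frame.
                result = [s + (-n) for s, n in zip(f.samples, template)]
                # Step 3 (score) or Step 5 (confidence distance) may mark the frame.
                if (score_fn(result) > score_threshold
                        or abs(conf_fn(result) - target_conf) < conf_threshold):
                    f.samples, f.marked = result, True
        unmarked = [f.confidence for f in frames if not f.marked]
        if unmarked:
            # Otherwise: move the target toward the unmarked frames' mean confidence.
            target_conf += rate * (mean(unmarked) - target_conf)
    return target_conf
```

Unmarked frames keep their original samples, so a later iteration can retry them with a different noise template.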

The technical solution of the present invention acquires the target voice signal uttered by the user in the noisy scenario in which the robot works and, according to the empirical data of the pre-stored noise database and the target voice signal, applies phase inversion to the empirical noise data to suppress the noise in the target voice signal. At the same time, the associated confidence value is adjusted flexibly according to the real-time match between the noise signal and the noise database, and the denoised voiced frames are then screened out by judgment. This greatly improves the completeness of denoising and, in turn, the speech recognition rate in noisy environments.

A kind of chip, the chip is for storing the corresponding program code of the iterated denoising method.The chip is using dedicated Integrated control chip, these chips can parse control instruction either internally or externally, and export corresponding control signal, with The execution unit of control robot is acted accordingly.The built-in chip type is clear for controlling in the clean robot Clean robot executes the iterated denoising method, and the targeted voice signal is converted to matched control by treated Instruction, and corresponding operation is executed according to the control instruction.Wherein, the predenoising processing can be carried out using subtraction circuit Signal subtraction can also be combined using phase inverter and add circuit and carry out signal subtraction, these circuits can be together with processor It is integrated into dedicated processes chip, specifically corresponding configuration can be carried out according to design requirement.After having filtered out internal noise interference, place Reason device again parses the signal after filtering out, and parses external sound signal, is converted to matched control instruction with reality Now to the control of robot.How external sound signal is parsed as robot, belongs to the existing technology for having been able to realize, Details are not described herein again.

Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified, or some technical features equivalently replaced, and all such modifications that do not depart from the spirit of the technical solution of the present invention shall fall within the scope of the technical solution claimed by the present invention.

Claims

1. An iterative denoising method based on speech recognition, characterized by comprising:

Step 1: determining a target voice signal from the voice signals acquired by a microphone array, and correspondingly obtaining a target confidence value;

Step 2: selecting, from a noise database, noise data that matches the target confidence value, and subjecting the noise data and the unmarked voiced frames in the target voice signal to pre-denoising processing, so as to obtain a pre-denoising result corresponding to the noise data;

Step 3: judging whether the pre-denoising result is greater than a predetermined threshold; if so, proceeding to Step 4, otherwise proceeding to Step 5;

Step 4: marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal;

Step 5: judging whether the absolute difference between the confidence value of the pre-denoising result and the target confidence value is less than a confidence threshold; if so, marking the voiced frame corresponding to the pre-denoising result as a denoised voiced frame of the target voice signal; otherwise, adjusting the target confidence value and returning to Step 2; wherein the target voice signal contains voiced frames associated with a control instruction.

2. The iterative denoising method according to claim 1, characterized in that Step 1 specifically comprises:

identifying, by a speech engine, the voiced frames of the voice signal acquired by the microphone array; when the SNR of a voiced frame is greater than a predetermined SNR threshold, determining the voice signal corresponding to that voiced frame as the target voice signal; and then extracting the target confidence value of the corresponding target voice signal from the voiced frame, wherein each voiced frame carries a confidence value and an SNR value obtained from speech recognition.

3. The iterative denoising method according to claim 1, characterized in that, in Step 2, selecting from the noise database the noise data that matches the target confidence value specifically comprises:

judging whether the noise database contains a piece of preset noise data whose confidence value differs from the target confidence value by an absolute amount less than a predetermined noise threshold; if so, determining that preset noise data as the noise data matching the target confidence value.

4. The iterative denoising method according to claim 1, characterized in that the pre-denoising processing specifically comprises:

first inverting the phase of the noise data to obtain a phase-inverted noise signal;

then mixing and superimposing the phase-inverted noise signal with the target voice signal to obtain the pre-denoising result corresponding to the noise data.

5. The iterative denoising method according to claim 1, characterized in that adjusting the target confidence value comprises: raising or lowering the current target confidence value according to the difference between the confidence value of the unmarked voiced frames in the target voice signal and the current target confidence value.

6. A chip, characterized in that the chip is used for storing the program code corresponding to the iterative denoising method according to any one of claims 1 to 5.