CN109192224A - Speech evaluation method, apparatus, device and readable storage medium - Google Patents
- Publication number: CN109192224A
- Application number: CN201811073869.3A
- Authority
- CN
- China
- Prior art keywords
- keyword
- voice
- evaluated
- hit
- evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Electrically Operated Instructional Devices (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a speech evaluation method, apparatus, device and readable storage medium. The application obtains speech to be evaluated and a keyword serving as an evaluation criterion, detects whether a speech segment corresponding to the keyword is present in the speech to be evaluated to obtain a detection result, and determines an evaluation result of the speech to be evaluated according to the detection result. By obtaining the keyword serving as the evaluation criterion, the application can automatically detect whether the speech segment corresponding to the keyword is present in the speech to be evaluated and determine the evaluation result from the detection result. Since no manual evaluation is needed, the method both avoids interference of human subjectivity with the evaluation result and reduces labor cost.
Description
Technical field
This application relates to the field of speech processing technology, and more specifically to a speech evaluation method, apparatus, device and readable storage medium.
Background Art
With the continuous deepening of educational reform, speaking tests are being carried out in provinces and cities across the country. A speaking test usually provides a passage of material and sets several questions about it. After reading the material, the examinee gives a spoken answer to each question.
In existing speaking tests, professional teachers mostly evaluate the examinee's answers against the correct-answer information for each question. Such manual evaluation is highly susceptible to human subjectivity, so the evaluation result suffers from human interference, and it also consumes a large amount of labor cost.
Summary of the invention
In view of this, this application provides a speech evaluation method, apparatus, device and readable storage medium, to overcome the disadvantages of the existing manner of manually evaluating speaking tests.
To achieve the above goal, the following schemes are proposed:
A speech evaluation method, comprising:
obtaining speech to be evaluated, and a keyword serving as an evaluation criterion;
detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
determining an evaluation result of the speech to be evaluated according to the detection result.
Preferably, detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated comprises:
recognizing the speech to be evaluated, to obtain recognized text information;
matching the keyword with the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains the speech segment corresponding to the keyword.
Preferably, recognizing the speech to be evaluated to obtain the recognized text information comprises:
extracting an acoustic feature of the speech to be evaluated;
inputting the acoustic feature into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated output by the first acoustic recognition model.
Preferably, the first acoustic recognition model is a general acoustic recognition model, or an acoustic recognition model obtained by adapting the general acoustic recognition model using the recognition result of the general acoustic recognition model on the speech to be evaluated.
Preferably, the first acoustic recognition model is an acoustic recognition model corresponding to a decoding space formed by the keyword and a filler, the filler representing all non-keyword words.
Preferably, detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated further comprises:
obtaining the hidden-layer output of the first acoustic recognition model, namely the hidden-layer averaged acoustic feature converted from the acoustic feature;
inputting the hidden-layer averaged acoustic feature and the word-vector feature of the keyword into a preset first keyword classifier, to obtain a classification result, output by the first keyword classifier, of whether the speech to be evaluated contains speech segments corresponding to the keyword or to non-keywords;
the first keyword classifier being trained with, as training samples, hidden-layer averaged acoustic features obtained by converting acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and with, as sample labels, classification annotations of whether the speech training data contains speech segments corresponding to the keyword or to non-keywords.
Preferably, detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated further comprises:
windowing the speech to be evaluated, to obtain at least one windowed speech segment to be evaluated;
obtaining, from the hidden layer of the first acoustic recognition model, the hidden-layer averaged windowed acoustic feature converted from the windowed acoustic feature of each windowed speech segment;
inputting each hidden-layer averaged windowed acoustic feature into a preset second keyword classifier, to obtain a classification result, output by the second keyword classifier, of whether each windowed speech segment corresponds to the keyword or to a non-keyword;
the second keyword classifier being trained with keyword hidden-layer averaged acoustic features and non-keyword hidden-layer averaged features obtained by converting, through the hidden layer of the first acoustic recognition model, the acoustic features of the speech segments corresponding to keywords and to non-keywords in speech training data.
Preferably, determining the evaluation result of the speech to be evaluated according to the detection result comprises:
determining an evaluation feature according to the detection result, the evaluation feature including any one or more of hit keywords, the confidence of a hit keyword, the keyword hit rate, and the Gaussian duration of a hit keyword;
wherein a hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated; the confidence of a hit keyword is the recognition confidence of the first acoustic recognition model for the hit keyword, or the classification confidence of the first keyword classifier for the hit keyword, or the classification confidence of the second keyword classifier for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
determining the evaluation result of the speech to be evaluated according to the evaluation feature.
A speech evaluation apparatus, comprising:
a data acquisition unit, configured to obtain speech to be evaluated, and a keyword serving as an evaluation criterion;
a speech detection unit, configured to detect whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
an evaluation result determination unit, configured to determine an evaluation result of the speech to be evaluated according to the detection result.
Preferably, the speech detection unit includes:
a text recognition unit, configured to recognize the speech to be evaluated, to obtain recognized text information;
a text matching unit, configured to match the keyword with the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains the speech segment corresponding to the keyword.
Preferably, the text recognition unit includes:
an acoustic feature extraction unit, configured to extract an acoustic feature of the speech to be evaluated;
a first acoustic recognition model prediction unit, configured to input the acoustic feature into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated output by the first acoustic recognition model.
Preferably, the speech detection unit further includes:
a global hidden-layer feature acquisition unit, configured to obtain the hidden-layer output of the first acoustic recognition model, namely the hidden-layer averaged acoustic feature converted from the acoustic feature;
a first keyword classifier prediction unit, configured to input the hidden-layer averaged acoustic feature and the word-vector feature of the keyword into a preset first keyword classifier, to obtain a classification result, output by the first keyword classifier, of whether the speech to be evaluated contains speech segments corresponding to the keyword or to non-keywords;
the first keyword classifier being trained with, as training samples, hidden-layer averaged acoustic features obtained by converting acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and with, as sample labels, classification annotations of whether the speech training data contains speech segments corresponding to the keyword or to non-keywords.
Preferably, the speech detection unit further includes:
a speech windowing unit, configured to window the speech to be evaluated, to obtain at least one windowed speech segment to be evaluated;
a windowed hidden-layer feature acquisition unit, configured to obtain, from the hidden layer of the first acoustic recognition model, the hidden-layer averaged windowed acoustic feature converted from the windowed acoustic feature of each windowed speech segment;
a second keyword classifier prediction unit, configured to input each hidden-layer averaged windowed acoustic feature into a preset second keyword classifier, to obtain a classification result, output by the second keyword classifier, of whether each windowed speech segment corresponds to the keyword or to a non-keyword;
the second keyword classifier being trained with keyword hidden-layer averaged acoustic features and non-keyword hidden-layer averaged features obtained by converting, through the hidden layer of the first acoustic recognition model, the acoustic features of the speech segments corresponding to keywords and to non-keywords in speech training data.
Preferably, the evaluation result determination unit includes:
a first evaluation feature determination unit, configured to determine an evaluation feature according to the detection result, the evaluation feature including any one or more of hit keywords, the confidence of a hit keyword, the keyword hit rate, and the Gaussian duration of a hit keyword;
wherein a hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated; the confidence of a hit keyword is the recognition confidence of the first acoustic recognition model for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a first evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation feature.
Preferably, the evaluation result determination unit includes:
a second evaluation feature determination unit, configured to determine an evaluation feature according to the detection result, the evaluation feature including any one or more of hit keywords, the confidence of a hit keyword, the keyword hit rate, and the Gaussian duration of a hit keyword;
wherein a hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated; the confidence of a hit keyword is the classification confidence of the first keyword classifier for the hit keyword, or the classification confidence of the second keyword classifier for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a second evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation feature.
A speech evaluation device, including a memory and a processor;
the memory being configured to store a program;
the processor being configured to execute the program to implement each step of the speech evaluation method described above.
A readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing each step of the speech evaluation method described above.
It can be seen from the above technical solutions that the speech evaluation method provided by the embodiments of this application obtains speech to be evaluated and a keyword serving as an evaluation criterion, detects whether a speech segment corresponding to the keyword is present in the speech to be evaluated to obtain a detection result, and determines the evaluation result of the speech to be evaluated according to the detection result. By obtaining the keyword serving as the evaluation criterion, the application can automatically detect whether the speech segment corresponding to the keyword is present in the speech to be evaluated and determine the evaluation result from the detection result. Since no manual evaluation is needed, the method both avoids interference of human subjectivity with the evaluation result and reduces labor cost.
Detailed description of the invention
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a speech evaluation method disclosed in an embodiment of this application;
Fig. 2 is a schematic diagram of extracting keyword and non-keyword hidden-layer averaged features from a speech sample to be evaluated;
Fig. 3 is a schematic structural diagram of a speech evaluation apparatus disclosed in an embodiment of this application;
Fig. 4 is a hardware block diagram of a speech evaluation device disclosed in an embodiment of this application.
Specific Embodiments
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
To solve the problems that existing spoken-language assessment is performed manually, leaving the evaluation result subject to human interference and wasting labor cost, this application realizes automated speech evaluation based on speech detection technology. As illustrated in Fig. 1, the speech evaluation method may include:
Step S100: obtain speech to be evaluated, and a keyword serving as an evaluation criterion.
Specifically, taking a speaking-test scenario as an example, the speech to be evaluated may be a recording of the spoken answer given by an examinee. Correspondingly, the keyword serving as the evaluation criterion may be preset in this embodiment. For a speaking-test question based on reading material, the keyword serving as the evaluation criterion may be extracted from the reading material. For other types of spoken questions, the keyword serving as the evaluation criterion may be extracted from the answer corresponding to the question.
In this step, the speech to be evaluated may be captured by a recording device, which may include a microphone, such as a headset microphone.
The keyword serving as the evaluation criterion reflects the core points of the model answer. The keyword may be specified by the user in advance, or extracted from the answer corresponding to the question using a keyword extraction technique, such as the common TF-IDF (term frequency-inverse document frequency) keyword extraction method.
It is understood that the number of keywords serving as the evaluation criterion is not limited; there may be one or more.
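As an illustrative aside (not part of the original disclosure), a minimal sketch of TF-IDF keyword extraction in Python might look as follows; scikit-learn, the function layout, and the top-k selection are all assumptions:

```python
# Sketch: pick evaluation keywords from a reference answer by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(reference_answer: str, background_corpus: list[str], top_k: int = 5) -> list[str]:
    vectorizer = TfidfVectorizer()
    # Fit IDF statistics on the background corpus plus the reference answer.
    vectorizer.fit(background_corpus + [reference_answer])
    scores = vectorizer.transform([reference_answer]).toarray()[0]
    vocab = vectorizer.get_feature_names_out()
    # Keep the top_k highest-weighted terms as evaluation keywords.
    ranked = sorted(zip(vocab, scores), key=lambda t: t[1], reverse=True)
    return [word for word, score in ranked[:top_k] if score > 0]
```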
Step S110: detect whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result.
Specifically, since the keyword determined above as the evaluation criterion reflects the core points of the answer, keyword detection can be performed on the speech to be evaluated in this step; that is, whether a speech segment corresponding to the keyword is present in the speech to be evaluated is detected, to obtain a detection result.
The detection result reflects whether the speech to be evaluated contains the speech segments corresponding to the keywords. When there is one keyword, the detection result is whether the speech to be evaluated contains the speech segment corresponding to that keyword. When there are at least two keywords, the detection result covers, for each keyword, whether the speech to be evaluated contains its corresponding speech segment.
Step S120: determine the evaluation result of the speech to be evaluated according to the detection result.
Specifically, as described above, the keywords reflect the core points of the answer to the question, so they can represent the answer to a certain extent. In this step, the evaluation result of the speech to be evaluated is determined according to whether the speech to be evaluated contains the speech segments corresponding to the keywords.
It is understood that the more keyword-corresponding speech segments are present in the speech to be evaluated, the better the evaluation result of the speech to be evaluated.
By obtaining the keyword serving as the evaluation criterion, the speech evaluation method provided by the embodiments of this application can automatically detect whether the speech segment corresponding to the keyword is present in the speech to be evaluated, and determine the evaluation result of the speech to be evaluated according to the detection result. Since no manual evaluation is needed, it both avoids interference of human subjectivity with the evaluation result and reduces labor cost.
In one embodiment of this application, an optional implementation of the above step S110, detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated, may include:
S1: recognize the speech to be evaluated, to obtain recognized text information.
Specifically, speech recognition may be performed on the speech to be evaluated to obtain the recognized text information.
S2: match the keyword with the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains the speech segment corresponding to the keyword.
In this step, the a-priori keyword serving as the evaluation criterion is matched with the text information to obtain a matching result, which can serve as the aforementioned detection result.
It is understood that the matching result may include whether a word matching each keyword is present in the text information. Optionally, the matching result may further include the confidence of each word present in the text information that matches a keyword; the confidence of such a word may be its recognition confidence from the speech recognition process.
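For illustration, a minimal sketch of this matching step, assuming the recognizer supplies word-level confidences; the data layout and function are hypothetical:

```python
# Sketch: match evaluation keywords against the recognized transcript.
# Each transcript entry is (word, recognition_confidence) from the ASR step.
def match_keywords(keywords, transcript):
    hits = {}
    for kw in keywords:
        # A keyword is "hit" if any recognized word equals it; keep the best confidence.
        confs = [c for w, c in transcript if w == kw]
        hits[kw] = max(confs) if confs else None
    return hits

result = match_keywords(
    ["gravity", "orbit"],
    [("the", 0.99), ("gravity", 0.93), ("pulls", 0.88)],
)
# {'gravity': 0.93, 'orbit': None} -> 'gravity' is hit, 'orbit' is missed
```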
The above S1, recognizing the speech to be evaluated to obtain the recognized text information, may include:
S11: extract an acoustic feature of the speech to be evaluated.
The acoustic feature is used for speech recognition and is generally a spectral feature of the speech data, such as the Mel Frequency Cepstral Coefficient (MFCC) feature or the Perceptual Linear Predictive (PLP) feature.
During extraction, the speech to be evaluated may first be divided into frames, the framed speech may then be pre-emphasized, and finally the spectral feature of each frame is extracted in turn.
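A sketch of this extraction pipeline, assuming the librosa library and illustrative parameter values (the patent does not prescribe any specific toolkit):

```python
# Sketch: pre-emphasis, framing and MFCC extraction for the speech to be evaluated.
import librosa

def extract_acoustic_features(wav_path: str):
    y, sr = librosa.load(wav_path, sr=16000)        # load as 16 kHz mono
    y = librosa.effects.preemphasis(y, coef=0.97)   # pre-emphasis
    # Framing is handled internally: 25 ms windows with a 10 ms hop.
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=13,
        n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
    )
    return mfcc.T  # one 13-dimensional feature vector per frame
```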
S12: input the acoustic feature into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated output by the first acoustic recognition model.
The first acoustic recognition model may be a neural-network acoustic recognition model trained on a training corpus.
This embodiment provides several alternative constructions of the first acoustic recognition model, introduced in turn below.
First, the first acoustic recognition model may be a general acoustic recognition model, i.e., a general acoustic recognition model trained on an existing training corpus.
It should be noted that although a general acoustic recognition model can perform acoustic recognition, its training corpus may not cover all speaking-test scenarios, and speaking-test scenarios differ greatly, with large pronunciation differences across regions, so the recognition accuracy of the general acoustic recognition model in a speaking-test scenario can degrade.
On this basis, this embodiment performs a first-pass recognition of the speech to be evaluated with the general acoustic recognition model to obtain a first-pass recognition result. The first-pass recognition result and the speech to be evaluated may then be used as training data to adapt the general acoustic recognition model, and the adapted acoustic recognition model serves as the first acoustic recognition model.
Optionally, when adapting the general acoustic recognition model, the first-pass recognition results whose recognition confidence exceeds a set threshold may be selected, together with the corresponding speech to be evaluated, as the training data.
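A minimal sketch of this data-selection step; the structures and the 0.9 threshold are illustrative assumptions:

```python
# Sketch: keep only high-confidence first-pass recognition results as adaptation data.
def select_adaptation_data(first_pass_results, threshold=0.9):
    """first_pass_results: list of (audio_segment, hypothesis_text, confidence)."""
    return [
        (audio, text)
        for audio, text, conf in first_pass_results
        if conf >= threshold  # discard low-confidence hypotheses
    ]
# The retained (audio, text) pairs can then be used to adapt the general acoustic model.
```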
Further, this embodiment describes yet another construction of the first acoustic recognition model.
Since the goal of this application is to detect, based on a-priori keywords, whether speech segments corresponding to the keywords are present in the speech to be evaluated, this embodiment designs an acoustic recognition model with a new decoding space to further improve keyword detection accuracy. Unlike the existing decoding space formed by all the words in a dictionary, the new decoding space of this embodiment is formed by the keywords and a filler, the filler absorbing all non-keyword words besides the keywords.
For example, if the keywords include A, B and C, and N represents the filler, the new decoding space consists of A, B, C and N.
The first acoustic recognition model with this new decoding space converts the speech recognition process into an active keyword detection process based on the a-priori keywords. With this first acoustic recognition model, keyword recognition accuracy is higher because the model is not affected by the uneven distribution of keywords and non-keywords in the training data.
The acoustic recognition model corresponding to the new decoding space serves as the first acoustic recognition model. When speech recognition is performed based on this model, the recognition result contains only keywords and the filler; the influence of all non-keyword words is filtered out by the filler. Further, during recognition the model judges whether the recognition confidence of a keyword exceeds a set confidence threshold; if so, the corresponding keyword is recognized, otherwise the filler is recognized.
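As a sketch of the per-token decision rule just described (a real decoder scores whole hypothesis paths; this only illustrates the threshold test, and the 0.6 value is an assumption):

```python
# Sketch: decoding over a space of keywords plus one filler token.
KEYWORDS = {"A", "B", "C"}
FILLER = "N"

def decode_token(hypothesis: str, confidence: float, threshold: float = 0.6) -> str:
    # Emit the keyword only when its recognition confidence clears the threshold;
    # everything else is absorbed by the filler.
    if hypothesis in KEYWORDS and confidence >= threshold:
        return hypothesis
    return FILLER
```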
Further, since the acoustic pronunciations of different words or phrases are not necessarily alike, the above keyword confidence threshold does not suit all questions, so this application proposes an adaptive method for the keyword confidence threshold. Using manual scores of the speech to be evaluated, an alarm set and a recall set can be built: the alarm set contains speech to be evaluated with low scores, and the recall set contains speech to be evaluated with high scores. The confidence threshold of the keyword serving as the evaluation criterion is then adjusted based on the alarm set and the recall set.
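A minimal sketch of one way such threshold adaptation could work; the objective function and candidate grid are illustrative choices, not taken from the patent:

```python
# Sketch: adapt a keyword's confidence threshold from an "alarm" set
# (low-scored answers) and a "recall" set (high-scored answers).
def tune_threshold(alarm_confs, recall_confs, candidates=None):
    """alarm_confs / recall_confs: detection confidences of this keyword
    in low-scored and high-scored answers respectively."""
    candidates = candidates or [i / 100 for i in range(30, 96, 5)]
    def objective(t):
        false_alarms = sum(c >= t for c in alarm_confs)   # detections to suppress
        recalled = sum(c >= t for c in recall_confs)      # detections to keep
        return recalled - false_alarms
    return max(candidates, key=objective)
```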
Another embodiment of this application describes a further implementation of the above step S110, detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated. On the basis of the aforementioned S1-S2, a process of classifying the speech to be evaluated with a keyword classifier may be added.
Two forms of keyword classifier are described in this embodiment and introduced in turn.
First keyword classifier:
Following the aforementioned S11-S12, the acoustic feature of the speech to be evaluated is input into the first acoustic recognition model, which outputs the text information corresponding to the speech to be evaluated. In this embodiment, the hidden-layer output of the first acoustic recognition model, i.e., the hidden-layer averaged acoustic feature converted from the acoustic feature, may be obtained.
This hidden-layer averaged acoustic feature is a highly abstract representation of the input acoustic feature: it is the result of averaging the hidden-layer acoustic features of all frames in the speech to be evaluated.
Further, the hidden-layer averaged acoustic feature and the word-vector feature of the keyword are input into the preset first keyword classifier, which outputs the classification result of whether the speech to be evaluated contains speech segments corresponding to the keyword or to non-keywords.
Specifically, the classification result obtained in this embodiment can serve as the detection result in step S110.
The first keyword classifier is trained with, as training samples, the hidden-layer averaged acoustic features obtained by converting the acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and with, as sample labels, the classification annotations of whether the speech training data contains speech segments corresponding to the keyword or to non-keywords.
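For illustration, a sketch of such a classifier in PyTorch; the framework, network shape, and all dimensions are assumptions:

```python
# Sketch of the first keyword classifier: the utterance-level averaged hidden-layer
# acoustic feature is concatenated with a keyword's word-vector and classified.
import torch
import torch.nn as nn

class KeywordClassifier(nn.Module):
    def __init__(self, acoustic_dim=256, wordvec_dim=100, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + wordvec_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # 2 classes: keyword segment present / absent
        )

    def forward(self, avg_hidden_feat, keyword_vec):
        x = torch.cat([avg_hidden_feat, keyword_vec], dim=-1)
        return self.net(x)

# avg_hidden_feat: mean of the acoustic model's hidden-layer outputs over all frames.
# One forward pass is made per (utterance, keyword) pair.
```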
Second keyword classifier:
In this embodiment, the speech to be evaluated may be windowed to obtain at least one windowed speech segment to be evaluated. The window length may be a first set number of frames, such as 40 frames, and the window step may be a second set number of frames, such as 5 frames. For each windowed speech segment, the corresponding windowed acoustic feature can be extracted. Further, from the hidden-layer output of the first acoustic recognition model, the hidden-layer averaged windowed acoustic feature converted from the windowed acoustic feature is obtained for each windowed speech segment.
Further, each hidden-layer averaged windowed acoustic feature is input into the preset second keyword classifier, which outputs the classification result of whether each windowed speech segment corresponds to the keyword or to a non-keyword.
Specifically, the classification result obtained in this embodiment can serve as the detection result in step S110.
The second keyword classifier is trained with the keyword hidden-layer averaged acoustic features and the non-keyword hidden-layer averaged features obtained by converting, through the hidden layer of the first acoustic recognition model, the acoustic features of the speech segments corresponding to keywords and to non-keywords in speech training data.
Specifically, when training the second keyword classifier, the first acoustic recognition model may be used to recognize the speech training data, and the keyword-corresponding and non-keyword-corresponding speech segments in the speech training data are determined according to the recognition result. The second keyword classifier is then trained using the keyword hidden-layer averaged features of the keyword-corresponding speech segments, converted through the hidden layer of the first acoustic recognition model, and the non-keyword hidden-layer averaged features of the non-keyword-corresponding speech segments, converted in the same way.
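A minimal sketch of the windowed feature preparation, using the 40-frame window and 5-frame step from the embodiment; numpy and the function layout are assumptions:

```python
# Sketch: window the frame-level hidden-layer features and average each window.
import numpy as np

def windowed_hidden_averages(hidden_frames: np.ndarray,
                             win: int = 40, step: int = 5) -> np.ndarray:
    """hidden_frames: (num_frames, hidden_dim) hidden-layer outputs of the acoustic model."""
    windows = [
        hidden_frames[s:s + win].mean(axis=0)
        for s in range(0, max(1, len(hidden_frames) - win + 1), step)
    ]
    return np.stack(windows)  # one averaged feature per window, fed to classifier 2
```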
Fig. 2 shows a schematic diagram of extracting keyword and non-keyword hidden-layer averaged features from a speech sample to be evaluated.
Of the two keyword classifiers exemplified above, either one may be used, or both may be used together; the classification results obtained by the keyword classifiers serve as the detection result in step S110.
In this embodiment, the hidden-layer features of the first acoustic recognition model are further used as input features of the keyword classifier, and the keyword classifier outputs the classification result of whether the speech to be evaluated contains speech segments corresponding to the keyword or to non-keywords. This classification result can be combined with the matching result between the keyword and the text information of the speech to be evaluated from the aforementioned S1-S2, the two together serving as the detection result for determining whether a speech segment corresponding to the keyword is present in the speech to be evaluated.
Another embodiment of this application introduces the process of the above step S120, determining the evaluation result of the speech to be evaluated according to the detection result.
Based on the detection result determined in the above embodiments, the evaluation result of the speech to be evaluated can be determined. This process may include two stages, as follows:
First stage: determining an evaluation feature according to the detection result.
Several types of evaluation features are described in this embodiment and introduced in turn:
1) Hit keywords:
A hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated. According to the detection result determined above, it can be determined which keyword-corresponding speech segments the speech to be evaluated contains.
When hit keywords serve as an evaluation feature, they can be represented as a one-hot style vector, i.e., an N-dimensional vector, where N is the number of keywords and each element position corresponds to a unique keyword. Each element takes one of two values: the first value indicates that the keyword is a hit keyword, and the second value indicates that the keyword is a missed keyword, where the first value may be 1 and the second value may be 0.
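A one-line sketch of this encoding:

```python
# Sketch: encode hit keywords as the N-dimensional 0/1 vector described above.
def hit_vector(keywords, hit_set):
    # 1 marks a hit keyword, 0 a missed one; positions follow the keyword list.
    return [1 if kw in hit_set else 0 for kw in keywords]

hit_vector(["A", "B", "C"], {"A", "C"})  # -> [1, 0, 1]
```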
2) Confidence of a hit keyword:
The confidence of a hit keyword may be the recognition confidence of the aforementioned first acoustic recognition model for the hit keyword, or the classification confidence of an aforementioned keyword classifier, such as the first keyword classifier or the second keyword classifier, for the hit keyword.
3) Keyword hit rate:
The keyword hit rate is the ratio of the number of hit keywords to the total number of keywords.
4) Gaussian duration of a hit keyword:
The Gaussian duration of a hit keyword is determined from the pronunciation duration of the hit keyword in the speech to be evaluated, and can serve as a measure of the examinee's pronunciation characteristics for the hit keyword.
Specifically, according to the detection result, it can be determined which keyword-corresponding speech segments the speech to be evaluated contains, and where each keyword-corresponding speech segment lies. The Gaussian duration of a keyword can then be determined from the pronunciation duration of its corresponding speech segment in the speech to be evaluated.
The Gaussian duration assumes that the pronunciation duration of each syllable follows a normal distribution, and the Gaussian duration of a hit keyword can describe the examinee's pronunciation characteristics for that keyword. It is extracted as follows: first, a distribution table of the pronunciation duration mean and variance of each hit keyword or keyword component unit (such as a syllable or phoneme) is constructed. Taking syllables as the keyword component units for illustration: based on the constructed table of syllable pronunciation duration means and variances, the Gaussian duration score of each syllable in a hit keyword can be computed, and the average of the syllable Gaussian duration scores is taken as the Gaussian duration score of the hit keyword. The calculation takes the following form, with the per-syllable score being the Gaussian score of the observed duration under N(μ_k, σ_k²):

w_gauss = (1/K) * Σ_{k=1..K} ph_gauss(k),  ph_gauss(k) = exp(-(x_k - μ_k)^2 / (2σ_k^2))

where w_gauss is the Gaussian duration of the hit keyword, K is the number of syllables of the hit keyword, ph_gauss(k) is the Gaussian duration of the k-th syllable, μ_k and σ_k are the pronunciation duration mean and variance of the k-th syllable, and x_k is the pronunciation duration of the k-th syllable of the hit keyword in the speech to be evaluated.

The above table of syllable pronunciation duration means and variances may be a common table built from a large amount of speaking-test data, or an adaptive table built from the current speaking-test data.
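A minimal sketch of this computation, with the per-syllable score form reconstructed as above:

```python
# Sketch: Gaussian duration score of a hit keyword, following the formula above.
import math

def gauss_duration(syllable_durations, duration_stats):
    """syllable_durations: observed duration x_k of each syllable of the hit keyword.
    duration_stats: list of (mu_k, sigma_k) from the duration distribution table."""
    scores = [
        math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
        for x, (mu, sigma) in zip(syllable_durations, duration_stats)
    ]
    return sum(scores) / len(scores)  # average over the K syllables
```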
Second stage: determining the evaluation result of the speech to be evaluated according to the evaluation feature.
Specifically, from the evaluation features determined in the first stage, one or more of them can be chosen in combination, and the evaluation result of the speech to be evaluated is determined based on the chosen evaluation features.
In this embodiment, the evaluation result of the speech to be evaluated can be determined based on the evaluation features and a pre-trained scoring regression model.
The scoring regression model may take forms such as linear regression, Gaussian regression, or neural-network regression.
During training, the evaluation features of speech training data may be used as training samples, and the annotated evaluation results of the speech training data as sample labels.
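For illustration, a minimal sketch using linear regression (one of the forms mentioned); scikit-learn and the toy feature layout are assumptions:

```python
# Sketch: train a scoring regression model on evaluation features.
from sklearn.linear_model import LinearRegression

# Each row: [keyword hit rate, mean hit confidence, mean Gaussian duration]
X_train = [[0.8, 0.91, 0.75], [0.2, 0.55, 0.40], [0.6, 0.83, 0.66]]  # toy values
y_train = [4.5, 1.0, 3.5]  # annotated scores of the training speech (toy values)

model = LinearRegression().fit(X_train, y_train)
score = model.predict([[0.7, 0.88, 0.70]])[0]  # evaluation result for new speech
```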
The speech evaluation apparatus provided by the embodiments of this application is described below; the speech evaluation apparatus described below and the speech evaluation method described above may be referred to in correspondence with each other.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a speech evaluation apparatus disclosed in an embodiment of this application. As shown in Fig. 3, the apparatus may include:
a data acquisition unit 11, configured to obtain speech to be evaluated for a target question, and a keyword serving as an evaluation criterion;
a speech detection unit 12, configured to detect whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
an evaluation result determination unit 13, configured to determine an evaluation result of the speech to be evaluated according to the detection result.
Optionally, the speech detection unit may include:
a text recognition unit, configured to recognize the speech to be evaluated, to obtain recognized text information;
a text matching unit, configured to match the keyword with the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains the speech segment corresponding to the keyword.
Optionally, the text recognition unit may include:
an acoustic feature extraction unit, configured to extract an acoustic feature of the speech to be evaluated;
a first acoustic recognition model prediction unit, configured to input the acoustic feature into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated output by the first acoustic recognition model.
Optionally, the first acoustic recognition model may be a general acoustic recognition model, or an acoustic recognition model obtained by adapting the general acoustic recognition model using the recognition result of the general acoustic recognition model on the speech to be evaluated.
Optionally, the first acoustic recognition model may be an acoustic recognition model corresponding to a decoding space formed by the keyword and a filler, the filler representing all non-keyword words.
Optionally, the speech detection unit may further include:
a global hidden-layer feature acquisition unit, configured to obtain the hidden-layer output of the first acoustic recognition model, namely the hidden-layer averaged acoustic feature converted from the acoustic feature;
a first keyword classifier prediction unit, configured to input the hidden-layer averaged acoustic feature and the word-vector feature of the keyword into a preset first keyword classifier, to obtain a classification result, output by the first keyword classifier, of whether the speech to be evaluated contains speech segments corresponding to the keyword or to non-keywords;
the first keyword classifier being trained with, as training samples, hidden-layer averaged acoustic features obtained by converting acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and with, as sample labels, classification annotations of whether the speech training data contains speech segments corresponding to the keyword or to non-keywords.
Optionally, the speech detection unit may further include:
a speech windowing unit, configured to window the speech to be evaluated, to obtain at least one windowed speech segment to be evaluated;
a windowed hidden-layer feature acquisition unit, configured to obtain, from the hidden layer of the first acoustic recognition model, the hidden-layer averaged windowed acoustic feature converted from the windowed acoustic feature of each windowed speech segment;
a second keyword classifier prediction unit, configured to input each hidden-layer averaged windowed acoustic feature into a preset second keyword classifier, to obtain a classification result, output by the second keyword classifier, of whether each windowed speech segment corresponds to the keyword or to a non-keyword;
the second keyword classifier being trained with keyword hidden-layer averaged acoustic features and non-keyword hidden-layer averaged features obtained by converting, through the hidden layer of the first acoustic recognition model, the acoustic features of the speech segments corresponding to keywords and to non-keywords in speech training data.
Optionally, this application illustrates two alternative constructions of the evaluation result determination unit, described below in turn:
First, the evaluation result determination unit may include:
a first evaluation feature determination unit, configured to determine an evaluation feature according to the detection result, the evaluation feature including any one or more of hit keywords, the confidence of a hit keyword, the keyword hit rate, and the Gaussian duration of a hit keyword;
wherein a hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated; the confidence of a hit keyword is the recognition confidence of the first acoustic recognition model for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a first evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation feature.
Second, the evaluation result determination unit may include:
a second evaluation feature determination unit, configured to determine an evaluation feature according to the detection result, the evaluation feature including any one or more of hit keywords, the confidence of a hit keyword, the keyword hit rate, and the Gaussian duration of a hit keyword;
wherein a hit keyword is a keyword whose corresponding speech segment is present in the speech to be evaluated; the confidence of a hit keyword is the classification confidence of the first keyword classifier for the hit keyword, or the classification confidence of the second keyword classifier for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a second evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation feature.
The speech evaluation apparatus provided by the embodiments of this application can be applied to speech evaluation devices such as PC terminals, cloud platforms, servers, and server clusters. Optionally, Fig. 4 shows a hardware block diagram of a speech evaluation device. Referring to Fig. 4, the hardware structure of the speech evaluation device may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4.
In the embodiments of this application, there is at least one each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4, and the processor 1, the communication interface 2, and the memory 3 communicate with one another via the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
the memory 3 may include a high-speed RAM memory, and may further include a non-volatile memory, for example at least one magnetic disk memory;
the memory stores a program that the processor can call, the program being used for:
obtaining speech to be evaluated for a target question, and a keyword serving as an evaluation criterion;
detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
determining an evaluation result of the speech to be evaluated according to the detection result.
Optionally, for refinements and extensions of the program's functions, refer to the description above.
An embodiment of this application further provides a readable storage medium storing a program suitable for execution by a processor, the program being used for:
obtaining speech to be evaluated for a target question, and a keyword serving as an evaluation criterion;
detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
determining an evaluation result of the speech to be evaluated according to the detection result.
Optionally, for refinements and extensions of the program's functions, refer to the description above.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.
The embodiments in this specification are described in a progressive manner; each embodiment highlights its differences from the others, and the same or similar parts among the embodiments may be referred to mutually.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (16)
1. A speech evaluation method, characterized by comprising:
obtaining speech to be evaluated, and a keyword serving as an evaluation criterion;
detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated, to obtain a detection result;
determining an evaluation result of the speech to be evaluated according to the detection result.
2. The method according to claim 1, characterized in that detecting whether a speech segment corresponding to the keyword is present in the speech to be evaluated comprises:
recognizing the speech to be evaluated, to obtain recognized text information;
matching the keyword with the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains the speech segment corresponding to the keyword.
3. The method according to claim 2, characterized in that recognizing the speech to be evaluated to obtain the recognized text information comprises:
extracting an acoustic feature of the speech to be evaluated;
inputting the acoustic feature into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated output by the first acoustic recognition model.
4. The method according to claim 3, characterized in that the first acoustic recognition model is a general acoustic recognition model, or an acoustic recognition model obtained by adapting the general acoustic recognition model using the recognition result of the general acoustic recognition model on the speech to be evaluated.
5. The method according to claim 3, characterized in that the first acoustic recognition model is an acoustic recognition model corresponding to a decoding space formed by the keyword and a filler, the filler representing all non-keyword words.
6. The method according to claim 3, wherein the detecting whether a speech segment corresponding to the keyword exists in the speech to be evaluated further comprises:
obtaining the hidden-layer averaged acoustic feature that a hidden layer of the first acoustic recognition model outputs after transforming the acoustic features;
inputting the hidden-layer averaged acoustic feature and a word-vector feature of the keyword into a preset first keyword classifier, to obtain a classification result, output by the first keyword classifier, of the speech segments of the speech to be evaluated corresponding to the keyword and to non-keyword words;
wherein the first keyword classifier is trained by taking, as training samples, the hidden-layer averaged acoustic features obtained by transforming the acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and by taking, as sample labels, the classification annotations of the speech segments of the speech training data corresponding to the keyword and to non-keyword words.
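A sketch of the classifier input and training of claim 6: the hidden-layer output of the acoustic model is averaged over frames and concatenated with the keyword's word vector, and a classifier trained on annotated keyword / non-keyword segments then scores each candidate segment. The dimensions, the random stand-in data, and the logistic-regression model are assumptions — the claim does not fix the classifier type:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

HID, EMB = 256, 100  # illustrative hidden-layer and word-vector sizes

def classifier_input(hidden_frames: np.ndarray, keyword_vec: np.ndarray) -> np.ndarray:
    pooled = hidden_frames.mean(axis=0)           # average over frames -> (HID,)
    return np.concatenate([pooled, keyword_vec])  # -> (HID + EMB,)

rng = np.random.default_rng(0)
X = np.stack([classifier_input(rng.normal(size=(50, HID)), rng.normal(size=EMB))
              for _ in range(200)])
y = rng.integers(0, 2, size=200)  # stand-in keyword / non-keyword annotations
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:1]))   # per-segment classification confidence
```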
7. The method according to any one of claims 1 to 5, wherein the determining an evaluation result of the speech to be evaluated according to the detection result comprises:
determining evaluation features according to the detection result, the evaluation features comprising any one or more of: hit keywords, the confidence of the hit keywords, a keyword hit rate, and a Gaussian duration of the hit keywords;
wherein a hit keyword is a keyword for which a corresponding speech segment exists in the speech to be evaluated; the confidence of a hit keyword is the recognition confidence of the first acoustic recognition model for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
determining the evaluation result of the speech to be evaluated according to the evaluation features.
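A sketch of the evaluation features of claim 7. The confidence threshold for counting a hit, and the reading of "Gaussian duration" as the likelihood of the observed pronunciation duration under a Gaussian duration model, are assumptions, not definitions taken from the claim:

```python
import math
from typing import Dict, List

def evaluation_features(conf: Dict[str, float], durations: Dict[str, float],
                        keywords: List[str], mu: float = 0.4,
                        sigma: float = 0.15, threshold: float = 0.5) -> dict:
    # conf: per-keyword detection confidence; durations: pronunciation
    # duration in seconds of each hit. mu/sigma/threshold are illustrative.
    hits = [kw for kw in keywords if conf.get(kw, 0.0) >= threshold]
    hit_rate = len(hits) / len(keywords)          # hit count over keyword count
    avg_conf = sum(conf[kw] for kw in hits) / len(hits) if hits else 0.0
    gauss = {kw: math.exp(-((durations[kw] - mu) ** 2) / (2 * sigma ** 2))
             for kw in hits if kw in durations}   # duration plausibility per hit
    return {"hits": hits, "hit_rate": hit_rate,
            "avg_confidence": avg_conf, "gauss_duration": gauss}

print(evaluation_features({"apple": 0.9, "pear": 0.3}, {"apple": 0.45},
                          ["apple", "pear"]))
```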
8. The method according to claim 6, wherein the determining an evaluation result of the speech to be evaluated according to the detection result comprises:
determining evaluation features according to the detection result, the evaluation features comprising any one or more of: hit keywords, the confidence of the hit keywords, a keyword hit rate, and a Gaussian duration of the hit keywords;
wherein a hit keyword is a keyword for which a corresponding speech segment exists in the speech to be evaluated; the confidence of a hit keyword is the classification confidence of the first keyword classifier for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
determining the evaluation result of the speech to be evaluated according to the evaluation features.
9. A speech evaluation apparatus, comprising:
a data obtaining unit, configured to obtain speech to be evaluated and a keyword serving as an evaluation criterion;
a speech detection unit, configured to detect whether a speech segment corresponding to the keyword exists in the speech to be evaluated, to obtain a detection result;
an evaluation result determining unit, configured to determine an evaluation result of the speech to be evaluated according to the detection result.
10. The apparatus according to claim 9, wherein the speech detection unit comprises:
a text recognition unit, configured to recognize the speech to be evaluated to obtain recognized text information;
a text matching unit, configured to match the keyword against the text information to obtain a matching result, the matching result indicating whether the speech to be evaluated contains a speech segment corresponding to the keyword.
11. The apparatus according to claim 10, wherein the text recognition unit comprises:
an acoustic feature extraction unit, configured to extract acoustic features of the speech to be evaluated;
a first acoustic recognition model prediction unit, configured to input the acoustic features into a preset first acoustic recognition model, to obtain the text information corresponding to the speech to be evaluated that is output by the first acoustic recognition model.
12. The apparatus according to claim 11, wherein the speech detection unit further comprises:
a global hidden-layer feature obtaining unit, configured to obtain the hidden-layer averaged acoustic feature that a hidden layer of the first acoustic recognition model outputs after transforming the acoustic features;
a first keyword classifier prediction unit, configured to input the hidden-layer averaged acoustic feature and a word-vector feature of the keyword into a preset first keyword classifier, to obtain a classification result, output by the first keyword classifier, of the speech segments of the speech to be evaluated corresponding to the keyword and to non-keyword words;
wherein the first keyword classifier is trained by taking, as training samples, the hidden-layer averaged acoustic features obtained by transforming the acoustic features of speech training data through the hidden layer of the first acoustic recognition model, together with the word-vector feature of the keyword, and by taking, as sample labels, the classification annotations of the speech segments of the speech training data corresponding to the keyword and to non-keyword words.
13. The apparatus according to any one of claims 9 to 11, wherein the evaluation result determining unit comprises:
a first evaluation feature determining unit, configured to determine evaluation features according to the detection result, the evaluation features comprising any one or more of: hit keywords, the confidence of the hit keywords, a keyword hit rate, and a Gaussian duration of the hit keywords;
wherein a hit keyword is a keyword for which a corresponding speech segment exists in the speech to be evaluated; the confidence of a hit keyword is the recognition confidence of the first acoustic recognition model for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a first evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation features.
14. The apparatus according to claim 12, wherein the evaluation result determining unit comprises:
a second evaluation feature determining unit, configured to determine evaluation features according to the detection result, the evaluation features comprising any one or more of: hit keywords, the confidence of the hit keywords, a keyword hit rate, and a Gaussian duration of the hit keywords;
wherein a hit keyword is a keyword for which a corresponding speech segment exists in the speech to be evaluated; the confidence of a hit keyword is the classification confidence of the first keyword classifier for the hit keyword; the keyword hit rate is the ratio of the number of hit keywords to the total number of keywords; and the Gaussian duration is determined from the pronunciation duration of the hit keyword in the speech to be evaluated;
a second evaluation feature processing unit, configured to determine the evaluation result of the speech to be evaluated according to the evaluation features.
15. A speech evaluation device, comprising a memory and a processor;
the memory being configured to store a program;
the processor being configured to execute the program to implement the steps of the speech evaluation method according to any one of claims 1 to 8.
16. A readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the steps of the speech evaluation method according to any one of claims 1 to 8 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811073869.3A CN109192224B (en) | 2018-09-14 | 2018-09-14 | Voice evaluation method, device and equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109192224A true CN109192224A (en) | 2019-01-11 |
CN109192224B CN109192224B (en) | 2021-08-17 |
Family
ID=64910988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811073869.3A Active CN109192224B (en) | 2018-09-14 | 2018-09-14 | Voice evaluation method, device and equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109192224B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104143328A (en) * | 2013-08-15 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Method and device for detecting keywords |
CN103559881A (en) * | 2013-11-08 | 2014-02-05 | 安徽科大讯飞信息科技股份有限公司 | Language-irrelevant key word recognition method and system |
CN104810017A (en) * | 2015-04-08 | 2015-07-29 | 广东外语外贸大学 | Semantic analysis-based oral language evaluating method and system |
CN106856092A (en) * | 2015-12-09 | 2017-06-16 | 中国科学院声学研究所 | Chinese speech keyword retrieval method based on feedforward neural network language model |
CN105741831A (en) * | 2016-01-27 | 2016-07-06 | 广东外语外贸大学 | Spoken language evaluation method based on grammatical analysis and spoken language evaluation system |
JP2018081294A (en) * | 2016-11-10 | 2018-05-24 | 日本電信電話株式会社 | Acoustic model learning device, speech recognition device, acoustic model learning method, speech recognition method, and program |
CN107230475A (en) * | 2017-05-27 | 2017-10-03 | 腾讯科技(深圳)有限公司 | A kind of voice keyword recognition method, device, terminal and server |
CN108052504A (en) * | 2017-12-26 | 2018-05-18 | 科大讯飞股份有限公司 | Mathematics subjective item answers the structure analysis method and system of result |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109215632A (en) * | 2018-09-30 | 2019-01-15 | 科大讯飞股份有限公司 | A kind of speech evaluating method, device, equipment and readable storage medium storing program for executing |
CN109887498A (en) * | 2019-03-11 | 2019-06-14 | 西安电子科技大学 | Scoring method of polite expressions at highway entrances |
CN109979482A (en) * | 2019-05-21 | 2019-07-05 | 科大讯飞股份有限公司 | A kind of evaluating method and device for audio |
CN109979482B (en) * | 2019-05-21 | 2021-12-07 | 科大讯飞股份有限公司 | Audio evaluation method and device |
CN110322871A (en) * | 2019-05-30 | 2019-10-11 | 清华大学 | A kind of sample keyword retrieval method based on acoustics characterization vector |
CN113793589A (en) * | 2020-05-26 | 2021-12-14 | 华为技术有限公司 | Speech synthesis method and device |
CN111833853B (en) * | 2020-07-01 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Voice processing method and device, electronic equipment and computer readable storage medium |
CN111833853A (en) * | 2020-07-01 | 2020-10-27 | 腾讯科技(深圳)有限公司 | Voice processing method and device, electronic equipment and computer readable storage medium |
CN112289308A (en) * | 2020-10-23 | 2021-01-29 | 上海凯石信息技术有限公司 | Voice dictation scoring method and device and electronic equipment |
CN113658586A (en) * | 2021-08-13 | 2021-11-16 | 北京百度网讯科技有限公司 | Training method of voice recognition model, voice interaction method and device |
CN113658586B (en) * | 2021-08-13 | 2024-04-09 | 北京百度网讯科技有限公司 | Training method of voice recognition model, voice interaction method and device |
CN114155831A (en) * | 2021-12-06 | 2022-03-08 | 科大讯飞股份有限公司 | Voice evaluation method, related equipment and readable storage medium |
CN114974214A (en) * | 2022-04-28 | 2022-08-30 | 北京明略昭辉科技有限公司 | Voice keyword detection method and device, electronic equipment and storage medium |
CN115171664A (en) * | 2022-07-05 | 2022-10-11 | 小米汽车科技有限公司 | Voice awakening method and device, intelligent voice equipment, vehicle and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109192224A | Speech evaluation method, apparatus, device, and readable storage medium | |
JP6902010B2 | Speech evaluation method, device, equipment, and readable storage medium | |
CN107274916B | Method and device for operating audio/video files based on voiceprint information | |
CN105938716B | Automatic detection method for copied speech samples based on multi-precision fitting | |
CN104900235B | Voiceprint recognition method based on pitch-period composite feature parameters | |
CN110085261A | Pronunciation correction method, apparatus, equipment, and computer-readable storage medium | |
CN108305616A | Audio scene recognition method and device based on long- and short-term feature extraction | |
CN105374352B | Voice activation method and system | |
CN108520753B | Speech lie detection method based on a convolutional bidirectional long short-term memory network | |
CN105957531B | Method and device for extracting speech content based on a cloud platform | |
CN111640456B | Method, device, and equipment for detecting overlapped speech | |
CN106683666B | Domain adaptation method based on deep neural networks | |
CN102915729B | Speech keyword spotting system, and method of creating a dictionary for the speech keyword spotting system | |
CN108711421A | Method and device for building an acoustic model for speech recognition, and electronic equipment | |
CN108899033B | Method and device for determining speaker characteristics | |
CN108986798B | Method, device, and equipment for processing voice data | |
CN106941008A | Blind detection method for heterologous audio splicing tampering based on silent segments | |
CN103971700A | Voice monitoring method and device | |
US9799325B1 | Methods and systems for identifying keywords in speech signal | |
CN109376264A | Audio detection method, device, equipment, and computer-readable storage medium | |
CN110459242A | Voice change detection method, terminal, and computer-readable storage medium | |
CN112017690B | Audio processing method, device, equipment, and medium | |
CN110223678A | Speech recognition method and system | |
CN111951825A | Pronunciation evaluation method, medium, device, and computing equipment | |
CN104700831B | Method and apparatus for analyzing phonetic features of audio files | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |