Summary of the invention
In view of the above shortcomings of the prior art, it is an object of the invention to propose a courtesy-phrase scoring system for highway toll booths, so as to realize intelligent monitoring of the toll collectors' speech and to facilitate the administrators' supervision and assessment of the toll collectors' work.
To achieve the above object, the present invention comprises the following steps:
(1) Select m courtesy phrases used by highway toll collectors as keywords and select n people as speakers; each speaker utters each keyword completely and clearly x times, yielding a total of m × n × x .wav files as corpus files;
(2) Construct a parallel network model of keyword models and a Filler model:
2a) For the corpus files of each keyword, successively perform pre-emphasis, framing, and Hamming-window preprocessing to obtain frame-by-frame speech data; extract 24-dimensional Mel-frequency cepstral coefficients (MFCC) from the speech data as characteristic parameters; train these characteristic parameters with the Baum-Welch algorithm to obtain the hidden Markov model (HMM) parameter model of the keyword;
2b) Take the non-courtesy speech syllables that can be expected at a highway toll booth as non-keywords and establish a non-keyword HMM model with the same method as in 2a); establish a single-state HMM model for silence with the same method as in 2a); the non-keyword model and the silence model together form the Filler model;
2c) Arrange the keyword models and the Filler model in parallel to form a network model without linguistic constraints;
(3) Choose k people as test speakers; each speaker utters once each of m speech segments containing 1 to m keywords, yielding a total of k × m .wav files as speech test files;
(4) Apply to the speech test files the same preprocessing and MFCC feature extraction as in 2a) to obtain the test speech characteristic parameters; after adjusting the weights of the network edges in the parallel network model of keyword models and Filler model obtained in (2), use the Viterbi algorithm to compute the matching score of the test speech characteristic parameters against each model in the network model, and retain the s models with the higher matching scores as the initial keyword retrieval result;
(5) Use the Viterbi algorithm to compute the matching scores between the s higher-scoring models from the network model obtained in (4) and the keyword models obtained in 2a), normalize the s matching scores by time length, and take the results as the s confidence values corresponding to the s models; set a threshold and compare the confidence of each model with the threshold in turn, s times in total; if the confidence is lower than the threshold the model is discarded, and if the confidence is higher than the threshold the model is retained; the retained models serve as the final keyword retrieval result;
(6) Pass a random speech test file of a person from the files obtained in (3) through (4) and (5); if all m keywords to be retrieved are contained, the score is 100 points; if y keywords are missing, the score is 100 - y × 100/m points; the work score of the assessed person is thereby obtained.
The present invention has the following advantages:
1) The application scenario of the invention is the toll station at a highway entrance; a courtesy-phrase scoring system is built to realize intelligent monitoring of the toll collectors' speech and to facilitate the administrators' supervision and evaluation of the toll collectors;
2) The present invention uses an HMM-based speech keyword retrieval method, which has good robustness;
3) The present invention applies confidence-based keyword verification to the initial keyword retrieval results, and therefore achieves higher retrieval accuracy;
4) The present invention scores the assessed personnel strictly according to the scoring rules, does not miss a courtesy phrase, and has a low miss rate.
Specific embodiment
With reference to the accompanying drawings and the detailed description, the present invention is further elucidated.
Referring to Fig. 1, the specific steps of the present embodiment are as follows:
Step 1. Acquire corpus files.
Select m courtesy phrases used by highway toll collectors as keywords and choose n (n ≥ 20) people as speakers; each speaker utters each keyword completely and clearly x times, yielding a total of m × n × x .wav files as corpus files.
Step 2. Construct the parallel network model of keyword models and the Filler model.
2a) For the corpus files of each keyword, successively carry out pre-emphasis, framing, and Hamming-window preprocessing:
2a1) Apply pre-emphasis to the original signal x(n) with a first-order high-pass digital filter, obtaining the pre-emphasized signal:
y(n) = x(n) - 0.98·x(n-1);
2a2) Frame the pre-emphasized signal y(n) and apply a Hamming window to each frame, obtaining the framed, Hamming-windowed signal.
2b) After the preprocessing in 2a), frame-by-frame speech data are obtained; extract 24-dimensional Mel-frequency cepstral coefficients (MFCC) from the speech data as characteristic parameters (a preprocessing and feature-extraction sketch is given below);
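Purely as an illustrative sketch of 2a) and 2b), the preprocessing and MFCC extraction could look roughly as follows in Python; the 16 kHz sampling rate, the 25 ms/10 ms framing, and the use of the librosa library are assumptions of this sketch and are not specified by the embodiment:

```python
import numpy as np
import librosa  # assumed third-party dependency for MFCC extraction

PRE_EMPHASIS = 0.98   # coefficient from step 2a1)
FRAME_LEN = 400       # assumed 25 ms frames at 16 kHz
FRAME_SHIFT = 160     # assumed 10 ms frame shift
N_MFCC = 24           # 24-dimensional MFCC as in step 2b)

def frame_and_window(y):
    """Framing and Hamming windowing of step 2a2); assumes len(y) >= FRAME_LEN."""
    n_frames = 1 + (len(y) - FRAME_LEN) // FRAME_SHIFT
    window = np.hamming(FRAME_LEN)
    return np.stack([y[i * FRAME_SHIFT: i * FRAME_SHIFT + FRAME_LEN] * window
                     for i in range(n_frames)])

def extract_mfcc(wav_path, sr=16000):
    """Return a (n_frames, 24) MFCC matrix for one corpus .wav file."""
    x, _ = librosa.load(wav_path, sr=sr)
    # Pre-emphasis of step 2a1): y(n) = x(n) - 0.98 * x(n-1)
    y = np.append(x[0], x[1:] - PRE_EMPHASIS * x[:-1])
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC,
                                n_fft=FRAME_LEN, hop_length=FRAME_SHIFT,
                                window="hamming")
    return mfcc.T  # one 24-dimensional feature vector per frame
```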
2c) Train the characteristic parameters obtained in 2b) with the Baum-Welch algorithm:
2c1) Assume the observation sequence is O = {O_t, t = 1, 2, ..., T} and the initial model is λ = (π, A, B); let the state set of the initial model be {S_i, i = 1, 2, ..., N}, let the state at time t be q_t, and let the observation symbols be:
V = {v_k, k = 1, 2, ..., M};
in the initial model λ:
π = {π_i, i = 1, 2, ..., N},
A = {a_ij, i = 1, 2, ..., N, j = 1, 2, ..., N},
B = {b_j(k), j = 1, 2, ..., N, k = 1, 2, ..., M},
where π_i denotes the initial state probability, a_ij denotes the probability of being in state S_i at time t and transitioning to state S_j at time t+1, and b_j(k) denotes the probability of observing symbol v_k in state S_j;
2c2) Under the assumptions of 2c1), introduce two groups of probability variables: ε_t(i, j), the probability of being in state S_i at time t and in state S_j at time t+1, and γ_t(i), the probability of being in state S_i at time t, namely:
ε_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ),
γ_t(i) = P(q_t = S_i | O, λ);
2c3) From the two groups of probability variables introduced in 2c2), compute a group of new parameters:
π'_i = γ_1(i),
a'_ij = [Σ_{t=1}^{T-1} ε_t(i, j)] / [Σ_{t=1}^{T-1} γ_t(i)],
b'_j(k) = [Σ_{t: O_t = v_k} γ_t(j)] / [Σ_{t=1}^{T} γ_t(j)];
2c4) Re-estimating with the group of new parameters π'_i, a'_ij, b'_j(k) obtained in 2c3) yields a new model:
λ' = (π', A', B'),
and the probability P(O | λ') that model λ' generates the observation sequence is larger than the probability P(O | λ) that the initial model λ generates the observation sequence;
2c5) Repeat 2c3) and 2c4) to continuously improve the model parameters until P(O | λ') no longer increases significantly; the model λ' = (π', A', B') at that point is the trained hidden Markov model (HMM) parameter template of the keyword (a training sketch is given below);
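The Baum-Welch training of 2c) could be sketched as follows; here the hmmlearn library's EM training, which implements Baum-Welch re-estimation for HMMs, stands in for the loop of 2c3)-2c5), and the number of states per keyword model is an assumed value:

```python
import numpy as np
from hmmlearn import hmm  # assumed third-party dependency; fit() runs Baum-Welch (EM)

def train_keyword_hmm(mfcc_list, n_states=6, n_iter=50):
    """Train one keyword HMM from the MFCC matrices of its corpus files.

    mfcc_list: list of (n_frames, 24) arrays from extract_mfcc().
    n_states and the use of diagonal-covariance Gaussian emissions are
    assumptions of this sketch, not values given in the embodiment.
    """
    X = np.concatenate(mfcc_list)               # stack all observation sequences
    lengths = [m.shape[0] for m in mfcc_list]   # per-file sequence lengths
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=n_iter)
    model.fit(X, lengths)                       # iterates until the likelihood stops improving
    return model
```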
2d) Take the non-courtesy speech syllables that can be expected at a highway toll booth as non-keywords and establish a non-keyword HMM model with the same method as in 2a)-2c); establish a single-state HMM model for silence with the same method; the non-keyword model and the silence model together form the Filler model;
2e) Arrange the keyword models and the Filler model in parallel to form the network model without linguistic constraints, as shown in Fig. 2 (a construction sketch is given below).
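As a rough illustration of 2d)-2e), the parallel network may be represented as a flat collection of competing models; the dictionary layout and the model names "<filler>" and "<silence>" are assumptions of this sketch:

```python
def build_parallel_network(keyword_mfccs, nonkeyword_mfccs, silence_mfccs):
    """Assemble the unconstrained parallel network of step 2e).

    keyword_mfccs maps each courtesy phrase to its list of MFCC matrices;
    nonkeyword_mfccs and silence_mfccs are lists of MFCC matrices for
    non-keyword speech and silence. The models are simply placed side by
    side; there are no linguistic constraints between them.
    """
    network = {}
    for word, feats in keyword_mfccs.items():
        network[word] = train_keyword_hmm(feats)                  # one HMM per courtesy phrase
    network["<filler>"] = train_keyword_hmm(nonkeyword_mfccs)     # non-keyword model
    network["<silence>"] = train_keyword_hmm(silence_mfccs, n_states=1)  # single-state silence model
    return network
```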
Step 3. Acquire speech test files.
Choose k people as test speakers, where k > 5; each speaker utters once each of m speech segments containing 1 to m keywords, yielding a total of k × m .wav files as speech test files.
Step 4. Initial keyword retrieval.
4a) Pass the speech test files successively through the same preprocessing as in 2a) and the same MFCC feature extraction as in 2b) to obtain the test speech characteristic parameters;
4b) After adjusting the weights of the network edges in the parallel network model obtained in step 2, use the Viterbi algorithm to compute the matching score of the test speech characteristic parameters obtained in 4a) against each model in the network model:
4b1) Under the same assumptions as in 2c1), let δ_t(S_i) denote the maximum probability, over all path sequences Q = {q_1, q_2, ..., q_t} with q_t = S_i, of generating the observation sequence up to time t, and introduce a group of intermediate (backtracking) variables ψ_t(S_i);
4b2) Initialize the probability variables δ_t(S_i) and the intermediate variables ψ_t(S_i) set in 4b1) as:
δ_1(S_i) = π_i · b_i(O_1), ψ_1(S_i) = 0, 1 ≤ i ≤ N;
4b3) Let δ_t(S_j) denote the maximum probability, over all path sequences Q = {q_1, q_2, ..., q_t} with q_t = S_j, of generating the observation sequence up to time t, and introduce the corresponding intermediate variables ψ_t(S_j); on the basis of the probability variables and intermediate variables obtained in 4b2), the maximum probability δ_t(S_j) and the intermediate variable ψ_t(S_j) are obtained recursively as:
δ_t(S_j) = max_{1≤i≤N} [δ_{t-1}(S_i) · a_ij] · b_j(O_t),
ψ_t(S_j) = argmax_{1≤i≤N} [δ_{t-1}(S_i) · a_ij], 2 ≤ t ≤ T, 1 ≤ j ≤ N;
4b4) From the group of intermediate variables ψ_t(S_j) obtained in 4b3), recursively compute the states q'_1, q'_2, ..., q'_{T-1} by backtracking:
q'_t = ψ_{t+1}(q'_{t+1}), t = T-1, T-2, ..., 1;
4b5) From δ_T(S_i) obtained in 4b3), compute the probability P'(Q, O | λ) that the observation sequence matches the model at time T, and the state q'_T:
P'(Q, O | λ) = max_{1≤i≤N} [δ_T(S_i)],
q'_T = argmax_{1≤i≤N} [δ_T(S_i)];
P'(Q, O | λ) is the matching score of the observation sequence against the model;
4b6) Merge the group q'_1, q'_2, ..., q'_{T-1} obtained in 4b4) with q'_T obtained in 4b5) to obtain the optimal state path sequence:
Q' = {q'_1, q'_2, ..., q'_T};
4c) Retain the s models with the higher matching scores from 4b) as the initial keyword retrieval result (a decoding sketch is given below).
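A minimal sketch of the initial retrieval of step 4, assuming the models and network dictionary from the earlier sketches; hmmlearn's decode() runs the Viterbi algorithm of 4b) and returns the log matching score together with the optimal state path. The explicit adjustment of edge weights and the value of s are simplifications of this sketch:

```python
def initial_retrieval(network, test_mfcc, s=5):
    """Score one test utterance against every model in the parallel network
    and keep the s best-matching models (steps 4b)-4c)).

    test_mfcc: (n_frames, 24) MFCC matrix of one speech test file.
    """
    scores = {}
    for name, model in network.items():
        log_prob, state_path = model.decode(test_mfcc)   # Viterbi: P'(Q, O | λ) and Q'
        scores[name] = log_prob
    # Keep the s models with the highest Viterbi matching score
    best = sorted(scores, key=scores.get, reverse=True)[:s]
    return best, scores
```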
Step 5. Perform keyword verification with confidence values to obtain the final keyword retrieval result.
5a) Use the same Viterbi algorithm as in 4b) to compute the matching scores of the s higher-scoring keyword candidates obtained in 4c) against the corresponding isolated word models, normalize the s matching scores by time length, and take the normalized results as the s confidence values corresponding to the s models;
5b) Set a threshold and compare the confidence of each model obtained in 5a) with the threshold in turn, s times in total; if the confidence is lower than the threshold the model is discarded, and if the confidence is higher than the threshold the model is retained; the retained models serve as the final keyword retrieval result (see the sketch below).
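Step 5 could be sketched as follows, assuming that the keyword models of the network double as the isolated word models and that the frame-normalized Viterbi log score serves as the confidence; the threshold value is purely illustrative:

```python
def verify_keywords(network, test_mfcc, candidates, threshold=-40.0):
    """Confidence-based verification of step 5.

    candidates: the s model names returned by initial_retrieval().
    The threshold and the normalization by frame count are assumptions
    of this sketch; the embodiment only specifies normalization by time length.
    """
    n_frames = test_mfcc.shape[0]
    retained = []
    for name in candidates:
        log_prob, _ = network[name].decode(test_mfcc)   # Viterbi match against the isolated model
        confidence = log_prob / n_frames                # normalize the score by time length
        if confidence > threshold:                      # keep only confident keyword hypotheses
            retained.append(name)
    return retained
```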
Step 6. Complete the scoring.
Take a random speech test file of a person from the files obtained in step 3 and pass it through step 4 and step 5; if all m keywords to be retrieved are contained, the score is 100 points; if y keywords are missing, the score is 100 - y × 100/m points, where 0 ≤ y ≤ m; the work score of the assessed person is thereby obtained (see the sketch below).
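The scoring rule of step 6 reduces to simple arithmetic; a minimal sketch with an illustrative example:

```python
def score_recording(expected_keywords, retained_keywords):
    """Scoring rule of step 6: 100 points minus 100/m for each missing keyword."""
    m = len(expected_keywords)
    y = len(set(expected_keywords) - set(retained_keywords))   # number of missing keywords
    return 100 - y * 100 / m

# Illustrative example: m = 5 courtesy phrases expected, 1 missing
# -> 100 - 1 * 100 / 5 = 80 points
```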
The above is only an example of the present invention and does not constitute any limitation of the invention; it is clear that, after understanding the content and principle of the present invention, a person skilled in the art may make various modifications and changes in form and detail without departing from the principle and structure of the invention, but such modifications and changes based on the inventive concept still fall within the scope of the claims of the present invention.