CN110310644A - Smart class board interaction method based on speech recognition - Google Patents
Smart class board interaction method based on speech recognition
- Publication number
- CN110310644A CN110310644A CN201910577869.5A CN201910577869A CN110310644A CN 110310644 A CN110310644 A CN 110310644A CN 201910577869 A CN201910577869 A CN 201910577869A CN 110310644 A CN110310644 A CN 110310644A
- Authority
- CN
- China
- Prior art keywords
- user
- signal
- information
- voice
- carried out
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention provides a smart class board interaction method based on speech recognition, comprising: receiving a voice signal from a first user; pre-processing the voice signal to obtain a first digital signal; performing feature extraction on the first digital signal to obtain characteristic parameters; decoding the characteristic parameters to obtain an optimal word model sequence, the optimal word model sequence constituting the text information of the voice signal; performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intention; displaying, according to the text instruction, the output result corresponding to the text instruction on the display interface; determining a response message according to the user intention; and converting the response message to generate and output continuous speech. The method thereby shortens the user's operation path, reduces the user's operational load on the smart class board, and can present the interface display and the voice response simultaneously, greatly enhancing the user experience.
Description
Technical field
The present invention relates to a processing method, and more particularly to a smart class board interaction method based on speech recognition.
Background technique
With the rapid development of networks, network-based applications can help schools publish rich-media information in major locations across the campus, improving the efficiency with which information spreads through the school. The smart class board is an application carrier in the information-technology environment and, moreover, an important component of smart classroom and smart campus construction. Deployed at the classroom doorway, it serves purposes such as displaying the class schedule, class and school announcements, cultural content, notice publication, card-swipe attendance, information inquiry, and home-school messaging. However, users can only exchange information with the smart class board through traditional interaction modes, which is inconvenient and greatly diminishes the user experience.
Existing smart class boards mainly rely on transmitting character codes as the information mechanism between user and device. Users operate through traditional input hardware, such as touch-sensitive input devices, remote controls, or body keys, essentially depending on hardware devices to simulate a mouse and issue precise operation instructions to the software; the computer processes the information according to the user's instructions and feeds the result back to the user as characters or pictures. With these traditional interaction modes, the user is in a passive position during human-computer interaction and cannot interact with the machine naturally.
Traditional interaction modes require following certain operating steps, so input efficiency is low; they also require the user to learn and memorize how to use them, so the cognitive load is high.
Summary of the invention
The purpose of the present invention is to address the drawbacks of the prior art by providing a smart class board interaction method based on speech recognition, so as to solve the problem that prior-art smart class board interaction modes require the user to learn and memorize how to use them, resulting in a high cognitive load.
To solve the above problems, the present invention provides a smart class board interaction method based on speech recognition, the method comprising:
receiving a voice signal from a first user;
pre-processing the voice signal to obtain a first digital signal;
performing feature extraction on the first digital signal to obtain characteristic parameters;
decoding the characteristic parameters to obtain an optimal word model sequence, the optimal word model sequence constituting the text information of the voice signal;
performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intention;
displaying, according to the text instruction, the output result corresponding to the text instruction on the display interface;
determining a response message according to the user intention;
converting the response message to generate continuous speech and outputting it.
In one possible implementation, the method further includes, beforehand:
receiving first voice authentication information from the first user;
performing feature extraction on the first voice authentication information to obtain first feature information;
matching the first feature information against a first template in a reference model library;
after a successful match, processing the first feature information to determine first identity information of the first user, the first identity information including the first user's identity ID and identity grade;
binding the identity information to the ID of the smart class board.
In one possible implementation, pre-processing the voice signal to obtain the first digital signal specifically includes:
sampling and quantizing the voice signal to obtain a raw digital signal;
applying pre-emphasis to the raw digital signal to obtain a pre-emphasized voice signal;
framing the pre-emphasized voice signal to obtain a framed voice signal;
windowing the framed voice signal to obtain the first digital signal.
In one possible implementation, the characteristic parameters include one of: linear predictive cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and mel-frequency cepstral coefficients (MFCC).
In one possible implementation, when the characteristic parameter is MFCC, performing feature extraction on the first digital signal to obtain the characteristic parameters specifically includes:
converting the first digital signal from a time-domain signal into a frequency-domain signal using the fast Fourier transform (FFT);
convolving the frequency-domain signal with a bank of triangular filters distributed on the Mel scale;
applying, according to the convolution results, a discrete cosine transform (DCT) to the vector formed by the outputs of the triangular filters in the filter bank;
taking the first N DCT coefficients to obtain the characteristic parameters of the first digital signal.
In one possible implementation, decoding the characteristic parameters to obtain the optimal word model sequence specifically includes:
scoring the similarity between the characteristic parameters and the reference templates in a pre-built reference model library using, respectively, an acoustic model, a language model, and a pronunciation dictionary, to obtain a first score corresponding to the acoustic model, a second score corresponding to the language model, and a third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first score, the second score, and the third score to obtain the optimal word model sequence, so that the text information of the voice is constituted by the optimal word model sequence.
In one possible implementation, performing semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intention specifically includes:
performing morphological analysis on the text information to divide it into multiple words;
performing syntactic analysis on the multiple words to determine the relationships between them, generating the syntactic structure of the sentence;
obtaining the text instruction according to the syntactic structure;
performing intention analysis on the syntactic structure using machine learning to determine the user intention corresponding to the syntactic structure.
In one possible implementation, converting the response message to generate and output continuous speech specifically includes:
converting the response message into continuous speech through a text-to-speech converter and outputting it.
In one possible implementation, the method further includes, afterwards:
receiving second voice authentication information from a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information against a second template in the reference model library;
after a successful match, processing the second feature information to determine second identity information of the second user, the second identity information including the second user's identity ID and identity grade;
comparing the identity grade of the second user with the identity grade of the first user;
when the identity grade of the second user is greater than that of the first user, releasing the binding between the smart class board ID and the first identity information, and binding the ID of the smart class board to the second identity information.
In one possible implementation, the method further includes:
generating a sleep signal or a lock-screen signal when the idle time equals a preset time;
entering a dormant state or a lock-screen state according to the sleep signal or the lock-screen signal.
By applying the smart class board interaction method based on speech recognition provided by the present invention, the user's operation path is shortened, the user's operational load on the smart class board is reduced, and the interface display and voice response can be presented simultaneously, greatly enhancing the user experience.
Detailed description of the invention
Fig. 1 is a flowchart of the smart class board interaction method based on speech recognition provided by an embodiment of the present invention.
Specific embodiment
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts relevant to the related invention.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The technical scheme of the present invention is described in further detail below through the drawings and embodiments.
The smart class board is generally deployed at the classroom doorway and serves purposes such as displaying the class schedule, class and school announcements, cultural content, notice publication, card-swipe attendance, information inquiry, and home-school messaging. The interactive system of the smart class board is composed of multiple modules that together perform speech recognition on the voice signal input by the user, thereby realizing interaction with the user through voice.
Fig. 1 is a flowchart of the smart class board interaction method based on speech recognition provided by an embodiment of the present invention. The executing subject of the method is the smart class board. As shown in Fig. 1, the method includes the following steps:
Step 101: receive the voice signal of the first user.
The first user interacting with the smart class board may be identified by swiping a personal IC card, by face recognition, by voice recognition, or by similar means.
Taking speech recognition as an example, the following illustrates how the smart class board identifies the first user.
The smart class board has a speech reception module for receiving the user's voice information; the module may be a microphone or a microphone array, so as to acquire the voice information.
Specifically: first, the first voice authentication information of the first user is received; then feature extraction is performed on the first voice authentication information to obtain the first feature information; then the first feature information is matched against the first template in the reference model library; then, after a successful match, the first feature information is processed to determine the first identity information of the first user, the first identity information including the first user's identity ID and identity grade; finally, the identity information is bound to the ID of the smart class board.
For example, the speech reception module receives information from the first user: "Hello, I am Xiao Ming." The information is pre-processed, e.g. sampled, quantized, pre-emphasized, framed, and windowed, after which feature extraction is performed. Once the first feature information has been extracted, it is matched against the first template in the reference model library. The reference model library contains multiple templates, each corresponding to one user, and a wake-up word is preset in the first template. The content of the user's wake-up word is not limited; it may be "Hello", etc. After a successful match, the board enters a standby state; meanwhile, the feature information is processed further to identify the identity information of the interacting person, which includes the first user's identity ID and identity grade. For example, the reference model library also holds preset voice information for multiple pre-registered users; the characteristic parameters are matched against the characteristic parameters of this preset voice information, thereby determining the user's identity information.
Grades may be divided according to whether the identity is student or teacher: grades are the same among students, and a teacher's grade is higher than a student's.
Further, the students of the class where the smart class board is located may be given a grade higher than students of other classes. For example, if the board is the smart class board of Class 1, Grade 6, then the grade of students in Class 1, Grade 6 is higher than that of students from other classes.
Further, class cadres may be given a grade higher than non-cadres; for example, the class monitor's grade is higher than that of the other students in the class. Those skilled in the art may preset each grade as needed; this application does not limit it.
After identity confirmation and binding, the interface wakes from the dormant or lock-screen state into a command-waiting state. Voice operation then processes only the voice information of the current interacting person, i.e. a one-to-one interaction mode; touch operation is not restricted by identity.
Step 102: pre-process the voice signal to obtain the first digital signal.
Specifically, step 102 includes:
sampling and quantizing the voice signal to obtain a raw digital signal;
applying pre-emphasis to the raw digital signal to obtain a pre-emphasized voice signal;
framing the pre-emphasized voice signal to obtain a framed voice signal;
windowing the framed voice signal to obtain the first digital signal.
Before pre-emphasis, the voice signal must be sampled and quantized: the purpose of sampling is to split the analog audio waveform, and the purpose of quantization is to store the sampled amplitude as an integer value. The purpose of pre-emphasizing the voice signal is to boost its high-frequency part, remove the influence of lip radiation, and increase the high-frequency resolution of the speech. Pre-emphasis is generally realized with a first-order FIR high-pass digital filter as the transfer function, where a is the pre-emphasis coefficient, 0.9 < a < 1.0. If the speech sample value at time n is x(n), the pre-emphasized result is y(n) = x(n) - a·x(n-1); here a = 0.98 is taken.
After the pre-emphasis digital filtering, framing is performed. A voice signal is short-term stationary (within 10-30 ms the voice signal can be considered approximately constant), so it can be divided into short segments for processing; this is framing. Framing of the voice signal is realized by weighting the signal with a movable window of finite length. The frame rate is generally about 33-100 frames per second, depending on the situation. The usual framing method is overlapping segmentation: the overlap between one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0-0.5.
Windowing usually applies a Hamming window or a rectangular window, to increase the attenuation of high-frequency components.
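The pre-emphasis, framing, and windowing steps above can be sketched as follows. The 16 kHz sampling rate, 400-sample frame length, and 160-sample frame shift are illustrative choices consistent with the stated ranges (a = 0.98; frame shift / frame length = 160/400 = 0.4, within 0-0.5; 100 frames per second).

```python
import numpy as np

def preprocess(x: np.ndarray, a: float = 0.98,
               frame_len: int = 400, frame_shift: int = 160) -> np.ndarray:
    # Pre-emphasis: y(n) = x(n) - a*x(n-1), with a = 0.98 as in the text
    y = np.append(x[0], x[1:] - a * x[:-1])
    # Overlapping framing (short-term stationarity: 400 samples = 25 ms at 16 kHz)
    n_frames = 1 + max(0, (len(y) - frame_len) // frame_shift)
    frames = np.stack([y[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Hamming window applied to every frame
    return frames * np.hamming(frame_len)

signal = np.random.randn(16000)   # stand-in for 1 s of speech sampled at 16 kHz
frames = preprocess(signal)
print(frames.shape)               # (98, 400)
```

The output of this stage corresponds to the "first digital signal" of step 102 and is what the feature extraction of step 103 consumes.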
Step 103: perform feature extraction on the first digital signal to obtain the characteristic parameters.
Specifically, different characteristic parameters can be extracted depending on the intended use of the first digital signal. The main characteristic parameters are linear predictive cepstral coefficients (Linear Predictive Cepstral Coding, LPCC), perceptual linear prediction coefficients (Perceptual Linear Predictive, PLP), and mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC).
When the characteristic parameter is MFCC, step 103 specifically includes: first, converting the first digital signal from a time-domain signal into a frequency-domain signal using the fast Fourier transform (Fast Fourier Transformation, FFT); then convolving the frequency-domain signal with a bank of triangular filters distributed on the Mel scale; then, according to the convolution results, applying a discrete cosine transform (Discrete Cosine Transform, DCT) to the vector formed by the outputs of the triangular filters in the filter bank; finally, taking the first N DCT coefficients to obtain the characteristic parameters of the first digital signal.
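A minimal sketch of the MFCC pipeline just described (FFT, Mel-scale triangular filterbank, DCT, first N coefficients). The 26-filter bank and N = 13 coefficients are common illustrative defaults, not values taken from the patent, and a log is applied to the filterbank energies as is conventional before the DCT.

```python
import numpy as np

def mfcc(frames: np.ndarray, sr: int = 16000,
         n_filt: int = 26, n_ceps: int = 13) -> np.ndarray:
    """Minimal MFCC: FFT -> Mel triangular filterbank -> log -> DCT-II -> first N."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2   # time domain -> frequency domain
    # Triangular filters whose centres are spaced evenly on the Mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel_inv(np.linspace(mel(0.0), mel(sr / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_e = np.log(power @ fbank.T + 1e-10)           # log filterbank energies
    # DCT-II over the filter outputs; keep only the first n_ceps coefficients
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filt))
    return log_e @ dct.T

frames = np.random.randn(98, 400)   # windowed frames from the pre-processing step
print(mfcc(frames).shape)           # (98, 13)
```

Each row of the result is the feature vector of one frame; these are the vectors the acoustic model consumes in step 104.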
Step 104: decode the characteristic parameters to obtain the optimal word model sequence; the optimal word model sequence constitutes the text information of the voice signal.
Specifically, step 104 includes:
scoring the similarity between the characteristic parameters and the reference templates in the pre-built reference model library using, respectively, the acoustic model, the language model, and the pronunciation dictionary, to obtain the first score corresponding to the acoustic model, the second score corresponding to the language model, and the third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first score, the second score, and the third score to obtain the optimal word model sequence, so that the text information of the voice is constituted by the optimal word model sequence.
Specifically, the smart class board contains a speech recognition decoder and a processor; the pre-processing and feature extraction described above may be carried out by the processor. The processor then sends the characteristic parameters of the voice to the speech recognition decoder, which performs similarity-measurement comparison between these characteristic parameters and the reference templates in the reference model library.
The processor may include a signal processing module and a feature extraction module, which carry out the pre-processing and the feature extraction respectively.
The acoustic model is a representation of knowledge about differences in acoustics, phonetics, environmental variables, speaker gender, accent, etc.; its purpose is to convert the feature vectors of all frames extracted via MFCC into an ordered phoneme output.
The language model is a representation of knowledge about the composition of word sequences; it expresses the probability that a certain word sequence occurs, generally using the chain rule to decompose the probability of a sentence into the product of the probabilities of each word in it.
The pronunciation dictionary contains the mapping between words and all phonemes; its role is to connect the acoustic model and the language model. The pronunciation dictionary contains the set of words the smart class board can handle and designates their pronunciations.
The weighted fusion algorithm takes a weighted average over multi-source redundant data and uses the fused value as the result; it is a method that operates directly on the data sources.
The acoustic model, language model, and pronunciation dictionary can each score the characteristic parameters against the reference templates in the reference model library; the weighted fusion algorithm is finally applied in the score domain to give the final decision, i.e. to obtain the optimal word model sequence describing the input voice signal, thereby obtaining the text information of the voice.
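The score-domain weighted fusion can be sketched as below. The candidate word sequences, their scores, and the 0.5/0.3/0.2 weights are illustrative; the patent does not specify how the weights are chosen or how candidate hypotheses are generated.

```python
def fuse_scores(candidates, weights=(0.5, 0.3, 0.2)):
    """Pick the candidate with the highest weighted average of its
    acoustic-model, language-model, and pronunciation-dictionary scores."""
    def fused(c):
        return (weights[0] * c["acoustic"]
                + weights[1] * c["language"]
                + weights[2] * c["dictionary"])
    return max(candidates, key=fused)["words"]

# Two hypothetical decoder hypotheses for the same utterance
candidates = [
    {"words": ["today", "schedule"], "acoustic": 0.9, "language": 0.8, "dictionary": 0.9},
    {"words": ["to", "day", "shed"], "acoustic": 0.7, "language": 0.3, "dictionary": 0.6},
]
print(fuse_scores(candidates))  # ['today', 'schedule']
```

The winning word sequence plays the role of the "optimal word model sequence" whose words constitute the text information passed to the semantic analysis of step 105.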
Step 105: perform semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intention.
Specifically, step 105 includes: first, performing morphological analysis on the text information to divide it into multiple words; then performing syntactic analysis on the multiple words to determine the relationships between them, generating the syntactic structure of the sentence; then obtaining the text instruction according to the syntactic structure; then performing intention analysis on the syntactic structure using machine learning to determine the intention type corresponding to the syntactic structure; finally, determining the user intention according to the intention type.
During morphological analysis, a segmenter divides the sentence into individual words, mainly involving operations such as word segmentation and part-of-speech tagging. To improve the quality of the morphological analysis, entity recognition is also added, e.g. for certain established place names, personal names, and other proper nouns. To determine the relationships between the words in a sentence, syntactic analysis is used: its input is a word string and its output is the syntactic structure of the sentence. Finally, intention analysis is realized with machine-learning methods; its main work is to assign the sentence to the corresponding intention type by classification and determine the user's intention according to the intention type, thereby improving the response efficiency of the smart class board.
Step 106: display, according to the text instruction, the output result corresponding to the text instruction on the display interface.
Specifically, the text information that has undergone semantic analysis is converted into one or more (a series of) text instructions, and the text-instruction interface is called to operate. For example, the first user says: "What classes do I have today?" After processing by the speech recognition decoder and the semantic analysis module, the corresponding output instruction retrieves today's class schedule and presents it on the display interface of the smart class board.
Step 107: determine the response message according to the user intention.
Step 108: convert the response message to generate continuous speech and output it.
Specifically, after the user intention is obtained by semantic analysis, the dialog manager in the smart class board determines the response message; then, through the text-to-speech converter, the response message is turned into continuous speech of high quality and high naturalness and output. The text-to-speech converter mainly realizes the conversion of the response message into audio information.
For example, the first user says: "What classes do I have today?" The user intention obtained is: the first user wants to know the whole day's course arrangement. The dialog manager determines the response message, which is text information, and the text-to-speech converter turns the text information into continuous speech output of high quality and high naturalness, e.g. "You have a mathematics class today from 8:30 to 9:15 ...".
Further, before or after step 101, the method further includes:
receiving second voice authentication information from a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information against the second template in the reference model library;
after a successful match, processing the second feature information to determine the second identity information of the second user, the second identity information including the second user's identity ID and identity grade;
comparing the identity grade of the second user with the identity grade of the first user;
when the identity grade of the second user is greater than that of the first user, releasing the binding between the smart class board ID and the first identity information, and binding the ID of the smart class board to the second identity information.
Specifically, after the interface wakes up, new identity information is obtained and discriminated in real time. Suppose the current interacting person is still in the middle of interacting with the device and another interacting person joins in: the identity of the mid-way joiner is compared by grade. If the grade is lower or the same, the smart class board continues to process the voice information of the current interacting person; if the grade is higher, the smart class board binds the identity information of the higher-grade person and switches to processing that person's voice information. This facilitates control and management of the smart class board by class cadres or teachers.
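The grade comparison and rebinding rule can be sketched as follows. The numeric grade values and role names are illustrative assumptions: the patent only states the ordering (teacher above students, home-class students and class cadres above other students), not concrete values.

```python
# Hypothetical grade table; higher number = higher identity grade
GRADE = {"student": 1, "home_class_student": 2, "class_cadre": 3, "teacher": 4}

class SmartClassBoard:
    """Sketch of the rebinding rule: a newcomer takes over the board only
    when their identity grade is strictly higher than the bound user's."""
    def __init__(self):
        self.bound_user = None   # (user_id, role) currently bound to the board ID

    def bind(self, user_id, role):
        if self.bound_user is None or GRADE[role] > GRADE[self.bound_user[1]]:
            self.bound_user = (user_id, role)
        return self.bound_user[0]

board = SmartClassBoard()
board.bind("xiaoming", "student")
print(board.bind("xiaohong", "student"))      # xiaoming — equal grade keeps binding
print(board.bind("teacher_wang", "teacher"))  # teacher_wang — higher grade rebinds
```

The strict inequality implements "lower or the same grade leaves the current binding in place", matching the behaviour described above.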
Further, the method further includes:
generating a sleep signal or a lock-screen signal when the idle time equals a preset time;
entering the dormant state or the lock-screen state according to the sleep signal or the lock-screen signal.
Specifically, an automatic lock-screen idle time is preset; when the user has stopped interacting with the device for the preset time, the interface re-enters the dormant or lock-screen state and waits for the next wake-up.
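The idle sleep/lock behaviour can be sketched as a simple timer; the preset time, state names, and polling style below are illustrative, since the patent leaves the implementation open.

```python
import time

class IdleLock:
    """Lock the interface once no operation has occurred for a preset time."""
    def __init__(self, preset_seconds: float):
        self.preset = preset_seconds
        self.last_op = time.monotonic()
        self.state = "active"

    def touch(self):
        # Any user operation resets the idle timer and wakes the interface
        self.last_op = time.monotonic()
        self.state = "active"

    def tick(self):
        # Poll: generate the lock/sleep transition when the preset time elapses
        if self.state == "active" and time.monotonic() - self.last_op >= self.preset:
            self.state = "locked"   # could equally enter a "dormant" state
        return self.state

lock = IdleLock(preset_seconds=0.05)
print(lock.tick())   # active
time.sleep(0.06)
print(lock.tick())   # locked
```

A production device would use an event-driven timer rather than polling, but the rule is the same: idle time reaching the preset threshold triggers the lock-screen or dormant state.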
By applying the smart class board interaction method based on speech recognition provided by the present invention, the user's operation path is shortened, the user's operational load on the smart class board is reduced, and the interface display and voice response can be presented simultaneously, greatly enhancing the user experience.
Those skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further illustrate the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A speech-recognition-based smart class board interaction method, characterized in that the method comprises:
receiving a voice signal of a first user;
preprocessing the voice signal to obtain a first digital signal;
performing feature extraction on the first digital signal to obtain characteristic parameters;
decoding the characteristic parameters to obtain an optimal word model sequence, the optimal word model sequence constituting the text information of the voice signal;
performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intention;
displaying the output result corresponding to the text instruction on a display interface according to the text instruction;
determining a response message according to the user intention;
converting the response message to generate and output continuous speech.
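The method steps of claim 1 can be pictured as a chain of stages. The sketch below wires hypothetical placeholder functions together in the claimed order; none of the function names come from the patent.

```python
def interact(voice_signal, preprocess, extract_features, decode,
             analyze, render_text, synthesize):
    """Chain the claimed stages: preprocessing, feature extraction,
    decoding, semantic analysis, display, and speech synthesis."""
    digital = preprocess(voice_signal)      # -> first digital signal
    features = extract_features(digital)    # -> characteristic parameters
    text = decode(features)                 # -> optimal word sequence as text
    instruction, intent = analyze(text)     # -> text instruction + user intention
    render_text(instruction)                # display the instruction's result
    return synthesize(intent)               # speak the response
```

Each stage is injected as a callable, so concrete implementations (e.g. the MFCC extractor or the decoder of later claims) can be slotted in independently.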
2. The method according to claim 1, characterized in that, before the method, it further comprises:
receiving first voice authentication information of the first user;
performing feature extraction on the first voice authentication information to obtain first feature information;
matching the first feature information with a first template in a reference model library;
after a successful match, processing the first feature information to determine first identity information of the first user, the first identity information comprising an identity ID and an identity level of the first user;
binding the identity information with an ID of the smart class board.
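One plausible reading of the template-matching step in claim 2 is a similarity comparison between the extracted feature vector and a stored reference template. The cosine measure and the threshold below are assumptions; the patent only states that matching is performed.

```python
import numpy as np

def match_template(feature: np.ndarray, template: np.ndarray,
                   threshold: float = 0.85):
    """Return (matched, similarity) for a voiceprint feature vector
    against one reference template, using cosine similarity."""
    sim = float(feature @ template /
                (np.linalg.norm(feature) * np.linalg.norm(template)))
    return sim >= threshold, sim
```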
3. The method according to claim 1, characterized in that preprocessing the voice information to obtain the first digital signal specifically comprises:
sampling and quantizing the voice information to obtain a raw digital signal;
performing pre-emphasis processing on the raw digital signal to obtain a pre-emphasized voice signal;
performing framing on the pre-emphasized voice signal to obtain framed voice signals;
performing windowing on the framed voice signals to obtain the first digital signal.
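The preprocessing chain of claim 3 (after sampling and quantization) might be sketched as below. The pre-emphasis coefficient and the 25 ms / 10 ms frame geometry at 16 kHz are conventional values assumed for illustration, not taken from the patent.

```python
import numpy as np

def preprocess(x: np.ndarray, alpha: float = 0.97,
               frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Pre-emphasis, framing, and Hamming windowing of a sampled signal.
    Returns an array of shape (n_frames, frame_len)."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = np.append(x[0], x[1:] - alpha * x[:-1])
    # Framing: split into overlapping frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)
```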
4. The method according to claim 1, characterized in that the characteristic parameters comprise one of: linear prediction cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and Mel-frequency cepstral coefficients (MFCC).
5. The method according to claim 4, characterized in that, when the characteristic parameters are MFCC, performing feature extraction on the first digital signal to obtain the characteristic parameters specifically comprises:
converting the first digital signal from a time-domain signal into a frequency-domain signal using a fast Fourier transform (FFT);
convolving the frequency-domain signal with a triangular filter bank distributed on the Mel scale;
according to the convolution results, performing a discrete cosine transform (DCT) on the vector formed by the outputs of the triangular filters in the filter bank;
taking the first N coefficients of the DCT to obtain the characteristic parameters of the first digital signal.
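Claim 5's MFCC steps (FFT, Mel-scale triangular filter bank, DCT, first N coefficients) can be illustrated for a single windowed frame as follows. The filter count, the log step, and the coefficient count are standard-practice assumptions not fixed by the claim.

```python
import numpy as np

def mfcc(frame: np.ndarray, sr: int = 16000,
         n_filters: int = 26, n_ceps: int = 13) -> np.ndarray:
    """Compute MFCCs for one windowed frame: power spectrum via FFT,
    Mel-spaced triangular filter bank, log energies, then DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2          # FFT -> power spectrum
    # Mel-spaced triangular filter bank
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(0, mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    energies = np.log(fbank @ power + 1e-10)         # log filter-bank energies
    # DCT-II of the filter outputs; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ energies
```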
6. The method according to claim 1, characterized in that decoding the characteristic parameters to obtain the optimal word model sequence specifically comprises:
scoring the similarity between the characteristic parameters and reference templates in a pre-built reference model library via an acoustic model, a language model, and a pronunciation dictionary, respectively, to obtain a first score corresponding to the acoustic model, a second score corresponding to the language model, and a third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first score, the second score, and the third score to obtain the optimal word model sequence, such that the optimal word model sequence constitutes the text information of the speech.
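The weighted fusion of claim 6 amounts to ranking candidate word sequences by a weighted sum of their three scores. The weights below are tunable assumptions; the patent does not specify them.

```python
def best_sequence(candidates, w_ac: float = 0.6,
                  w_lm: float = 0.3, w_lex: float = 0.1) -> str:
    """candidates: list of (sequence, acoustic_score, lm_score, lexicon_score).
    Returns the sequence with the highest weighted combined score."""
    def fused(c):
        _, ac, lm, lex = c
        return w_ac * ac + w_lm * lm + w_lex * lex
    return max(candidates, key=fused)[0]
```

Here a candidate that scores well acoustically but is implausible language ("open shed yule") loses to the plausible one ("open schedule") once the language-model score is fused in.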
7. The method according to claim 1, characterized in that performing semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intention specifically comprises:
performing morphological analysis on the text information to divide the text information into multiple words;
performing syntactic analysis on the multiple words to determine the relationships among the multiple words, generating the syntactic structure of the sentence;
obtaining the text instruction according to the syntactic structure;
performing intention analysis on the syntactic structure using machine learning to determine the user intention corresponding to the syntactic structure.
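A toy illustration of the claim-7 flow: tokenize, derive a shallow structure, then map it to an instruction and an intention. The rule table is invented for illustration; the patent itself proposes machine learning for the intention-analysis step.

```python
def analyze(text: str):
    """Return (text_instruction, user_intention) for a short command."""
    words = text.lower().split()                 # morphological analysis (toy)
    structure = {"verb": words[0],               # shallow syntactic structure
                 "object": " ".join(words[1:])}
    instruction = f"{structure['verb']}({structure['object']!r})"
    intents = {"show": "display_request",        # hypothetical intent table
               "play": "media_request"}
    intent = intents.get(structure["verb"], "unknown")
    return instruction, intent
```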
8. The method according to claim 1, characterized in that converting the response message to generate and output continuous speech specifically comprises:
converting the response message into continuous speech via a text-to-speech converter and outputting it.
9. The method according to claim 2, characterized in that, after the method, it further comprises:
receiving second voice authentication information of a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information with a second template in the reference model library;
after a successful match, processing the second feature information to determine second identity information of the second user, the second identity information comprising an identity ID and an identity level of the second user;
comparing the identity level of the second user with the identity level of the first user;
when the identity level of the second user is higher than the identity level of the first user, unbinding the smart class board ID from the first identity information and binding the smart class board ID with the second identity information.
10. The method according to claim 1, characterized in that the method further comprises:
when the idle time reaches a preset time, generating a sleep signal or a screen-lock signal;
entering a sleep state or a screen-lock state according to the sleep signal or the screen-lock signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910577869.5A CN110310644A (en) | 2019-06-28 | 2019-06-28 | Wisdom class board exchange method based on speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110310644A true CN110310644A (en) | 2019-10-08 |
Family
ID=68079256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910577869.5A Pending CN110310644A (en) | 2019-06-28 | 2019-06-28 | Wisdom class board exchange method based on speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310644A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110971875A (en) * | 2019-12-04 | 2020-04-07 | 广州云蝶科技有限公司 | Control method and device combining recording and broadcasting system and IP broadcasting system |
CN111559675A (en) * | 2020-05-22 | 2020-08-21 | 云知声智能科技股份有限公司 | Method for controlling elevator by voice |
CN111904806A (en) * | 2020-07-30 | 2020-11-10 | 云知声智能科技股份有限公司 | Blind guiding system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110258285A1 (en) * | 2005-02-25 | 2011-10-20 | Lightningcast LLC. | Inserting Branding Elements |
CN105335398A (en) * | 2014-07-18 | 2016-02-17 | 华为技术有限公司 | Service recommendation method and terminal |
CN107767875A (en) * | 2017-10-17 | 2018-03-06 | 深圳市沃特沃德股份有限公司 | Sound control method, device and terminal device |
CN107895578A (en) * | 2017-11-15 | 2018-04-10 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device |
CN108153881A (en) * | 2017-12-26 | 2018-06-12 | 重庆大争科技有限公司 | Teaching monitoring and managing method based on Intelligent campus management |
CN108959520A (en) * | 2018-06-28 | 2018-12-07 | 百度在线网络技术(北京)有限公司 | Searching method, device, equipment and storage medium based on artificial intelligence |
CN109036391A (en) * | 2018-06-26 | 2018-12-18 | 华为技术有限公司 | Audio recognition method, apparatus and system |
CN109741746A (en) * | 2019-01-31 | 2019-05-10 | 上海元趣信息技术有限公司 | Robot personalizes interactive voice algorithm, emotion communication algorithm and robot |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008