
CN110310644A - Wisdom class board interaction method based on speech recognition - Google Patents

Wisdom class board interaction method based on speech recognition Download PDF

Info

Publication number
CN110310644A
CN110310644A
Authority
CN
China
Prior art keywords
user
signal
information
voice
carried out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910577869.5A
Other languages
Chinese (zh)
Inventor
陈天
蔡瑞琦
丁国柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yundi Technology Co Ltd
Original Assignee
Guangzhou Yundi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yundi Technology Co Ltd filed Critical Guangzhou Yundi Technology Co Ltd
Priority to CN201910577869.5A priority Critical patent/CN110310644A/en
Publication of CN110310644A publication Critical patent/CN110310644A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a speech-recognition-based interaction method for a wisdom class board, comprising: receiving a voice signal from a first user; pre-processing the voice signal to obtain a first digital signal; performing feature extraction on the first digital signal to obtain characteristic parameters; decoding the characteristic parameters to obtain an optimal word model sequence, where the optimal word model sequence constitutes the text information of the voice signal; performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intent; displaying, on the display interface, the output result corresponding to the text instruction; determining a response message according to the user intent; and converting the response message into continuous speech for output. The method thereby shortens the user's operation path, reduces the user's operational load on the wisdom class board, and supports simultaneous interface display and voice response, greatly enhancing the user experience.

Description

Wisdom class board interaction method based on speech recognition
Technical field
The present invention relates to a processing method, and more particularly to a speech-recognition-based interaction method for a wisdom class board.
Background technique
With the rapid development of networks, network-based applications can help schools publish rich-media information across campus venues and improve the efficiency with which information spreads through the school. The wisdom class board is an application vector in the information-technology environment and an important component of smart-classroom and smart-campus construction. Deployed at the classroom doorway, it displays curriculum schedules and class and school notices, and supports cultural construction, notice publication, card-swipe attendance, information queries, and home-school messaging. However, the user can only exchange information with the wisdom class board through traditional interaction modes, which is inconvenient and greatly diminishes the user experience.
Existing wisdom class boards mainly transmit character codes as the information mechanism between user and device. The user operates traditional input hardware, such as touch-sensitive input devices, remote controls, or body keys, essentially relying on hardware to simulate a mouse and issue precise operating instructions to the software; the computer then processes the instructions and feeds information back to the user in the form of characters or pictures. With these traditional interaction modes the user is in a passive position during human-computer interaction and cannot interact with the machine naturally.
Traditional interaction modes must follow fixed operating steps, so input efficiency is low; they also require the user to learn and memorize how to use them, so the cognitive load is high.
Summary of the invention
The purpose of the present invention is, in view of the drawbacks of the prior art, to provide a speech-recognition-based wisdom class board interaction method, so as to solve the problem that prior-art wisdom class board interaction modes require the user to learn and memorize how to use them, resulting in a high cognitive load.
To solve the above problems, the present invention provides a speech-recognition-based wisdom class board interaction method, the method comprising:
receiving a voice signal from a first user;
pre-processing the voice signal to obtain a first digital signal;
performing feature extraction on the first digital signal to obtain characteristic parameters;
decoding the characteristic parameters to obtain an optimal word model sequence, where the optimal word model sequence constitutes the text information of the voice signal;
performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intent;
displaying, on the display interface, the output result corresponding to the text instruction;
determining a response message according to the user intent;
converting the response message into continuous speech and outputting it.
In one possible implementation, before the above steps the method further includes:
receiving first voice authentication information from the first user;
performing feature extraction on the first voice authentication information to obtain first feature information;
matching the first feature information against a first template in a reference model library;
after a successful match, processing the first feature information to determine first identity information of the first user, where the first identity information includes the identity ID and identity grade of the first user;
binding the first identity information to the ID of the wisdom class board.
In one possible implementation, pre-processing the voice signal to obtain the first digital signal specifically includes:
sampling and quantizing the voice signal to obtain a raw digital signal;
applying pre-emphasis to the raw digital signal to obtain a pre-emphasized voice signal;
framing the pre-emphasized voice signal to obtain a framed voice signal;
windowing the framed voice signal to obtain the first digital signal.
In one possible implementation, the characteristic parameters include one of: linear predictive cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and mel-frequency cepstrum coefficients (MFCC).
In one possible implementation, when the characteristic parameter is MFCC, performing feature extraction on the first digital signal to obtain the characteristic parameters specifically includes:
converting the first digital signal from a time-domain signal into a frequency-domain signal using the fast Fourier transform (FFT);
convolving the frequency-domain signal with a triangular filter bank distributed on the mel scale;
applying, according to the convolution results, a discrete cosine transform (DCT) to the vector formed by the outputs of the triangular filters in the filter bank;
taking the first N coefficients of the DCT to obtain the characteristic parameters of the first digital signal.
In one possible implementation, decoding the characteristic parameters to obtain the optimal word model sequence specifically includes:
scoring, through an acoustic model, a language model, and a pronunciation dictionary respectively, the similarity between the characteristic parameters and the reference templates in a pre-built reference model library, to obtain a first score corresponding to the acoustic model, a second score corresponding to the language model, and a third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first score, the second score, and the third score to obtain the optimal word model sequence, so that the optimal word model sequence constitutes the text information of the speech.
In one possible implementation, performing semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intent specifically includes:
performing morphological analysis on the text information to divide it into multiple words;
performing syntactic analysis on the multiple words to determine the relationships between them and generate the syntactic structure of the sentence;
obtaining the text instruction according to the syntactic structure;
using machine learning to perform intent analysis on the syntactic structure and determine the user intent corresponding to the syntactic structure.
In one possible implementation, converting the response message into continuous speech and outputting it specifically includes:
converting the response message into continuous speech through a text-to-speech converter and outputting it.
In one possible implementation, after the above steps the method further includes:
receiving second voice authentication information from a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information against a second template in the reference model library;
after a successful match, processing the second feature information to determine second identity information of the second user, where the second identity information includes the identity ID and identity grade of the second user;
comparing the identity grade of the second user with the identity grade of the first user;
when the identity grade of the second user is greater than that of the first user, releasing the binding between the wisdom class board ID and the first identity information, and binding the wisdom class board ID to the second identity information.
In one possible implementation, the method also includes:
when the no-operation time reaches a preset time, generating a sleep signal or a screen-lock signal;
entering a sleep state or a screen-lock state according to the sleep signal or the screen-lock signal.
By applying the speech-recognition-based wisdom class board interaction method provided by the present invention, the user's operation path is shortened, the user's operational load on the wisdom class board is reduced, and interface display and voice response can proceed simultaneously, greatly enhancing the user experience.
Detailed description of the invention
Fig. 1 is a flow chart of the speech-recognition-based wisdom class board interaction method provided by an embodiment of the present invention.
Specific embodiment
The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another. The technical solution of the present invention is described in further detail below through the drawings and embodiments.
The wisdom class board is generally deployed at the classroom doorway and serves to display curriculum schedules, class and school notices, cultural construction content, notice publication, card-swipe attendance, information queries, and home-school messages. The interactive system of the wisdom class board combines multiple modules to perform speech recognition on the voice signal input by the user, thereby interacting with the user through speech.
Fig. 1 is a flow chart of the speech-recognition-based wisdom class board interaction method provided by an embodiment of the present invention. The method is executed by the wisdom class board and, as shown in Fig. 1, includes the following steps:
Step 101: receive the voice signal of the first user.
The first user interacting with the wisdom class board may be identified by swiping a personal IC card, by face recognition, by voice recognition, or by similar means.
Taking voice recognition as an example, the following explains how the wisdom class board identifies the first user.
The wisdom class board has a speech reception module for receiving the user's voice information; this module may be a microphone or a microphone array, enabling the acquisition of voice information.
Specifically, first, the first voice authentication information of the first user is received; then feature extraction is performed on it to obtain the first feature information; the first feature information is matched against the first template in the reference model library; after a successful match, the first feature information is processed to determine the first identity information of the first user, which includes the first user's identity ID and identity grade; finally, the identity information is bound to the ID of the wisdom class board.
For example, the speech reception module receives the first user's utterance "Hello, I am Xiao Ming." The utterance is pre-processed (sampled, quantized, pre-emphasized, framed, and windowed), and feature extraction is performed. The extracted first feature information is then matched against the first template in the reference model library. The reference model library holds multiple templates, each corresponding to a user, and a wake-up word is preset in the first template; the content of the wake-up word is not limited and may be "hello" and so on. After a successful match the board enters standby mode, and the feature information is processed further to identify the interacting person; the identity information includes the first user's identity ID and identity grade. For example, the reference model library may also hold preset voice information for multiple pre-registered users; matching the extracted characteristic parameters against the characteristic parameters of this preset voice information determines the user's identity.
Grades may be divided according to the identities of students and teachers: students share the same grade, and a teacher's grade is higher than a student's.
Further, the students of the class where the wisdom class board is located may be given a higher grade than other students. For example, if the board belongs to Class 1 of Grade 6, the students of Class 1 of Grade 6 have a higher grade than students of other classes.
Further, class cadres may be given a higher grade than other students; for example, the class monitor's grade is higher than that of the other students in the class. Those skilled in the art may preset each grade as needed; the application does not limit this.
After identity confirmation and binding, the interface leaves the sleep or screen-lock state and enters a waiting-for-command state. Voice operation then handles only the voice information of the current interacting person, i.e., a one-to-one interaction mode, while touch operation is not restricted by identity.
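The patent gives no code for the template-matching and binding flow above. As a minimal sketch under stated assumptions, the following compares an extracted feature vector against each enrolled template by cosine similarity and, on a match, returns the identity record to bind; the library contents, threshold, and matcher are illustrative assumptions, not the actual algorithm.

```python
import math

# Hypothetical reference model library: each enrolled user has a stored
# feature template plus an identity record (ID, grade). All values are
# made up for illustration.
REFERENCE_MODELS = {
    "xiaoming": {"template": [0.9, 0.1, 0.4], "id": "S001", "grade": 1},
    "teacher_li": {"template": [0.2, 0.8, 0.5], "id": "T001", "grade": 3},
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def authenticate(feature_vector, threshold=0.85):
    """Match extracted features against every template; return the
    (identity ID, grade) to bind on success, or None to stay locked."""
    best_user, best_score = None, 0.0
    for user, record in REFERENCE_MODELS.items():
        score = cosine_similarity(feature_vector, record["template"])
        if score > best_score:
            best_user, best_score = user, score
    if best_score >= threshold:
        rec = REFERENCE_MODELS[best_user]
        return rec["id"], rec["grade"]
    return None  # no sufficiently similar template: remain in sleep/lock
```

In practice the similarity measure would operate on MFCC-style feature sequences (e.g., via dynamic time warping or a statistical model) rather than a single fixed-length vector; the structure of "score every template, accept the best above a threshold" is what carries over.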
Step 102: pre-process the voice signal to obtain the first digital signal.
Specifically, step 102 includes:
sampling and quantizing the voice signal to obtain a raw digital signal;
applying pre-emphasis to the raw digital signal to obtain a pre-emphasized voice signal;
framing the pre-emphasized voice signal to obtain a framed voice signal;
windowing the framed voice signal to obtain the first digital signal.
Before pre-emphasis, the voice signal must be sampled and quantized: sampling splits the analog audio waveform, and quantization stores the sampled amplitudes as integer values. The purpose of pre-emphasis is to boost the high-frequency part of the speech, remove the influence of lip radiation, and increase the high-frequency resolution of the speech. Pre-emphasis is generally realized by a first-order FIR high-pass digital filter with pre-emphasis coefficient a, where 0.9 < a < 1.0. If the speech sample value at time n is x(n), the pre-emphasized result is y(n) = x(n) - a*x(n-1); here a = 0.98 is taken.
After pre-emphasis filtering comes framing. Speech is short-time stationary (within 10-30 ms the signal can be considered approximately constant), so the voice signal can be divided into short segments (frames) for processing. Framing is realized by weighting the signal with a movable finite-length window. The frame rate is generally about 33-100 frames per second, depending on the situation. The usual framing method is overlapping segmentation: the overlap between one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0-0.5.
Windowing usually applies a Hamming window or a rectangular window to increase the attenuation of high-frequency components.
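The three pre-processing operations above (pre-emphasis with a = 0.98, overlapping framing, Hamming windowing) can be sketched directly from the formulas in the text; parameter values in the sketch are illustrative.

```python
import math

def preemphasis(samples, a=0.98):
    """y(n) = x(n) - a*x(n-1): boost high frequencies, offset lip radiation."""
    return [samples[0]] + [samples[n] - a * samples[n - 1]
                           for n in range(1, len(samples))]

def frame_signal(samples, frame_len, frame_shift):
    """Overlapping segmentation: speech is ~stationary over 10-30 ms,
    so cut it into short frames, advancing by frame_shift each time."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, frame_shift)]

def hamming(frame):
    """Apply a Hamming window to taper the frame edges."""
    n = len(frame)
    return [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i in range(n)]
```

With a frame shift of half the frame length, the shift-to-length ratio is 0.5, the upper end of the range the text gives.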
Step 103: perform feature extraction on the first digital signal to obtain characteristic parameters.
Specifically, different characteristic parameters can be extracted according to the purpose of the first digital signal. The main characteristic parameters are linear predictive cepstral coefficients (LPCC), perceptual linear prediction coefficients (Perceptual Linear Predictive, PLP), and mel-frequency cepstrum coefficients (Mel Frequency Cepstrum Coefficient, MFCC).
When the characteristic parameter is MFCC, step 103 specifically includes: first, converting the first digital signal from a time-domain signal into a frequency-domain signal using the fast Fourier transform (Fast Fourier Transformation, FFT); then convolving the frequency-domain signal with a triangular filter bank distributed on the mel scale; then, according to the convolution results, applying a discrete cosine transform (Discrete Cosine Transform, DCT) to the vector formed by the outputs of the triangular filters; and finally taking the first N DCT coefficients as the characteristic parameters of the first digital signal.
Step 104: decode the characteristic parameters to obtain the optimal word model sequence; the optimal word model sequence constitutes the text information of the voice signal.
Specifically, step 104 includes:
scoring, through an acoustic model, a language model, and a pronunciation dictionary respectively, the similarity between the characteristic parameters and the reference templates in the pre-built reference model library, to obtain a first score corresponding to the acoustic model, a second score corresponding to the language model, and a third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first, second, and third scores to obtain the optimal word model sequence, so that the optimal word model sequence constitutes the text information of the speech.
Specifically, the wisdom class board contains a speech recognition decoder and a processor; the pre-processing and feature extraction described above may be carried out by the processor. The processor then sends the characteristic parameters of the speech to the speech recognition decoder, which performs similarity measurement between these characteristic parameters and the reference templates in the reference model library.
The processor may include a signal processing module and a feature extraction module, which perform the pre-processing and the feature extraction respectively.
The acoustic model is a knowledge representation of the variability of acoustics, phonetics, environment, speaker gender, accent, and so on; its purpose is to convert the feature vectors of all frames extracted by MFCC into an ordered phoneme output.
The language model is a knowledge representation of how word sequences are composed; it expresses the probability that a given word sequence occurs, generally using the chain rule to decompose the probability of a sentence into the product of the probabilities of its words.
The pronunciation dictionary contains the mapping between words and all their phonemes; its role is to connect the acoustic model and the language model. It comprises the set of words the wisdom class board can handle, together with their pronunciations.
The weighted fusion algorithm takes a weighted average of redundant multi-source information and uses the fused value as the result; it is a method that operates directly on the data sources.
The acoustic model, language model, and pronunciation dictionary each score the characteristic parameters against the reference templates in the reference model library; the weighted fusion algorithm is then applied in the score domain to give the final decision, i.e., the optimal word model sequence describing the input voice signal, from which the text information of the speech is obtained.
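The score-domain fusion above can be sketched as a weighted average over candidate word sequences; the candidate scores and the weights here are made-up illustrations (the patent does not specify weight values).

```python
def fuse_scores(candidates, weights=(0.5, 0.3, 0.2)):
    """Weighted fusion of (acoustic, language, pronunciation-dictionary)
    scores per candidate sequence; the highest fused score wins and
    becomes the optimal word model sequence."""
    w_am, w_lm, w_dict = weights
    best_seq, best_score = None, float("-inf")
    for sequence, (am, lm, d) in candidates.items():
        fused = w_am * am + w_lm * lm + w_dict * d
        if fused > best_score:
            best_seq, best_score = sequence, fused
    return best_seq, best_score
```

The language-model weight typically rescues acoustically ambiguous candidates: two sequences with identical acoustic scores are separated by how probable their word order is.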
Step 105: perform semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intent.
Specifically, step 105 includes: first, performing morphological analysis on the text information to divide it into multiple words; then performing syntactic analysis on the words to determine the relationships between them and generate the syntactic structure of the sentence; then obtaining the text instruction from the syntactic structure; and finally using machine learning to perform intent analysis on the syntactic structure, determining the intent type corresponding to the syntactic structure and, from the intent type, the user intent.
In morphological analysis, a segmenter divides the sentence into words; this mainly involves operations such as word segmentation and part-of-speech tagging. To improve the quality of morphological analysis, entity recognition is also added for items such as fixed place names, personal names, and other proper nouns. To determine the relationships between the words in a sentence, syntactic analysis is used: its input is a word string and its output is the syntactic structure of the sentence. Intent analysis is finally realized with machine learning methods; its main job is to assign the sentence, by classification, to the corresponding intent type and so determine the user's intent, improving the response efficiency of the wisdom class board.
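As a minimal stand-in for the pipeline above, the sketch below tokenizes an utterance and assigns it to an intent type; the keyword rules replace the trained classifier the text describes and are purely illustrative, as are the intent names.

```python
def analyze_text(text):
    """Toy morphological analysis (whitespace tokenization) plus
    rule-based intent classification. A real system would use a
    segmenter, POS tagging, entity recognition, and a trained model."""
    words = text.lower().rstrip("?!.").split()
    intent_rules = {
        "query_schedule": {"class", "classes", "schedule", "timetable"},
        "query_notice": {"notice", "announcement"},
    }
    for intent, keywords in intent_rules.items():
        if keywords & set(words):  # any keyword present selects the intent
            return {"words": words, "intent": intent}
    return {"words": words, "intent": "unknown"}
```

The classification step is what lets the board route "What class is on today?" to a timetable lookup (the text instruction) while also recording the intent used to compose the spoken response.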
Step 106: display, on the display interface, the output result corresponding to the text instruction.
Specifically, the text information produced by semantic analysis is converted into one or more (a series of) text instructions, and the text instruction interface is invoked to operate. For example, the first user says: "What classes are on today?" After processing by the speech recognition decoder and the semantic analysis module, the corresponding output instruction retrieves today's timetable and presents it on the display interface of the wisdom class board.
Step 107: determine the response message according to the user intent.
Step 108: convert the response message into continuous speech and output it.
Specifically, after the user intent is obtained through semantic analysis, the dialog manager in the wisdom class board determines the response message; the text-to-speech converter then turns the response message into continuous speech of high quality and high naturalness for output. The text-to-speech converter mainly converts the response message into audio information.
For example, the first user says: "What classes are on today?" The user intent obtained is: the first user wants to know the whole day's course arrangement. The dialog manager determines the response message, which is text information, and the text-to-speech converter turns it into continuous speech of high quality and high naturalness, for example: "You have a mathematics class today from 8:30 to 9:15 ...".
Further, before or after step 101, the method also includes:
receiving second voice authentication information from a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information against the second template in the reference model library;
after a successful match, processing the second feature information to determine the second identity information of the second user, which includes the second user's identity ID and identity grade;
comparing the identity grade of the second user with that of the first user;
when the identity grade of the second user is greater than that of the first user, releasing the binding between the wisdom class board ID and the first identity information, and binding the wisdom class board ID to the second identity information.
Specifically, after waking from the interface, new identity information is obtained and discriminated in real time. Suppose the current interacting person is still interacting with the device and another person joins: the identity of the newcomer is compared by grade. If the grade is lower or the same, the wisdom class board continues to handle the voice information of the current interacting person; if the grade is higher, the board binds the identity information of the higher-ranked person and switches to handling that person's voice information. This facilitates control and management of the wisdom class board by class cadres or teachers.
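The grade-based preemption rule above reduces to a small state machine: a newcomer takes over the session only with a strictly higher grade. The sketch below models just that rule; class and attribute names are illustrative.

```python
class ClassBoardSession:
    """Sketch of the one-to-one binding with grade-based preemption."""

    def __init__(self):
        self.bound_user = None  # (user_id, grade) of the current speaker

    def request_binding(self, user_id, grade):
        """Bind if the board is free or the requester outranks the
        current speaker; otherwise the current binding is kept."""
        if self.bound_user is None or grade > self.bound_user[1]:
            self.bound_user = (user_id, grade)  # release old, bind new
            return True
        return False  # equal or lower grade: current speaker keeps control
```

Using strict inequality means two students of equal grade cannot steal each other's session, matching the "lower or identical grade continues with the current person" behavior described above.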
Further, the method also includes:
when the no-operation time reaches the preset time, generating a sleep signal or a screen-lock signal;
entering a sleep state or a screen-lock state according to the sleep signal or the screen-lock signal.
Specifically, an automatic screen-lock time for no operation is preset; when the user stops interacting with the device for the preset time, the interface re-enters the sleep or screen-lock state and waits for the next wake-up.
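The idle-timeout behavior can be sketched as a monitor that compares the time since the last interaction against the preset limit; the 60-second default and the signal names are illustrative assumptions.

```python
class IdleMonitor:
    """Sketch of the no-operation timer: after `timeout_s` seconds
    without interaction, emit a sleep/screen-lock signal."""

    def __init__(self, timeout_s=60.0):
        self.timeout_s = timeout_s
        self.last_activity = 0.0

    def touch(self, now):
        """Record user interaction at timestamp `now` (seconds)."""
        self.last_activity = now

    def check(self, now):
        """Return the signal the board should act on at time `now`."""
        if now - self.last_activity >= self.timeout_s:
            return "LOCK_SCREEN"  # or a sleep signal, per configuration
        return "ACTIVE"
```

A real device would call `touch` from every input path (voice, touch, card swipe) and poll `check` from its main loop or a timer callback.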
By applying the speech-recognition-based wisdom class board interaction method provided by the present invention, the user's operation path is shortened, the user's operational load on the wisdom class board is reduced, and interface display and voice response can proceed simultaneously, greatly enhancing the user experience.
Those skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled professional may use different methods to achieve the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further explain in detail the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A speech-recognition-based wisdom class board interaction method, characterized in that the method comprises:
receiving a voice signal from a first user;
pre-processing the voice signal to obtain a first digital signal;
performing feature extraction on the first digital signal to obtain characteristic parameters;
decoding the characteristic parameters to obtain an optimal word model sequence, where the optimal word model sequence constitutes the text information of the voice signal;
performing semantic analysis on the text information of the continuous voice signal to obtain a text instruction and a user intent;
displaying, on the display interface, the output result corresponding to the text instruction;
determining a response message according to the user intent;
converting the response message into continuous speech and outputting it.
2. The method according to claim 1, characterized in that, before the method, the method further comprises:
receiving first voice authentication information of the first user;
performing feature extraction on the first voice authentication information to obtain first feature information;
matching the first feature information against a first template in a reference model library;
after the matching succeeds, processing the first feature information to determine first identity information of the first user, the first identity information comprising an identity ID and an identity grade of the first user;
binding the identity information to an ID of the smart class board.
3. The method according to claim 1, characterized in that preprocessing the voice signal to obtain the first digital signal specifically comprises:
performing sampling and quantization on the voice signal to obtain an original digital signal;
performing pre-emphasis processing on the original digital signal to obtain a pre-emphasized voice signal;
performing framing processing on the pre-emphasized voice signal to obtain framed voice signals;
performing windowing processing on the framed voice signals to obtain the first digital signal.
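For illustration only (not part of the claims), the preprocessing chain of claim 3 can be sketched as follows; the 16 kHz sample rate, 25 ms frame length, 10 ms hop, pre-emphasis coefficient 0.97, and Hamming window are assumed values that the claim does not fix:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, framing, and windowing of an already sampled/quantized signal."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split into overlapping frames (frame_len samples, hop-sample step)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: a Hamming window per frame reduces spectral leakage at frame edges
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.randn(16000))  # 1 s of audio at an assumed 16 kHz
print(frames.shape)  # (98, 400)
```

The windowed frames correspond to the "first digital signal" that claim 5 then transforms to the frequency domain.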
4. The method according to claim 1, characterized in that the characteristic parameters comprise one of: linear predictive cepstral coefficients (LPCC), perceptual linear prediction coefficients (PLP), and Mel-frequency cepstral coefficients (MFCC).
5. The method according to claim 4, characterized in that, when the characteristic parameters are MFCC, performing feature extraction on the first digital signal to obtain the characteristic parameters specifically comprises:
converting the first digital signal from a time-domain signal into a frequency-domain signal using a fast Fourier transform (FFT);
convolving the frequency-domain signal with a bank of triangular filters distributed on the Mel scale;
performing, according to the convolution result, a discrete cosine transform (DCT) on the vector formed by the outputs of the triangular filters in the filter bank;
taking the first N coefficients of the DCT to obtain the characteristic parameters of the first digital signal.
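A toy sketch of the MFCC steps in claim 5 (FFT, Mel-scale triangular filter bank, log, DCT, first N coefficients). The 512-point FFT, 26 filters, and N = 13 are illustrative assumptions; the filter bank is applied as a frequency-domain product, which is how the recited convolution is realized in practice:

```python
import numpy as np

def mfcc_sketch(frames, sample_rate=16000, n_filters=26, n_ceps=13, nfft=512):
    """Toy MFCC over windowed frames of shape (n_frames, frame_len)."""
    # FFT step: power spectrum of each frame, shape (n_frames, nfft//2 + 1)
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2
    # Triangular filters whose centre frequencies are equally spaced on the Mel scale
    mel_max = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_filters + 2) / 2595) - 1)
    bins = np.floor((nfft + 1) * hz_pts / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    # Filter-bank outputs (log energies), then a type-II DCT over the filter axis
    feats = np.log(power @ fbank.T + 1e-10)
    n = feats.shape[1]
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(n) + 1) / (2 * n))
    return feats @ dct.T  # keep only the first n_ceps coefficients

frames = np.random.randn(98, 400) * np.hamming(400)
print(mfcc_sketch(frames).shape)  # (98, 13)
```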
6. The method according to claim 1, characterized in that decoding the characteristic parameters to obtain the optimal word-model sequence specifically comprises:
scoring the similarity between the characteristic parameters and reference templates in a pre-built reference model library through an acoustic model, a language model, and a pronunciation dictionary respectively, to obtain a first score corresponding to the acoustic model, a second score corresponding to the language model, and a third score corresponding to the pronunciation dictionary;
performing weighted data fusion on the first score, the second score, and the third score to obtain the optimal word-model sequence, so that the optimal word-model sequence constitutes the text information of the voice.
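Per candidate sequence, the weighted data fusion of claim 6 reduces to a weighted sum of the three scores, with the best-scoring hypothesis selected. The weights, score values, and hypothesis structure below are invented for illustration:

```python
def fuse_scores(hypotheses, w_am=1.0, w_lm=0.8, w_dict=0.5):
    """Return the candidate word-model sequence with the best weighted score sum."""
    def total(h):
        # Weighted fusion of acoustic-model, language-model, and dictionary scores
        return w_am * h["am"] + w_lm * h["lm"] + w_dict * h["dict"]
    return max(hypotheses, key=total)

best = fuse_scores([
    {"text": "open the whiteboard",  "am": 0.9, "lm": 0.7, "dict": 0.8},
    {"text": "open the white bored", "am": 0.9, "lm": 0.2, "dict": 0.8},
])
print(best["text"])  # open the whiteboard
```

Here the language-model weight lets the fluent hypothesis win even though the acoustic scores tie.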
7. The method according to claim 1, characterized in that performing semantic analysis on the text information of the continuous voice signal to obtain the text instruction and the user intent specifically comprises:
performing morphological analysis on the text information to segment the text information into a plurality of words;
performing syntactic analysis on the plurality of words to determine the relationships among the plurality of words and generate a syntactic structure of the sentence;
obtaining the text instruction according to the syntactic structure;
performing intent analysis on the syntactic structure using machine learning to determine the user intent corresponding to the syntactic structure.
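Claim 7's pipeline would in practice use trained models; the rule-based stand-in below only illustrates the data flow from word segmentation to a text instruction and an intent label. The verb list and intent names are hypothetical:

```python
def parse_command(text):
    """Rule-based sketch of claim 7: segment words, find a verb + object,
    and map them to a display instruction and an inferred user intent."""
    tokens = text.lower().split()           # stand-in for morphological analysis
    verbs = {"show", "open", "display"}     # hypothetical command verbs
    verb = next((t for t in tokens if t in verbs), None)
    if verb is None:
        return {"instruction": None, "intent": "unknown"}
    obj = tokens[-1]                        # crude "syntactic" object slot
    return {"instruction": f"{verb} {obj}", "intent": f"view_{obj}"}

print(parse_command("please show the timetable"))
# {'instruction': 'show timetable', 'intent': 'view_timetable'}
```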
8. The method according to claim 1, characterized in that converting the response message to generate and output continuous speech specifically comprises:
converting the response message into continuous speech through a text-to-speech converter and outputting the continuous speech.
9. The method according to claim 2, characterized in that, after the method, the method further comprises:
receiving second voice authentication information of a second user;
performing feature extraction on the second voice authentication information to obtain second feature information;
matching the second feature information against a second template in the reference model library;
after the matching succeeds, processing the second feature information to determine second identity information of the second user, the second identity information comprising an identity ID and an identity grade of the second user;
comparing the identity grade of the second user with the identity grade of the first user;
when the identity grade of the second user is higher than the identity grade of the first user, unbinding the ID of the smart class board from the first identity information, and binding the ID of the smart class board to the second identity information.
10. The method according to claim 1, characterized in that the method further comprises:
generating a sleep signal or a screen-lock signal when the no-operation time reaches a preset time;
entering a dormant state or a screen-locked state according to the sleep signal or the screen-lock signal.
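The idle timer of claim 10 can be sketched as a simple watchdog check; the 300 s timeout and the signal name are assumptions, not values fixed by the claim:

```python
import time

def idle_watchdog(last_activity, timeout_s=300, now=None):
    """Return a sleep/lock signal once no-operation time reaches the preset limit."""
    now = time.monotonic() if now is None else now
    # Emit the signal only when the idle interval meets or exceeds the timeout
    return "sleep" if now - last_activity >= timeout_s else None

print(idle_watchdog(0.0, timeout_s=300, now=301.0))  # sleep
print(idle_watchdog(0.0, timeout_s=300, now=10.0))   # None
```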
CN201910577869.5A 2019-06-28 2019-06-28 Wisdom class board exchange method based on speech recognition Pending CN110310644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910577869.5A CN110310644A (en) 2019-06-28 2019-06-28 Wisdom class board exchange method based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910577869.5A CN110310644A (en) 2019-06-28 2019-06-28 Wisdom class board exchange method based on speech recognition

Publications (1)

Publication Number Publication Date
CN110310644A true CN110310644A (en) 2019-10-08

Family

ID=68079256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910577869.5A Pending CN110310644A (en) 2019-06-28 2019-06-28 Wisdom class board exchange method based on speech recognition

Country Status (1)

Country Link
CN (1) CN110310644A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110258285A1 (en) * 2005-02-25 2011-10-20 Lightningcast LLC. Inserting Branding Elements
CN105335398A (en) * 2014-07-18 2016-02-17 华为技术有限公司 Service recommendation method and terminal
CN107767875A (en) * 2017-10-17 2018-03-06 深圳市沃特沃德股份有限公司 Sound control method, device and terminal device
CN107895578A (en) * 2017-11-15 2018-04-10 百度在线网络技术(北京)有限公司 Voice interactive method and device
CN108153881A (en) * 2017-12-26 2018-06-12 重庆大争科技有限公司 Teaching monitoring and managing method based on Intelligent campus management
CN108959520A (en) * 2018-06-28 2018-12-07 百度在线网络技术(北京)有限公司 Searching method, device, equipment and storage medium based on artificial intelligence
CN109036391A (en) * 2018-06-26 2018-12-18 华为技术有限公司 Audio recognition method, apparatus and system
CN109741746A (en) * 2019-01-31 2019-05-10 上海元趣信息技术有限公司 Robot personalizes interactive voice algorithm, emotion communication algorithm and robot

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110971875A (en) * 2019-12-04 2020-04-07 广州云蝶科技有限公司 Control method and device combining recording and broadcasting system and IP broadcasting system
CN110971875B (en) * 2019-12-04 2021-02-05 广州云蝶科技有限公司 Control method and device combining recording and broadcasting system and IP broadcasting system
CN111559675A (en) * 2020-05-22 2020-08-21 云知声智能科技股份有限公司 Method for controlling elevator by voice
CN111904806A (en) * 2020-07-30 2020-11-10 云知声智能科技股份有限公司 Blind guiding system

Similar Documents

Publication Publication Date Title
Darabkh et al. An efficient speech recognition system for arm‐disabled students based on isolated words
Ahsiah et al. Tajweed checking system to support recitation
Cao et al. [Retracted] Optimization of Intelligent English Pronunciation Training System Based on Android Platform
CN110310644A (en) Wisdom class board exchange method based on speech recognition
Liu et al. AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning
Laurinčiukaitė et al. Lithuanian Speech Corpus Liepa for development of human-computer interfaces working in voice recognition and synthesis mode
Wang et al. Research on correction method of spoken pronunciation accuracy of AI virtual English reading
Wang Detecting pronunciation errors in spoken English tests based on multifeature fusion algorithm
CN108364655A (en) Method of speech processing, medium, device and computing device
CN113539239B (en) Voice conversion method and device, storage medium and electronic equipment
Habbash et al. Recognition of Arabic accents from English spoken speech using deep learning approach
Vaquero et al. E-inclusion technologies for the speech handicapped
CN118038851B (en) A multi-dialect speech recognition method, system, device and medium
Wang [Retracted] Research on Open Oral English Scoring System Based on Neural Network
Han et al. [Retracted] The Modular Design of an English Pronunciation Level Evaluation System Based on Machine Learning
Krug et al. Articulatory synthesis for data augmentation in phoneme recognition
Venkatagiri Speech recognition technology applications in communication disorders
CN112885326A (en) Method and device for creating personalized speech synthesis model, method and device for synthesizing and testing speech
Xu et al. Application of multimodal NLP instruction combined with speech recognition in oral english practice
Yue English spoken stress recognition based on natural language processing and endpoint detection algorithm
CN114283788A (en) Pronunciation evaluation method, training method, device and equipment of pronunciation evaluation system
Di Benedetto et al. Lexical Access Model for Italian--Modeling human speech processing: identification of words in running speech toward lexical access based on the detection of landmarks and other acoustic cues to features
Junli Speech recognition and English corpus vocabulary learning based on endpoint detection algorithm
CN117423260B (en) Auxiliary teaching method based on classroom speech recognition and related equipment
Bao et al. [Retracted] An Auxiliary Teaching System for Spoken English Based on Speech Recognition Technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191008)