CN109273004A

CN109273004A - Predictive speech recognition method and device based on big data

Info

Publication number: CN109273004A
Application number: CN201811505498.1A
Authority: CN
Inventors: 吴有宝; 胡明国
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2018-12-10
Filing date: 2018-12-10
Publication date: 2019-01-25
Anticipated expiration: 2038-12-10
Also published as: CN109273004B

Abstract

The invention discloses a predictive speech recognition method based on big data, comprising the following steps: receiving a first recognition result; judging whether the first recognition result is the first Chinese character recognized; if it is the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level; if it is not the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level of the first recognition result and the result of big data analysis of the prior final recognition result. The invention also discloses a predictive speech recognition device based on big data. With the method and device of the invention, confidence analysis is performed on the intermediate recognition result and a secondary analysis of the recognition result is performed based on big data, so that a recognition result with high accuracy can be obtained and the user experience is greatly improved.

Description

Predictive speech recognition method and device based on big data

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a predictive speech recognition method and device based on big data.

Background art

Voice interaction technology is now increasingly mature. In voice interaction, the speech recognition process generally needs to score each recognized word: each word corresponds to a different score, and the highest-scoring word is taken as the recognition result. This scoring approach is not very accurate, and producing the recognition result from scoring alone is not objective enough, which affects the user experience.

Summary of the invention

To solve the above problem, the present invention starts from a big data perspective and optimizes the intermediate recognition results of speech recognition, improving the intermediate recognition process so that the recognition result is more accurate and reasonable and the user experience is significantly improved.

According to a first aspect of the invention, a predictive speech recognition method based on big data is provided, comprising the following steps:

receiving a first recognition result;

judging whether the first recognition result is the first Chinese character recognized;

if it is the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level;

if it is not the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result.

According to a second aspect of the invention, a predictive speech recognition device based on big data is provided, comprising:

an intermediate result obtaining module, for receiving the first recognition result;

a prediction recognition module, comprising:

a judging unit, for judging whether the first recognition result is the first Chinese character recognized, calling the first Chinese character prediction unit for the corresponding processing when it is the first Chinese character, and otherwise calling the subsequent Chinese character prediction unit for the corresponding processing;

a first Chinese character prediction unit, for determining the final recognition result of the first recognition result according to the confidence level;

a subsequent Chinese character prediction unit, for determining the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result.

According to a third aspect of the present invention, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the above method.

According to a fourth aspect of the present invention, a storage medium is provided, on which a computer program is stored, the program implementing the steps of the above method when executed by a processor.

With the method and device provided by the present invention, confidence analysis is performed on the intermediate recognition result and a secondary analysis of the recognition result is performed based on big data, so that a recognition result with high accuracy can be obtained and the user experience is greatly improved.

Brief description of the drawings

Fig. 1 is a flow chart of a predictive speech recognition method based on big data according to an embodiment of the present invention;

Fig. 2 is a functional block diagram of a predictive speech recognition device based on big data according to an embodiment of the present invention;

Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments can be combined with one another.

The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

In the present invention, "module", "device", "system", and the like refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application program or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one element that interacts with another element in a local system, in a distributed system, and/or across a network such as the Internet with other systems.

Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the elements listed but also other elements not explicitly listed, as well as elements inherent to the process, method, article, or device in question. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

The predictive speech recognition method based on big data of the embodiments of the present invention can be applied to any terminal device configured with a voice function, such as a smart phone, a tablet computer, or a smart home device; the invention is not limited in this regard. It allows users to obtain more timely and accurate responses when using these terminal devices, improving the user experience.

The invention will now be described in further detail with reference to the accompanying drawings.

Fig. 1 schematically shows a flow chart of a predictive speech recognition method based on big data according to an embodiment of the present invention. As shown in Fig. 1, this embodiment includes the following steps:

Step S101: receive the first recognition result. After voice interaction starts, audio monitoring begins and the speech uttered by the user is recognized: the received speech is recognized using prior-art speech recognition to obtain the first recognition result. It should be noted that the prior art recognizes speech one Chinese character at a time and then determines the recognition result of each character in turn; the method of this embodiment therefore analyzes each recognized character in order to determine the final recognition result for that character.

Step S102: judge whether the first recognition result is the first Chinese character recognized, so as to determine the final recognition result. If it is the first Chinese character, step S103 is started: the final recognition result of the first recognition result is determined according to the confidence level. The first character can be recognized by scoring with the acoustic score and language score of the prior art, taking the highest-scoring character as the first Chinese character. For the confidence level, a value such as 70% is chosen empirically as the confidence threshold for the judgment. In general, a recognition result obtains a corresponding confidence level during the parsing process, and the confidence threshold can be raised appropriately to improve recognition accuracy. If the confidence threshold is reached, the recognition result is output directly as the final recognition result.
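
The first-character decision in steps S102 and S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Candidate` type, the `decide_first_character` function, and the rule of summing the acoustic and language scores are assumptions made for the example. Returning `None` below the threshold is a placeholder, because the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import List, Optional

CONFIDENCE_THRESHOLD = 0.70  # empirical threshold from the text; may be raised


@dataclass
class Candidate:
    """One candidate character produced by the recognizer (hypothetical type)."""
    char: str
    acoustic_score: float
    language_score: float
    confidence: float  # value in [0, 1]


def decide_first_character(candidates: List[Candidate],
                           threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Pick the top-scoring candidate and accept it only if it is confident enough."""
    if not candidates:
        return None
    # Assumption: the overall score is the sum of the acoustic and language scores.
    best = max(candidates, key=lambda c: c.acoustic_score + c.language_score)
    # Accept the character directly once its confidence reaches the threshold.
    return best.char if best.confidence >= threshold else None


# Illustrative data only; the scores and confidences are made up.
candidates = [
    Candidate("打", acoustic_score=1.2, language_score=0.9, confidence=0.85),
    Candidate("大", acoustic_score=0.8, language_score=0.7, confidence=0.40),
]
first_char = decide_first_character(candidates)
```

Keeping the threshold as a parameter reflects the remark that it can be raised when higher recognition accuracy is wanted.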

If it is not the first Chinese character, step S104 is started: the final recognition result of the first recognition result is determined according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result. The specific implementation is as follows:

First, it is judged whether the confidence level of the first recognition result reaches the confidence threshold. If it does, the first recognition result is taken as the final recognition result, as in the step above. If it is below the confidence threshold, big data analysis is performed on the prior final recognition result, and the final recognition result of the first recognition result is determined based on the analysis result, thereby realizing the prediction function.

As an example of how big data analysis may be performed on the prior final recognition result, a big data database is obtained in advance. The database may be an online dictionary or a dictionary that the user has configured and generated. It only needs to satisfy the condition that the big data stores multiple dictionaries, including, for example, a dictionary of speech recognition matches compiled from empirical statistics and a dictionary compiled from the results of everyday voice interaction. The dictionary library is preferably configured to store Chinese characters, Chinese character pronunciations, Chinese phrases, and phrase labels, where the phrase labels include common, uncommon, and usage count. The common and uncommon labels mark whether a word or phrase is in everyday use, and the usage count records how many times the user has used the word or phrase in daily use.

Based on the big data dictionary, the prior final recognition result is matched to generate multiple groups of permutations and combinations. Here the prior final recognition result means the intermediate recognition result, or the final recognition result of the previously recognized Chinese character. For example, for the received user voice instruction "打开音乐..." ("open music..."), "打", "打开", and "打开音" are intermediate recognition results; the final recognition result of the previous character for "开" is the first Chinese character "打", the final recognition result of the previous character for "音" is the second Chinese character "开", and so on. That is, by matching the prior final recognition result against the big data database, multiple phrase combinations matching the prior final recognition result can be obtained.

Then, uncommon combinations are rejected from the permutations and combinations (that is, they are screened by the common and uncommon phrase labels) to form permutations and combinations with a high matching degree. For example, for the recognition result "导航" ("navigation"), looking up the first character "导" in the dictionary may bring up multiple words, such as "导弹" ("guided missile"), "导播" ("broadcast director"), and "导航" ("navigation"). Because "导弹" and "导播" are not common in voice interaction, they can be weeded out, leaving "导航" as the candidate. The final recognition result is then determined according to the similarity between the last Chinese character in the permutation and combination (such as "航") and the current first recognition result (such as "含"): the result is determined to be "航" rather than "含", which corrects the first recognition result. The similarity is determined according to the usage count and/or the pronunciation initial. For example, the candidate with the highest usage count is matched preferentially, or the candidate with the same pronunciation initial is matched preferentially, or the candidate whose usage count reaches a preset value and whose pronunciation initial is the same is matched preferentially.
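
The subsequent-character procedure of step S104 can be sketched as below. This is a hedged illustration under stated assumptions rather than the patented implementation: `PhraseEntry`, `LEXICON`, and `predict_subsequent_character` are hypothetical names, the usage counts are invented, and the ranking rule (same pronunciation initial first, then higher usage count) is just one of the similarity options the text lists. Falling back to the original character when no candidate remains is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhraseEntry:
    """One dictionary phrase with its labels (hypothetical structure)."""
    phrase: str
    initials: Tuple[str, ...]  # pinyin initial of each character
    common: bool               # "common" / "uncommon" label
    usage_count: int           # how often the user has used the phrase


# Illustrative entries only; the counts are made up for the example.
LEXICON: List[PhraseEntry] = [
    PhraseEntry("导航", ("d", "h"), True, 120),
    PhraseEntry("导弹", ("d", "d"), False, 3),
    PhraseEntry("导播", ("d", "b"), False, 5),
]


def predict_subsequent_character(prior_final: str, current_char: str,
                                 current_initial: str, confidence: float,
                                 lexicon: List[PhraseEntry] = LEXICON,
                                 threshold: float = 0.70) -> str:
    """Return the final character for a non-first recognition result."""
    # Confident enough: keep the recognizer's output unchanged.
    if confidence >= threshold:
        return current_char
    pos = len(prior_final)
    # 1. Match the prior final result against the dictionary.
    combos = [e for e in lexicon
              if e.phrase.startswith(prior_final) and len(e.phrase) > pos]
    # 2. Reject uncommon combinations.
    combos = [e for e in combos if e.common]
    if not combos:
        return current_char  # assumption: nothing to correct against
    # 3. Rank by similarity: same pronunciation initial first, then usage count.
    best = max(combos,
               key=lambda e: (e.initials[pos] == current_initial, e.usage_count))
    return best.phrase[pos]


corrected = predict_subsequent_character("导", "含", "h", confidence=0.40)
```

With the sample entries, a low-confidence "含" following "导" is replaced by "航", while a result at or above the threshold is passed through unchanged.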

According to this embodiment, a secondary analysis of the intermediate recognition result can be performed using the big data dictionary and combined with the confidence level to predict the recognition result. Compared with the existing method, which relies only on the scores of the recognition results and takes the highest-scoring result as the final recognition result, a recognition result with high accuracy can be obtained and the user experience is greatly improved.

Fig. 2 schematically shows a functional block diagram of a predictive speech recognition device based on big data according to an embodiment of the present invention. As shown in Fig. 2, the predictive speech recognition device based on big data includes an intermediate result obtaining module 3 and a prediction recognition module 2.

The intermediate result obtaining module 3 is used to receive the first recognition result. It can recognize the received speech using prior-art speech recognition to obtain the first recognition result.

The prediction recognition module 2 includes a judging unit 201, a first Chinese character prediction unit 202, and a subsequent Chinese character prediction unit 203.

The judging unit 201 is used to judge whether the first recognition result is the first Chinese character recognized. When it is the first Chinese character, the judging unit calls the first Chinese character prediction unit 202 directly for the corresponding processing; otherwise it calls the subsequent Chinese character prediction unit 203 for the corresponding processing. For the specific implementation, refer to the method part above.
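
The dispatch performed by the judging unit can be sketched as a small class. The names `JudgingUnit`, `PredictionUnit`, and `handle` are hypothetical, the two prediction units are stubs, and treating "no prior final result" as the test for the first character is an assumption made for the example.

```python
from typing import Callable, Optional

# A prediction unit maps (char, confidence, prior_final) to the final character.
PredictionUnit = Callable[[str, float, Optional[str]], str]


class JudgingUnit:
    """Routes a recognition result to the first or subsequent prediction unit."""

    def __init__(self, first_unit: PredictionUnit,
                 subsequent_unit: PredictionUnit) -> None:
        self.first_unit = first_unit
        self.subsequent_unit = subsequent_unit

    def handle(self, char: str, confidence: float,
               prior_final: Optional[str] = None) -> str:
        # Assumption: no prior final result means this is the first character.
        if prior_final is None:
            return self.first_unit(char, confidence, None)
        return self.subsequent_unit(char, confidence, prior_final)


# Stub units that only report which path was taken.
judge = JudgingUnit(
    first_unit=lambda ch, conf, prior: f"first:{ch}",
    subsequent_unit=lambda ch, conf, prior: f"subsequent:{prior}+{ch}",
)
```

Passing the units in as callables keeps the routing logic separate from the confidence check and the dictionary analysis, which mirrors the unit boundaries in the block diagram.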

The first Chinese character prediction unit 202 is used to determine the final recognition result of the first recognition result according to the confidence level. The individual characters of different recognition results have different confidence levels. The confidence threshold can be set empirically to 70% for the judgment. In general, a recognition result obtains a corresponding confidence level during the parsing process, and the confidence threshold can be raised appropriately to improve recognition accuracy. If the confidence threshold is reached, the recognition result is output directly as the final recognition result. Each recognition result includes the confidence level of the corresponding character; the unit compares the confidence level of the obtained recognition result with the preset confidence threshold and judges whether it reaches the specified level. If it does, the result is taken as the final recognition result, which ensures the accuracy of the recognition result.

The subsequent Chinese character prediction unit 203 is used to determine the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result. Its specific implementation follows the method described above.

The subsequent Chinese character prediction unit 203 includes a first prediction element 2031, a second prediction element 2032, and a Chinese character dictionary library 2033.

The first prediction element 2031 is used to judge whether the confidence level of the first recognition result reaches the confidence threshold. If it does, the first recognition result is taken as the final recognition result; this judgment is the same as that of the first Chinese character prediction unit 202 described above. Otherwise, the second prediction element 2032 is called for the corresponding processing.

The second prediction element 2032 is used to perform big data analysis on the prior final recognition result and to determine the final recognition result of the first recognition result based on the analysis result. The big data analysis uses the Chinese character dictionary library 2033 as its reference. The Chinese character dictionary library 2033 is used to store Chinese characters, Chinese character pronunciations, Chinese phrases, and phrase labels, where the phrase labels include common, uncommon, and usage count. For the method of confirming the final recognition result from the prior final recognition result based on big data analysis, refer to the steps of the method part above, which are not repeated here.
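
One possible in-memory layout for the Chinese character dictionary library is sketched below. The class and method names (`DictionaryEntry`, `ChineseCharacterDictionary`, `add`, `record_use`, `lookup_prefix`) are hypothetical, and keeping the common label as a stored flag separate from the usage count is an assumption; the text lists the stored fields but does not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DictionaryEntry:
    """A phrase with its pronunciation and labels (hypothetical structure)."""
    phrase: str
    pinyin: List[str]      # pronunciation of each character, tones omitted
    common: bool = False   # "common" vs "uncommon" label
    usage_count: int = 0   # times the user has used the phrase


@dataclass
class ChineseCharacterDictionary:
    entries: Dict[str, DictionaryEntry] = field(default_factory=dict)

    def add(self, phrase: str, pinyin: List[str], common: bool = False) -> None:
        self.entries[phrase] = DictionaryEntry(phrase, pinyin, common)

    def record_use(self, phrase: str) -> None:
        # Update the usage-count label after the phrase is used.
        self.entries[phrase].usage_count += 1

    def lookup_prefix(self, prefix: str,
                      common_only: bool = True) -> List[DictionaryEntry]:
        """Return phrases starting with the prefix, optionally only common ones."""
        return [e for e in self.entries.values()
                if e.phrase.startswith(prefix) and (e.common or not common_only)]


# Illustrative content only.
library = ChineseCharacterDictionary()
library.add("导航", ["dao", "hang"], common=True)
library.add("导弹", ["dao", "dan"])
library.record_use("导航")
matches = library.lookup_prefix("导")
```

A prefix lookup with `common_only=True` corresponds to matching the prior final recognition result and then screening out uncommon combinations in a single call.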

The device of this embodiment can predict the recognition result. By drawing on the information contained in the big data, a more accurate prediction result can be obtained, which greatly improves the user experience and overcomes problems such as the high error rate of some recognition results.

In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include execution instructions. The execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above predictive speech recognition methods based on big data of the present invention.

In some embodiments, an embodiment of the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above predictive speech recognition methods based on big data.

In some embodiments, an embodiment of the present invention also provides an electronic device, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the predictive speech recognition method based on big data.

In some embodiments, an embodiment of the present invention also provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the predictive speech recognition method based on big data.

The predictive speech recognition device based on big data of the above embodiments can be used to execute the predictive speech recognition method based on big data of the embodiments of the present invention, and accordingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the present invention, the related functional modules can be implemented by a hardware processor.

Fig. 3 is a schematic diagram of the hardware structure of an electronic device for executing the predictive speech recognition method based on big data, provided by another embodiment of the present application. As shown in Fig. 3, the device includes:

one or more processors 310 and a memory 320; one processor 310 is taken as an example in Fig. 3.

The device for executing the predictive speech recognition method based on big data may also include an input device 330 and an output device 340.

The processor 310, the memory 320, the input device 330, and the output device 340 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 3.

The memory 320, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the predictive speech recognition method based on big data in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 320, the processor 310 executes the various functional applications and data processing of the server, that is, it implements the predictive speech recognition method based on big data of the above method embodiments.

The memory 320 may include a program storage area and a data storage area, where the program storage area can store an operating system and the application programs required by at least one function, and the data storage area can store data created through the use of the predictive speech recognition device based on big data. In addition, the memory 320 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 320 optionally includes memory located remotely from the processor 310, and such remote memory can be connected to the predictive speech recognition device based on big data through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 330 can receive input numeric or character information and generate signals related to the user settings and function control of the predictive speech recognition device based on big data. The output device 340 may include a display device such as a display screen.

The one or more modules are stored in the memory 320 and, when executed by the one or more processors 310, perform the predictive speech recognition method based on big data in any of the above method embodiments.

The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.

The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: devices of this kind have mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server consists of a processor, a hard disk, memory, a system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.

(5) Other electronic devices with data interaction functions.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of an embodiment.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A predictive speech recognition method based on big data, comprising:

receiving a first recognition result;

determining whether the first recognition result is the first Chinese character recognized;

if it is the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level;

if it is not the first Chinese character, determining the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result.

2. The method according to claim 1, wherein determining the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result comprises:

determining whether the confidence level of the first recognition result reaches a confidence threshold;

if the confidence threshold is reached, using the first recognition result as the final recognition result;

if the confidence level is lower than the confidence threshold, performing big data analysis on the prior final recognition result, and determining the final recognition result of the first recognition result based on the analysis result.

3. The method according to claim 2, wherein, if the confidence level is lower than the confidence threshold, performing big data analysis on the prior final recognition result and determining the final recognition result of the first recognition result based on the analysis result comprises:

obtaining, based on big data, permutations and combinations that match the prior final recognition result;

determining the final recognition result according to the similarity between the last Chinese character in the permutation and combination and the first recognition result.

4. The method according to claim 3, characterized in that the similarity is determined according to the usage count and the pronunciation initial.

5. The method according to claim 3, further comprising, before determining the final recognition result according to the similarity between the last Chinese character in the permutation and combination and the first recognition result:

eliminating uncommon combinations from the permutations and combinations to form permutations and combinations with a high matching degree.

6. A predictive speech recognition device based on big data, characterized in that it comprises:

an intermediate result obtaining module, configured to receive the first recognition result;

a prediction recognition module, comprising:

a determining unit, configured to determine whether the first recognition result is the first Chinese character recognized, to call the first Chinese character prediction unit to perform the corresponding processing when it is the first Chinese character, and otherwise to call the subsequent Chinese character prediction unit to perform the corresponding processing;

a first Chinese character prediction unit, configured to determine the final recognition result of the first recognition result according to the confidence level;

a subsequent Chinese character prediction unit, configured to determine the final recognition result of the first recognition result according to the confidence level of the first recognition result and the big data analysis result of the prior final recognition result.

7. The device according to claim 6, wherein the subsequent Chinese character prediction unit comprises:

a first prediction component, configured to determine whether the confidence level of the first recognition result reaches a confidence threshold, to use the first recognition result as the final recognition result if the confidence threshold is reached, and otherwise to call the second prediction component to perform the corresponding processing;

a second prediction component, configured to perform big data analysis on the prior final recognition result and to determine the final recognition result of the first recognition result based on the analysis result.

8. The device according to claim 7, further comprising a Chinese character dictionary library, configured to store Chinese characters, Chinese character pronunciations, Chinese phrases, and phrase labels, the phrase labels including common, uncommon, and usage count;

wherein the second prediction component performs big data analysis on the prior final recognition result according to the Chinese character dictionary library.

9. An electronic device comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-5.

10. A storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, carries out the steps of the method of any of claims 1-5.