CN104282301A - Voice command processing method and system - Google Patents
- Publication number
- CN104282301A
- Application number
- CN201310287147.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The embodiment of the invention discloses a voice command processing method and system. The voice command processing method and system are used for improving the response efficiency of user voice command input and improving user experience. The voice command processing method includes the steps of obtaining a voice command for a service request, extracting a voice feature sequence according to the voice command, decoding the voice command according to a preset service category decoding network, determining the service category of the current request, determining a service decoding network corresponding to the service category, carrying out secondary decoding on the voice command according to the determined service decoding network, and determining the complete content of the voice command.
Description
Technical field
The present invention relates to the fields of communication and computer technology, and in particular to a voice command processing method and system.
Background
In existing voice command control systems, the system receives the voice command signal input by the user and extracts the corresponding acoustic feature sequence. The system then searches a preset command word decoding network for the optimal path corresponding to the acoustic feature sequence, thereby obtaining the content of the user's command. The preset command word decoding network is usually determined by all the voice commands supported by the system and their corresponding voice command parameters. In other words, for each voice input the system must compute the probabilities of all possible paths in one complete command word decoding network in order to determine the optimal recognition result. As voice command control systems support more and more voice response functions, the decoding network keeps growing, and the efficiency of decoding in the command word decoding network suffers accordingly.
This is particularly noticeable for simple user commands such as "call Zhang San": because the system still has to decode against all voice commands and their related parameters, time is wasted and the user experience is significantly degraded. Refer to the word-level schematic diagram of an existing command word decoding network shown in Figure 1. For a voice command input by the user, the system must start from the start node and compute, frame by frame, the similarity between the extracted acoustic feature sequence and the models corresponding to all voice commands and their related parameters. For example, for the voice command "call Zhang San", the system must compute the similarity of the acoustic feature sequence over the space formed by the paths related to the "make a phone call" command, the paths related to the "navigate to" command, the paths related to the "request a song" command, and the paths related to other commands. Decoding over the whole network in this way tends to make the system respond slowly; especially for commands with small-scale voice parameters, the decoding time easily exceeds what the user expects, which harms the user experience. For example, to decode the user voice input "call Zhang San", the system matches it separately against the command parameters related to the navigation service (such as 1,000,000 point of interest (POI) entries), the command parameters related to the telephone service (such as 1,000 contact names), and the command parameters related to the music service (such as 2,000 songs). The system's response time is roughly the combined decoding time for 1,000,000 POI entries plus 1,000 names plus 2,000 songs, which prevents a quick response to the command. In addition, with decoding based on such a large-scale decoding network, the number of confusable words increases, so the recognition rate may also be affected.
Summary of the invention
Embodiments of the present invention provide a voice command processing method and system for improving the response efficiency for voice commands input by a user and improving the user experience.
A first aspect of the present invention provides a voice command processing method, which may comprise:
obtaining a voice command for a service request;
extracting a voice feature sequence from the voice command;
decoding the voice command according to a preset service category decoding network to determine the service category of the current request;
determining the service decoding network corresponding to the service category;
performing secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command.
Preferably, decoding the voice command according to the preset service category decoding network to determine the service category of the current request comprises:
searching the preset service category decoding network for, and determining, a first decoding path having the maximum similarity with the voice feature sequence;
determining the service category of the current request according to the first decoding path.
Preferably, performing secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command comprises:
in the determined service decoding network corresponding to the service category, selecting and determining a second decoding path having the maximum similarity with the voice feature sequence;
determining the complete content of the voice command according to the second decoding path.
Preferably, performing secondary decoding on the voice command in the determined service decoding network to determine the complete content of the voice command comprises:
obtaining the voice segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network;
obtaining the voice signal corresponding to the command parameter part of the voice command;
determining the service parameter decoding network corresponding to the service decoding network;
decoding the voice signal in the service parameter decoding network, and selecting and determining a third decoding path having the maximum similarity with the voice feature sequence;
determining the voice command parameter corresponding to the voice signal according to the third decoding path;
determining the complete content of the voice command according to the voice segment information and the voice command parameter.
A second aspect of the present invention provides a voice command processing system, comprising:
an acquisition module, configured to obtain a voice command for a service request;
an extraction module, configured to extract a voice feature sequence from the voice command;
a first decoding module, configured to decode the voice command according to a preset service category decoding network to determine the service category of the current request;
a determination module, configured to determine the service decoding network corresponding to the service category;
a second decoding module, configured to perform secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command.
Preferably, the first decoding module is specifically configured to search the preset service category decoding network for, and determine, a first decoding path having the maximum similarity with the voice feature sequence, and to determine the service category of the current request according to the first decoding path.
Preferably, the second decoding module is specifically configured to select and determine, in the determined service decoding network corresponding to the service category, a second decoding path having the maximum similarity with the voice feature sequence, the service decoding network being a decoding network that contains the service-related command keywords and command parameters; and to determine the complete content of the voice command according to the second decoding path.
Preferably, the second decoding module is specifically configured to obtain the voice segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network; obtain the voice signal corresponding to the command parameter part of the voice command; decode the voice signal in the service parameter decoding network corresponding to the service decoding network, selecting and determining a third decoding path having the maximum similarity with the voice feature sequence; determine the voice command parameter corresponding to the voice signal according to the third decoding path; and determine the complete content of the voice command according to the voice segment information and the voice command parameter.
As can be seen from the above technical solutions, the voice command processing method and system provided by the embodiments of the present invention have the following advantages: the obtained voice command for a service request is decoded a first time to determine the service category of the request, and secondary decoding is then performed on the voice command according to the service decoding network corresponding to that service category, thereby determining the complete content of the voice command. Because the secondary decoding is performed in the service decoding network corresponding to the service category, voice commands can be answered quickly; in particular, this improves the practicality of voice command control systems that support parameter sets of widely differing sizes, and it also improves the vocabulary recognition rate.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice command processing method provided by an embodiment of the present invention;
Fig. 2 is another schematic flowchart of the voice command processing method provided by an embodiment of the present invention;
Fig. 3 is another schematic flowchart of the voice command processing method provided by an embodiment of the present invention;
Fig. 4 is another schematic flowchart of the voice command processing method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the decoding network search adopted in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the response to one service in an embodiment of the present invention;
Fig. 7a is a schematic diagram of the service category network decoding related to one service in an embodiment of the present invention;
Fig. 7b is a schematic diagram of the service network decoding related to one service in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the service parameter decoding network related to one service in an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a voice command processing system provided by an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a voice command processing method and system for improving the response efficiency for voice commands input by a user and improving the user experience.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Each is described in detail below.
Referring to Fig. 1, which is a schematic flowchart of a voice command processing method provided by an embodiment of the present invention, the voice command processing method comprises:
Step 101: obtain a voice command for a service request;
The voice command comprises a voice signal corresponding to the command keyword part and a voice signal corresponding to the command parameter part;
It can be understood that the voice command is a voice command input by the user and is used for a service request, for example "call Zhang San" or "request the song 'Friend'".
Step 102: extract a voice feature sequence from the voice command;
Step 103: decode the voice command according to a preset service category decoding network to determine the service category of the current request;
It can be understood that, after the voice command is decoded according to the preset service category decoding network, the command keyword of the voice command can be determined.
Step 104: determine the service decoding network corresponding to the service category;
One optional implementation of determining the service decoding network corresponding to the service category is: according to the determined service category of the current request, search a preset table of correspondences between service categories and service decoding networks to obtain and determine the service decoding network corresponding to the service category.
It can be understood that the table of correspondences between service categories and service decoding networks is set in advance in the voice command processing system. In this embodiment, the service category may include at least one of a telephone communication service, a song request service, a local navigation service, and the like.
In addition, the embodiments of the present invention may also record service categories and service decoding networks in an Extensible Markup Language (XML) configuration file, or may generate and determine the service decoding network corresponding to the service category from the service parameters carried by the determined service category; no specific limitation is imposed here.
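The lookup described for step 104 can be illustrated with a small sketch. The patent does not specify the layout of the XML configuration file, so the element names, attribute names and file paths below are purely hypothetical; the sketch only shows how a table of correspondences between service categories and service decoding networks might be loaded and queried.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; element names, attribute names and paths are
# illustrative assumptions, not taken from the patent.
CONFIG_XML = """
<services>
  <service category="phone_call" network="networks/phone_call.net"/>
  <service category="navigation" network="networks/navigation.net"/>
  <service category="song_request" network="networks/song_request.net"/>
</services>
"""

def load_category_table(xml_text):
    """Build the table mapping service category -> service decoding network."""
    root = ET.fromstring(xml_text)
    return {s.get("category"): s.get("network") for s in root.findall("service")}

def find_service_network(table, category):
    """Look up the network for a category; fail loudly for unknown categories."""
    try:
        return table[category]
    except KeyError:
        raise ValueError(f"no service decoding network configured for {category!r}")

table = load_category_table(CONFIG_XML)
print(find_service_network(table, "phone_call"))
```

Keeping the table in a configuration file would allow a new service to be added without changing the decoding code, which fits the patent's remark that the mapping is set in advance in the system.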
Step 105: perform secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command.
As can be seen from the above, in the voice command processing method provided by the embodiments of the present invention, the obtained voice command for a service request is decoded a first time to determine the service category of the request, and secondary decoding is then performed on the voice command according to the service decoding network corresponding to that service category, thereby determining the complete content of the voice command. Because the secondary decoding is performed in the service decoding network corresponding to the service category, voice commands can be answered quickly; in particular, this improves the practicality of voice command control systems that support parameter sets of widely differing sizes, and it also improves the vocabulary recognition rate.
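Steps 101 to 105 can be sketched end to end as follows. This is a toy illustration under strong assumptions: the "voice feature sequence" is replaced by a list of word tokens, and path similarity is approximated with `difflib.SequenceMatcher` rather than real acoustic scoring. The network contents and the names `similarity`, `best_path` and `process_command` are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(features, path):
    """Toy stand-in for acoustic path scoring; not a real acoustic model."""
    return SequenceMatcher(None, features, path).ratio()

def best_path(features, network):
    """Return the name of the path in `network` most similar to `features`."""
    return max(network, key=lambda name: similarity(features, network[name]))

# Hypothetical networks: every path is a list of word tokens.
CATEGORY_NETWORK = {
    "phone_call": ["call"],
    "navigation": ["navigate", "to"],
    "song_request": ["play"],
}
SERVICE_NETWORKS = {
    "phone_call": {
        "call Zhang San": ["call", "zhang", "san"],
        "call Li Si": ["call", "li", "si"],
    },
    "navigation": {"navigate to airport": ["navigate", "to", "airport"]},
    "song_request": {"play Friend": ["play", "friend"]},
}

def process_command(features):
    category = best_path(features, CATEGORY_NETWORK)   # step 103: first decoding
    service_network = SERVICE_NETWORKS[category]       # step 104: pick the network
    content = best_path(features, service_network)     # step 105: secondary decoding
    return category, content

# The token list stands in for the features extracted in steps 101 and 102.
result = process_command(["call", "zhang", "san"])
print(result)
```

The point of the sketch is that the second search only ever touches the paths of one service, so its cost does not depend on how large the other services' networks are.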
Optionally, referring to Fig. 2, which is a schematic flowchart of a voice command processing method provided by an embodiment of the present invention, decoding the voice command according to the preset service category decoding network to determine the service category of the current request (step 103) may comprise:
Step 1031: search the preset service category decoding network for, and determine, a first decoding path having the maximum similarity with the voice feature sequence;
Step 1032: determine the service category of the current request according to the first decoding path.
Determining the service category of the current request according to the first decoding path having the maximum similarity with the voice feature sequence, that is, determining the service type of the voice command, makes it possible to narrow the decoding space in a targeted way and improve decoding efficiency. In the embodiments of the present invention, the first decoding path with maximum similarity can be regarded as the optimal path for determining the service category of the current request in this implementation.
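Steps 1031 and 1032 amount to a maximum search followed by a lookup. The sketch below assumes that each path in the service category decoding network has already been scored against the voice feature sequence; the score values, the keyword paths and the idea that several keyword paths may map to one category are illustrative assumptions, not details given in the patent.

```python
# Hypothetical similarity scores for each path in the service category
# decoding network (higher means more similar to the voice feature sequence).
path_scores = {"call": -12.5, "dial": -30.1, "navigate to": -48.0, "play": -51.3}

# Hypothetical mapping from keyword path to service category; several keyword
# paths may lead to the same category.
path_to_category = {
    "call": "phone_call",
    "dial": "phone_call",
    "navigate to": "navigation",
    "play": "song_request",
}

def determine_category(scores, mapping):
    first_path = max(scores, key=scores.get)  # step 1031: maximum similarity
    return first_path, mapping[first_path]    # step 1032: path -> category

first_path, category = determine_category(path_scores, path_to_category)
print(first_path, category)
```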
Optionally, referring to Fig. 3, which is a schematic flowchart of a voice command processing method provided by an embodiment of the present invention, in one implementation, performing secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command (step 105) may specifically comprise:
Step 1051-a: in the determined service decoding network corresponding to the service category, select and determine a second decoding path having the maximum similarity with the voice feature sequence;
It can be understood that, in this implementation, the service decoding network is a decoding network that contains the service-related command keywords and command parameters;
Step 1052-a: determine the complete content of the voice command according to the second decoding path;
In this implementation, because the service decoding network contains both the command keyword and the command parameter of the voice command, steps 1051-a and 1052-a decode the voice command as a whole; the result of this secondary decoding is therefore the complete content of the voice command.
It can be understood that performing secondary decoding on the voice command using the service decoding network corresponding to the service category and the second decoding path having the maximum similarity with the voice feature sequence, and thereby determining the complete content of the voice command, narrows the decoding space in a more targeted way and improves decoding efficiency. The second decoding path with maximum similarity can be regarded as the optimal path for determining the complete content of the voice command in this implementation.
It should also be noted that, in the embodiments of the present invention, the decoding path used in the first decoding pass is the first decoding path and the decoding path used in the second decoding pass is the second decoding path. Both denote the decoding path having the maximum similarity with the voice feature sequence in the current processing step. The first decoding path and the second decoding path may be the same or different; the embodiments of the present invention impose no specific limitation on this.
Optionally, referring to Fig. 4, which is a schematic flowchart of a voice command processing method provided by an embodiment of the present invention, in another implementation, performing secondary decoding on the voice command according to the determined service decoding network to determine the complete content of the voice command (step 105) may specifically comprise:
Step 1051-b: obtain the voice segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network;
Step 1052-b: obtain the voice signal corresponding to the command parameter part of the voice command;
Step 1053-b: determine the service parameter decoding network corresponding to the service decoding network;
Step 1054-b: decode the voice signal in the service parameter decoding network, and select and determine a third decoding path having the maximum similarity with the voice feature sequence;
Step 1055-b: determine the voice command parameter corresponding to the voice signal according to the third decoding path;
Step 1056-b: determine the complete content of the voice command according to the voice segment information and the voice command parameter.
In this implementation, the voice signal corresponding to the command parameter part of the voice command is decoded according to the service parameter decoding network corresponding to the service decoding network, and the resulting voice command parameter is combined with the voice segment information corresponding to the command keyword to obtain the complete content of the voice command.
It can be understood that decoding the voice signal corresponding to the command parameter part of the voice command using the service parameter decoding network of the service decoding network corresponding to the service category and the third decoding path having the maximum similarity with the voice feature sequence, and thereby determining the complete content of the voice command, narrows the decoding space in a more targeted way and improves decoding efficiency. The third decoding path with maximum similarity can be regarded as the optimal path for determining the complete content of the voice command in this implementation.
It should also be noted that, in the embodiments of the present invention, the decoding path used in one implementation of the second decoding pass is the second decoding path and the decoding path used in the other implementation is the third decoding path. Both denote the decoding path having the maximum similarity with the voice feature sequence during the second decoding of the voice command. The second decoding path and the third decoding path may be the same or different; the embodiments of the present invention impose no specific limitation on this.
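Steps 1051-b to 1056-b can be sketched as follows, again with word tokens standing in for acoustic frames and `difflib.SequenceMatcher` standing in for acoustic scoring. The function name, the representation of the voice segment information as a `(start, end)` token range, and the contents of the parameter network are assumptions made for illustration only.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Toy stand-in for acoustic path scoring; not a real acoustic model."""
    return SequenceMatcher(None, a, b).ratio()

def decode_with_parameter_network(features, keyword, keyword_segment, parameter_network):
    start, end = keyword_segment                          # 1051-b: keyword segment info
    parameter_signal = features[:start] + features[end:]  # 1052-b: parameter part only
    # 1053-b is assumed done by the caller, who passes the parameter network in.
    third_path = max(                                     # 1054-b: maximum similarity
        parameter_network,
        key=lambda name: similarity(parameter_signal, parameter_network[name]),
    )
    parameter = third_path                                # 1055-b: command parameter
    return f"{keyword} {parameter}"                       # 1056-b: complete content

# Hypothetical service parameter decoding network for the phone service.
PHONE_PARAMETER_NETWORK = {"Zhang San": ["zhang", "san"], "Li Si": ["li", "si"]}

content = decode_with_parameter_network(
    ["call", "zhang", "san"], "call", (0, 1), PHONE_PARAMETER_NETWORK
)
print(content)
```

Compared with decoding the whole command again, only the parameter part is scored here, and the keyword found in the first pass is reused rather than recognized a second time.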
As can be seen from the above, in the voice command processing method provided by the embodiments of the present invention, the obtained voice command for a service request is decoded a first time to determine the service category of the request, and secondary decoding is then performed on the voice command according to the service decoding network corresponding to that service category, thereby determining the complete content of the voice command. Because the secondary decoding is performed in the service decoding network corresponding to the service category, voice commands can be answered quickly; in particular, this improves the practicality of voice command control systems that support parameter sets of widely differing sizes, and it also improves the vocabulary recognition rate.
To better understand the voice command processing method provided by the present invention, a specific application scenario is described below as an example:
First, the system may be initialized to generate the decoding network for voice command service categories and the decoding network for the parameters related to each command. The service category decoding network is used to recognize operation commands such as "make a phone call", "navigate to", "request a song", "open" and "search", while the decoding network for command-related parameters is customized for each command request. For example, for the "make a phone call" service, the system may build the network from the contact names it has stored, and may further include a digit recognition network and the like. Second, the voice command input by the user, which is used for a service request, is received and a voice feature sequence is extracted. Next, the service category is determined by decoding with the service category decoding network. Then, the service decoding network corresponding to the service category is determined. Finally, the optimal path is searched for in the service decoding network to determine the content of the voice command.
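The initialization step can be pictured with a minimal sketch. A real decoding network would be a graph of acoustic models; here each network is simply a dictionary from an entry name to its word tokens, and the function names and the `include_digits` flag are hypothetical.

```python
def build_category_network(command_keywords):
    """One path per supported operation command keyword."""
    return {keyword: keyword.split() for keyword in command_keywords}

def build_phone_parameter_network(contact_names, include_digits=False):
    """Parameter network for the phone service, built from stored contacts."""
    network = {name: name.lower().split() for name in contact_names}
    if include_digits:
        # Optional digit recognition entries, as the description allows.
        network.update({digit: [digit] for digit in "0123456789"})
    return network

category_network = build_category_network(
    ["make a phone call", "navigate to", "request a song", "open", "search"]
)
phone_parameters = build_phone_parameter_network(
    ["Zhang San", "Li Si"], include_digits=True
)
print(len(category_network), len(phone_parameters))
```

Because the parameter network is built from data held by the system, such as the contact list, it could be regenerated whenever that data changes without touching the category network.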
Refer also to Fig. 5, which is a schematic diagram of the decoding network search adopted in an embodiment of the present invention. The system first determines a single service type in the service category decoding network and then determines the optimal path in the service decoding network related to that service type. For example, for the input "call Zhang San", the system first decodes in the service category decoding network and determines the service type, here the "make a phone call" service; it then obtains the service decoding network corresponding to this service and determines the complete voice command content in that network. This hierarchical processing greatly reduces the cost of the decoding search and improves decoding efficiency.
As shown in Fig. 6, which is a schematic diagram of the response to the "make a phone call" service in an embodiment of the present invention, for the voice command input by the user the system first confirms that the service type is the "make a phone call" service, then decodes against the 1,000 names related to that service to determine the command parameter and obtain the complete decoding result. The response time of the system is that for 1,000 entries, and because decoding is carried out in the limited decoding space of 1,000 names, recognition accuracy is also further improved.
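The figures quoted in the background and in this example can be compared with simple arithmetic. The sketch below counts parameter entries only; it assumes, as a simplification, that decoding cost grows with the number of entries searched and it ignores the small cost of the first pass over the service category decoding network.

```python
# Entry counts taken from the description.
POI_ENTRIES = 1_000_000   # navigation service
NAME_ENTRIES = 1_000      # telephone service
SONG_ENTRIES = 2_000      # music service

# Single complete network: every entry is a candidate for every command.
single_network_entries = POI_ENTRIES + NAME_ENTRIES + SONG_ENTRIES

# Hierarchical decoding of "call Zhang San": only the names are searched.
phone_service_entries = NAME_ENTRIES

reduction = single_network_entries / phone_service_entries
print(single_network_entries, phone_service_entries, reduction)
# prints: 1003000 1000 1003.0
```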
In the embodiments of the present invention, the possible service type is first determined from the voice command input by the user. Specifically, the system searches for the optimal path in the service category decoding space shown in Fig. 7a and selects the decoding path with the maximum similarity as the optimal result; the absorbing model shown there is used to model the distribution of other, non-command speech signals. After the service type is obtained, the system selects the service decoding network related to the determined service type as the new decoding network and, in that network, selects the result with the maximum path similarity as the decoding result. Fig. 7b is a schematic diagram of the service network decoding related to the "make a phone call" service.
As described above, under the hierarchical decoding framework the system decodes the voice command twice, once in the service category decoding network and once in the service decoding network. Further, to improve the decoding efficiency of the system, the present application also proposes a new algorithm:
The user's voice signal is decoded in the service category decoding network to obtain the optimal path and the voice segment information corresponding to the command keyword. The voice signal corresponding to the command parameter part of the voice command is then obtained and used as a new voice command input: because the voice signal of the command parameter part is absorbed by the absorbing model during decoding in the service category decoding network, it can be treated as a new voice command input. The service parameter decoding network corresponding to the service decoding network is obtained, and the new voice command input is decoded in that network to obtain the voice command parameter. The complete content of the voice command is determined from the voice segment information and the voice command parameter. Preferably, the service parameter decoding network contains only voice command parameters, which avoids decoding the voice command repeatedly and improves decoding efficiency. Fig. 8 is a schematic diagram of the service parameter decoding network related to the "make a phone call" service.
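The role of the absorbing model in the first pass can be illustrated with a toy sketch. Here exact token matching stands in for keyword decoding, and every token outside the matched keyword segment is treated as "absorbed"; a real absorbing model would be an acoustic model, so this is only an analogy. The function name and return shape are hypothetical.

```python
def split_keyword_and_parameter(features, category_network):
    """Find the longest keyword path occurring in `features`.

    Returns (keyword, (start, end), absorbed_tokens), or None when no keyword
    path occurs. Tokens outside the keyword segment play the role of the
    signal absorbed by the absorbing model.
    """
    best = None
    for keyword, path in category_network.items():
        n = len(path)
        for start in range(len(features) - n + 1):
            longer = best is None or n > (best[2] - best[1])
            if features[start:start + n] == path and longer:
                best = (keyword, start, start + n)
    if best is None:
        return None
    keyword, start, end = best
    absorbed = features[:start] + features[end:]
    return keyword, (start, end), absorbed

# Hypothetical service category decoding network with token paths.
network = {"call": ["call"], "navigate to": ["navigate", "to"]}
split = split_keyword_and_parameter(["call", "zhang", "san"], network)
print(split)
```

The absorbed tokens are exactly what the description calls the new voice command input: they can be passed on to a service parameter decoding network without decoding the keyword again.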
From the above, the voice command disposal route that the invention process provides, achieve the classification process to user voice signal, first business command type is determined by simple coding/decoding method efficiently, interior to specific instructions content decoding in the decode empty that business is relevant subsequently, achieve the synchronous support response to different scales voice command, improve system effectiveness and recognition accuracy.
For ease of better implementing the technical scheme of the embodiment of the present invention, the embodiment of the present invention is also provided for the related system implementing speech commands disposal route.Wherein the implication of noun is identical with upper speech commands disposal route, and specific implementation details can explanation in reference method embodiment.
Referring to Fig. 9, which is a schematic structural diagram of a voice command processing system provided by an embodiment of the present invention, the voice command processing system comprises an acquisition module 901, an extraction module 902, a first decoding module 903, a determination module 904 and a second decoding module 905:
The acquisition module 901 is configured to obtain a voice command for a service request;
The voice command comprises a voice signal corresponding to a command keyword part and a voice signal corresponding to a command parameter part;
It can be understood that the voice command is a voice command input by a user and is used to request a service, for example "call Zhang San" or "play the song 'Friend'".
The extraction module 902 is configured to extract a voice feature sequence from the voice command;
The first decoding module 903 is configured to decode the voice command according to a preset service category decoding network and determine the service category of the current request;
It can be understood that after the voice command is decoded according to the preset service category decoding network, the command keyword of the voice command can be determined.
The determination module 904 is configured to determine the service decoding network corresponding to the service category;
The second decoding module 905 is configured to perform secondary decoding on the decoded voice command according to the determined service decoding network and determine the complete content of the voice command.
One optional implementation of determining the service decoding network corresponding to the service category is: according to the determined service category of the current request, look up a preset correspondence table between service categories and service decoding networks, and thereby obtain the service decoding network corresponding to the service category.
It can be understood that the correspondence table between service categories and service decoding networks is set in the voice command processing system in advance; in this embodiment, the service category may include at least one of services such as a telephone call service, a song request service and a local navigation service.
In addition, the embodiment of the present invention may record the service categories and service decoding networks in an Extensible Markup Language (XML) configuration file; alternatively, the service decoding network corresponding to the service category may be generated from the service parameters carried by the determined service category. No specific limitation is imposed here.
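As an illustration of the table lookup and the XML option described above, the following is a minimal sketch. The XML layout, the category names and the `.net` file names are all assumptions made for this example; the patent only states that an XML configuration file may record the service categories and their decoding networks.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout (not specified by the patent).
CONFIG_XML = """
<services>
  <service category="phone_call" network="phone_call.net"/>
  <service category="song_request" network="song_request.net"/>
  <service category="local_navigation" network="local_navigation.net"/>
</services>
"""


def load_category_table(xml_text):
    """Build the category -> service decoding network table from XML text."""
    root = ET.fromstring(xml_text)
    return {s.get("category"): s.get("network") for s in root.findall("service")}


def find_service_network(table, category):
    """Look up the service decoding network for a recognized service category."""
    if category not in table:
        raise KeyError(f"no service decoding network configured for {category!r}")
    return table[category]


table = load_category_table(CONFIG_XML)
print(find_service_network(table, "song_request"))  # prints song_request.net
```

Keeping the table in a configuration file means a new service category can be added without changing the lookup code, which is one plausible reason for the XML option.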
As can be seen from the above, the voice command processing system provided by the embodiment of the present invention first decodes the obtained voice command for a service request to determine the service category of the request, and then performs secondary decoding on the voice command according to the service decoding network corresponding to that service category, thereby determining the complete content of the voice command. Because the secondary decoding is performed only in the service decoding network corresponding to the service category, the voice command can be responded to quickly; this in particular improves the practicality of voice command control systems that support commands whose parameter sets differ in scale, and also improves the vocabulary recognition rate.
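The efficiency argument can be illustrated with simple arithmetic. The sketch below is hypothetical: the path counts and the names `SERVICE_PATHS`, `single_pass_cost` and `two_pass_cost` are invented for illustration and do not come from the patent. It only counts candidate paths and ignores pruning and other decoder optimizations.

```python
# Hypothetical number of candidate paths in each service decoding network.
SERVICE_PATHS = {
    "phone_call": 500,
    "song_request": 20000,
    "local_navigation": 3000,
}

# Toy assumption: the category decoding network has one keyword path per category.
CATEGORY_PATHS = len(SERVICE_PATHS)


def single_pass_cost():
    """Paths to score when every command lives in one complete network."""
    return sum(SERVICE_PATHS.values())


def two_pass_cost(category):
    """Paths to score with a category pass followed by one service network."""
    return CATEGORY_PATHS + SERVICE_PATHS[category]


print(single_pass_cost())             # 23500
print(two_pass_cost("phone_call"))    # 503
print(two_pass_cost("song_request"))  # 20003
```

Under these assumed counts the saving is largest for categories with small parameter sets, and smaller for the category that dominates the complete network.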
Optionally, in some embodiments, the first decoding module 903 may be specifically configured to search the preset service category decoding network for a first decoding path having maximum similarity with the voice feature sequence, and to determine the service category of the current request according to the first decoding path.
Determining the service category of the current request according to the first decoding path having maximum similarity with the voice feature sequence, that is, determining the service type of the voice command, narrows the decoding space in a targeted way and improves decoding efficiency. In this embodiment, the first decoding path of maximum similarity can be regarded as the optimal path for determining the service category of the current request.
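The first decoding pass can be sketched as a maximum-similarity search over the paths of the category network. This is a toy model under stated assumptions: word tokens stand in for acoustic features, `difflib.SequenceMatcher` stands in for acoustic scoring (the patent does not specify a scoring method), and `CATEGORY_NETWORK`, `similarity` and `first_decode` are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical category decoding network: each path is a command keyword
# token sequence labelled with the service category it implies.
CATEGORY_NETWORK = {
    ("call",): "phone_call",
    ("play", "song"): "song_request",
    ("navigate", "to"): "local_navigation",
}


def similarity(path, features):
    """Toy score: compare a keyword path with the leading part of the features."""
    prefix = features[:len(path)]
    return SequenceMatcher(None, list(path), list(prefix)).ratio()


def first_decode(features):
    """Return the best-matching keyword path and the category it maps to."""
    best_path = max(CATEGORY_NETWORK, key=lambda p: similarity(p, features))
    return best_path, CATEGORY_NETWORK[best_path]


path, category = first_decode(["play", "song", "friend"])
print(path, category)
```

Only the keyword portion of the input is compared here, which mirrors the idea that the first pass needs to resolve the command keyword but not the parameter.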
Optionally, in some embodiments, the second decoding module 905 may be specifically configured to select, in the determined service decoding network corresponding to the service category, a second decoding path having maximum similarity with the voice feature sequence, the service decoding network being a decoding network that contains the service-related command keywords and command parameters; and to determine the complete content of the voice command according to the second decoding path.
In this embodiment, because the service decoding network contains the command keyword and the command parameter of the voice command, the second decoding module 905 decodes the voice command as a whole; the result of this secondary decoding is therefore the complete content of the voice command.
It can be understood that performing secondary decoding on the voice command using the service decoding network corresponding to the service category and the second decoding path having maximum similarity with the voice feature sequence, so as to determine the complete content of the voice command, narrows the decoding space in a more targeted way and improves decoding efficiency. In this embodiment, the second decoding path of maximum similarity can be regarded as the optimal path for determining the complete content of the voice command.
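A minimal sketch of this whole-command secondary decoding follows, under the same toy assumptions as before: tokens stand in for acoustic features and `difflib.SequenceMatcher` stands in for acoustic scoring. `SONG_NETWORK`, `similarity` and `second_decode` are hypothetical names, and the song titles are made up.

```python
from difflib import SequenceMatcher

# Hypothetical service decoding network for a song-request category:
# every path contains both the command keyword and a command parameter.
SONG_NETWORK = [
    ("play", "song", "friend"),
    ("play", "song", "yesterday"),
    ("play", "song", "hello"),
]


def similarity(path, features):
    """Toy score between a full path and the full feature sequence."""
    return SequenceMatcher(None, list(path), list(features)).ratio()


def second_decode(network, features):
    """Pick the path most similar to the whole command and return its text."""
    best = max(network, key=lambda p: similarity(p, features))
    return " ".join(best)


print(second_decode(SONG_NETWORK, ["play", "song", "friend"]))  # play song friend
```

Because only the paths of one service network are scored, the search never touches contact names or place names that belong to other categories.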
Further optionally, in other embodiments, the second decoding module 905 may be specifically configured to: obtain the speech segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network; obtain the voice signal corresponding to the command parameter part of the voice command; decode the voice signal in the service parameter decoding network, selecting a third decoding path having maximum similarity with the voice feature sequence; determine the voice command parameter corresponding to the voice signal according to the third decoding path; and determine the complete content of the voice command according to the speech segment information and the voice command parameter.
In this embodiment, the voice signal corresponding to the command parameter part of the voice command is decoded according to the service decoding network, and the resulting voice command parameter is combined with the speech segment information corresponding to the command keyword to obtain the complete content of the voice command.
It can be understood that decoding the voice signal corresponding to the command parameter part of the voice command using the service decoding network corresponding to the service category and the third decoding path having maximum similarity with the voice feature sequence, so as to determine the complete content of the voice command, narrows the decoding space in a more targeted way and improves decoding efficiency. In this embodiment, the third decoding path of maximum similarity can be regarded as the optimal path for determining the complete content of the voice command.
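The parameter-only variant can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the speech segment information is modelled as the keyword text plus the index where the keyword ends, tokens stand in for acoustic features, and `PARAM_NETWORK`, `similarity` and `decode_parameter` are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical service parameter decoding network for a phone-call category:
# paths contain only command parameters (contact names), no keyword.
PARAM_NETWORK = [("zhang", "san"), ("li", "si"), ("wang", "wu")]


def similarity(path, features):
    """Toy score between a parameter path and the parameter features."""
    return SequenceMatcher(None, list(path), list(features)).ratio()


def decode_parameter(features, keyword_text, keyword_end, param_network):
    """Decode only the part after the keyword and join it with the keyword."""
    # Segment information from the first pass tells us where the keyword ends.
    param_features = features[keyword_end:]
    best = max(param_network, key=lambda p: similarity(p, param_features))
    return keyword_text + " " + " ".join(best)


print(decode_parameter(["call", "zhang", "san"], "call", 1, PARAM_NETWORK))
```

Compared with decoding the whole command again, this variant reuses the keyword result of the first pass and scores only the parameter portion of the input.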
It should also be noted that, in the embodiments of the present invention, the decoding path used in the first decoding process is the first decoding path, and the decoding path used in the second decoding process is the second decoding path or the third decoding path. The first, second and third decoding paths each denote the decoding path that has maximum similarity with the voice feature sequence in the current processing method; they may be identical or different, and the embodiments of the present invention impose no specific limitation on this.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and application scenarios of the system described above and of each functional module in it can be found in the corresponding process of the foregoing method embodiments and are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, apparatus and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The voice command processing method and system provided by the present invention have been described in detail above. For those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the ideas of the embodiments of the present invention. In summary, the content of this description should not be construed as limiting the present invention.
Claims (8)
1. A voice command processing method, characterized by comprising:
Obtaining a voice command for a service request;
Extracting a voice feature sequence from the voice command;
Decoding the voice command according to a preset service category decoding network, and determining the service category of the current request;
Determining the service decoding network corresponding to the service category;
Performing secondary decoding on the voice command according to the determined service decoding network, and determining the complete content of the voice command.
2. The method according to claim 1, characterized in that decoding the voice command according to the preset service category decoding network and determining the service category of the current request comprises:
Searching the preset service category decoding network for a first decoding path having maximum similarity with the voice feature sequence;
Determining the service category of the current request according to the first decoding path.
3. The method according to claim 1 or 2, characterized in that performing secondary decoding on the voice command according to the determined service decoding network and determining the complete content of the voice command comprises:
Selecting, in the determined service decoding network corresponding to the service category, a second decoding path having maximum similarity with the voice feature sequence, the service decoding network being a decoding network that contains the service-related command keywords and command parameters;
Determining the complete content of the voice command according to the second decoding path.
4. The method according to claim 1 or 2, characterized in that performing secondary decoding on the voice command according to the determined service decoding network and determining the complete content of the voice command comprises:
Obtaining the speech segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network;
Obtaining the voice signal corresponding to the command parameter part of the voice command;
Determining the service parameter decoding network corresponding to the service decoding network;
Decoding the voice signal in the service parameter decoding network, and selecting a third decoding path having maximum similarity with the voice feature sequence;
Determining the voice command parameter corresponding to the voice signal according to the third decoding path;
Determining the complete content of the voice command according to the speech segment information and the voice command parameter.
5. A voice command processing system, characterized by comprising:
An acquisition module, configured to obtain a voice command for a service request;
An extraction module, configured to extract a voice feature sequence from the voice command;
A first decoding module, configured to decode the voice command according to a preset service category decoding network and determine the service category of the current request;
A determination module, configured to determine the service decoding network corresponding to the service category;
A second decoding module, configured to perform secondary decoding on the decoded voice command according to the determined service decoding network and determine the complete content of the voice command.
6. The system according to claim 5, characterized in that the first decoding module is specifically configured to search the preset service category decoding network for a first decoding path having maximum similarity with the voice feature sequence, and to determine the service category of the current request according to the first decoding path.
7. The system according to claim 5 or 6, characterized in that the second decoding module is specifically configured to select, in the determined service decoding network corresponding to the service category, a second decoding path having maximum similarity with the voice feature sequence, the service decoding network being a decoding network that contains the service-related command keywords and command parameters; and to determine the complete content of the voice command according to the second decoding path.
8. The system according to claim 5 or 6, characterized in that the second decoding module is specifically configured to: obtain the speech segment information corresponding to the command keyword in the voice command decoded by the preset service category decoding network; obtain the voice signal corresponding to the command parameter part of the voice command; decode the voice signal in the service parameter decoding network, selecting a third decoding path having maximum similarity with the voice feature sequence; determine the voice command parameter corresponding to the voice signal according to the third decoding path; and determine the complete content of the voice command according to the speech segment information and the voice command parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310287147.9A CN104282301A (en) | 2013-07-09 | 2013-07-09 | Voice command processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310287147.9A CN104282301A (en) | 2013-07-09 | 2013-07-09 | Voice command processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104282301A true CN104282301A (en) | 2015-01-14 |
Family
ID=52257124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310287147.9A Pending CN104282301A (en) | 2013-07-09 | 2013-07-09 | Voice command processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104282301A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105426357A (en) * | 2015-11-06 | 2016-03-23 | 武汉卡比特信息有限公司 | Fast voice selection method |
WO2016202064A1 (en) * | 2015-06-18 | 2016-12-22 | 中兴通讯股份有限公司 | Instruction processing method and apparatus |
CN106653013A (en) * | 2016-09-30 | 2017-05-10 | 北京奇虎科技有限公司 | Speech recognition method and device |
CN106683662A (en) * | 2015-11-10 | 2017-05-17 | 中国电信股份有限公司 | Speech recognition method and device |
CN107293294A (en) * | 2016-03-31 | 2017-10-24 | 腾讯科技(深圳)有限公司 | A kind of voice recognition processing method and device |
CN107437416A (en) * | 2017-05-23 | 2017-12-05 | 阿里巴巴集团控股有限公司 | A kind of consultation service processing method and processing device based on speech recognition |
CN108899028A (en) * | 2018-06-08 | 2018-11-27 | 广州视源电子科技股份有限公司 | Voice awakening method, searching method, device and terminal |
CN108932944A (en) * | 2017-10-23 | 2018-12-04 | 北京猎户星空科技有限公司 | Coding/decoding method and device |
WO2021072955A1 (en) * | 2019-10-16 | 2021-04-22 | 科大讯飞股份有限公司 | Decoding network construction method, voice recognition method, device and apparatus, and storage medium |
- 2013-07-09: CN application CN201310287147.9A, published as CN104282301A, status Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016202064A1 (en) * | 2015-06-18 | 2016-12-22 | 中兴通讯股份有限公司 | Instruction processing method and apparatus |
CN105426357A (en) * | 2015-11-06 | 2016-03-23 | 武汉卡比特信息有限公司 | Fast voice selection method |
CN106683662A (en) * | 2015-11-10 | 2017-05-17 | 中国电信股份有限公司 | Speech recognition method and device |
CN107293294B (en) * | 2016-03-31 | 2019-07-16 | 腾讯科技(深圳)有限公司 | A kind of voice recognition processing method and device |
CN107293294A (en) * | 2016-03-31 | 2017-10-24 | 腾讯科技(深圳)有限公司 | A kind of voice recognition processing method and device |
CN106653013A (en) * | 2016-09-30 | 2017-05-10 | 北京奇虎科技有限公司 | Speech recognition method and device |
CN106653013B (en) * | 2016-09-30 | 2019-12-20 | 北京奇虎科技有限公司 | Voice recognition method and device |
CN107437416A (en) * | 2017-05-23 | 2017-12-05 | 阿里巴巴集团控股有限公司 | A kind of consultation service processing method and processing device based on speech recognition |
CN107437416B (en) * | 2017-05-23 | 2020-11-17 | 创新先进技术有限公司 | Consultation service processing method and device based on voice recognition |
CN112802459A (en) * | 2017-05-23 | 2021-05-14 | 创新先进技术有限公司 | Consultation service processing method and device based on voice recognition |
CN112802459B (en) * | 2017-05-23 | 2024-06-18 | 创新先进技术有限公司 | Consultation service processing method and device based on voice recognition |
CN108932944A (en) * | 2017-10-23 | 2018-12-04 | 北京猎户星空科技有限公司 | Coding/decoding method and device |
CN108932944B (en) * | 2017-10-23 | 2021-07-30 | 北京猎户星空科技有限公司 | Decoding method and device |
CN108899028A (en) * | 2018-06-08 | 2018-11-27 | 广州视源电子科技股份有限公司 | Voice awakening method, searching method, device and terminal |
WO2021072955A1 (en) * | 2019-10-16 | 2021-04-22 | 科大讯飞股份有限公司 | Decoding network construction method, voice recognition method, device and apparatus, and storage medium |
US12223947B2 (en) | 2019-10-16 | 2025-02-11 | Iflytek Co., Ltd. | Decoding network construction method, voice recognition method, device and apparatus, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104282301A (en) | Voice command processing method and system | |
US10043520B2 (en) | Multilevel speech recognition for candidate application group using first and second speech commands | |
KR101418163B1 (en) | Speech recognition repair using contextual information | |
CN106816148B (en) | Speech recognition apparatus and method | |
CN107644638B (en) | Audio recognition method, device, terminal and computer readable storage medium | |
CN106683677B (en) | Voice recognition method and device | |
KR102046486B1 (en) | Information inputting method | |
US20140350933A1 (en) | Voice recognition apparatus and control method thereof | |
JP2018005218A (en) | Automatic interpretation method and apparatus | |
CN107039038A (en) | Learn personalised entity pronunciation | |
CN101681365A (en) | Method and apparatus for distributed voice searching | |
US20080162125A1 (en) | Method and apparatus for language independent voice indexing and searching | |
CN103903619A (en) | Method and system for improving accuracy of speech recognition | |
CN102549652A (en) | Information retrieving apparatus, information retrieving method and navigation system | |
CN103377652A (en) | Method, device and equipment for carrying out voice recognition | |
JP5717794B2 (en) | Dialogue device, dialogue method and dialogue program | |
CN110956955B (en) | Voice interaction method and device | |
CN101415259A (en) | System and method for searching information of embedded equipment based on double-language voice enquiry | |
US20150317998A1 (en) | Method and apparatus for recognizing speech, and method and apparatus for generating noise-speech recognition model | |
US20170301346A1 (en) | Hierarchical speech recognition decoder | |
CN105487668A (en) | Display method and apparatus for terminal device | |
WO2015119267A1 (en) | Uttered sentence collection apparatus and method | |
KR102140391B1 (en) | Search method and electronic device using the method | |
CN106653006B (en) | Searching method and device based on interactive voice | |
KR102536944B1 (en) | Method and apparatus for speech signal processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20150114 |