CN115376513B - Voice interaction method, server and computer readable storage medium - Google Patents
- Publication number
- CN115376513B (application CN202211276398.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- information
- state machine
- state
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R16/00—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
- B60R16/02—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
- B60R16/037—Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
- B60R16/0373—Voice control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention discloses a voice interaction method comprising the following steps: receiving a user voice request forwarded by the vehicle after the vehicle's voice function is woken up; loading a state machine configuration template according to the user voice request and parsing it to obtain a parser; performing logic calculation with the parser to obtain a matching state; and updating the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction. In the invention, the vehicle cabin is divided into a plurality of voice zones, and a state machine configuration template is loaded for each voice request forwarded by the vehicle, so that the template can be parsed into a parser. The parser determines how the current state matches the rules of the state machine configuration template, and the state machine switches or keeps its state accordingly. The configurable templates in the state machine are easy for users to set up or change according to specific requirements, giving strong extensibility and a better user experience.
Description
Technical Field
The present invention relates to the field of speech technology, and in particular, to a speech interaction method, a server, and a computer readable storage medium.
Background
With the development of automatic driving technology, vehicles may support voice control services, such as opening a window by voice. In an actual in-vehicle scenario, users may speak from multiple voice zones in the vehicle, and not all of that speech is a request to the on-board system. The on-board voice processor must therefore reject the useless portions of the audio, extract the user's actual voice request, and respond to it.
In the related art, rejection processing of voice requests is designed only for single-voice-zone scenarios: irrelevant requests are rejected by combining current text information, automatic speech recognition technology, confidence scores characterizing voice features, and the like. This cannot meet the requirement of multi-voice-zone voice interaction in a vehicle.
Disclosure of Invention
The invention provides a voice interaction method, a server and a computer readable storage medium.
The voice interaction method of the invention comprises the following steps:
receiving a user voice request forwarded by the vehicle after the vehicle's voice function is woken up;
loading a state machine configuration template according to the user voice request and parsing it to obtain a parser;
performing logic calculation with the parser to obtain a matching state;
and updating the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction.
In this way, in the invention, the vehicle cabin is divided into a plurality of voice zones, and a state machine configuration template is loaded for each voice request forwarded by the vehicle, so that the template can be parsed into a parser. The parser determines how the current state matches the rules of the state machine configuration template, and the state machine switches or keeps its state accordingly. The configurable templates in the state machine are easy for users to set up or change according to specific requirements, giving strong extensibility and a better user experience.
Loading the state machine configuration template according to the user voice request and parsing it to obtain a parser includes:
determining a target state machine configuration template from pre-written state machine configuration templates according to the user voice request;
and loading the target state machine configuration template through a template parse class and parsing it to obtain the parser.
Thus, the template parse class can fill the specific information of the voice request into the state machine configuration template, and define loading and processing methods to obtain a parser under the corresponding state and logic configuration, facilitating subsequent logic calculation or the introduction of more templates.
Determining a target state machine configuration template from pre-written state machine configuration templates according to the user voice request includes:
determining the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information of the state machine corresponding to the user voice request;
and matching in the pre-written state machine configuration templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state machine configuration template.
In this way, the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information of the state machine corresponding to the user's voice request are determined, and matching is performed in the pre-written state machine configuration templates according to this information, so that the state machine configuration template conforming to the current state information is determined.
Matching in the pre-written state machine configuration templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state machine configuration template includes:
matching against the pre-written state description templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine a target state description template.
In this way, the relevant information of the specific voice request is matched against the pre-written state description templates to determine the state description template conforming to the current state information.
Matching in the pre-written state machine configuration templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state machine configuration template further includes:
matching against the pre-written logic description templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine a target logic description template.
In this way, the relevant information of the specific voice request is matched against the pre-written logic description templates to determine the logic description template conforming to the current state information.
Performing logic calculation with the parser to obtain a matching state includes:
mapping the state description template and the logic description template parsed by the parser through a logic calculation class, and calculating the matching state.
Thus, the logic calculation class can compare the current actual state description template parsed by the parser against the constructed logic description template to obtain a matching state, enabling the subsequent jump of the state machine.
Updating the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction includes:
updating the rejection processing of each voice zone in the vehicle cabin through the state machine action class when the matching is successful, so as to complete the voice interaction.
Thus, when the state machine action class determines from the output of the logic calculation class that the current state information matches the logic rule, the state machine state can be transitioned and the rejection processing of each voice zone in the vehicle cabin updated, completing the voice interaction process.
Updating the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction further includes:
keeping the rejection processing of each voice zone in the vehicle cabin through the state machine action class when the matching is unsuccessful, so as to complete the voice interaction.
Thus, when the state machine action class determines from the output of the logic calculation class that the current state information does not match the logic rule, the state machine state is not transitioned and the rejection processing of each voice zone in the vehicle cabin is maintained, completing the voice interaction process.
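The behavior of the state machine action class described above can be sketched as follows. This is a hypothetical reconstruction (the patent does not publish source code); the class name, zone identifiers and mode strings are illustrative assumptions.

```python
class StateMachineAction:
    """Hypothetical state-machine action class (names are illustrative)."""

    def __init__(self, zones):
        # zone id -> current rejection mode; assume every zone starts
        # in the lenient second rejection processing
        self.modes = {z: "second_rejection" for z in zones}

    def apply(self, matched, target_zones, target_mode):
        # On a successful match, transition the target zones to the
        # target mode; otherwise keep every zone's current mode.
        if matched:
            for z in target_zones:
                self.modes[z] = target_mode
        return self.modes

action = StateMachineAction(["LF", "RF", "LR", "MR", "RR"])
action.apply(True, ["LR", "MR", "RR"], "first_rejection")
print(action.modes["LR"])  # first_rejection
print(action.modes["LF"])  # second_rejection
```

An unsuccessful match (`matched=False`) leaves `modes` untouched, mirroring the "keep the rejection processing" branch.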
The server of the present invention comprises a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the method described above.
The computer readable storage medium of the present invention stores a computer program which, when executed by one or more processors, implements the method described above.
Additional aspects and advantages of embodiments of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a voice interaction method of the present invention;
FIG. 2 is a schematic view of a vehicle cabin of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present invention and are not to be construed as limiting the embodiments of the present invention.
Referring to fig. 1, the present invention provides a voice interaction method, which includes:
01: receiving a user voice request forwarded by the vehicle after the vehicle's voice function is woken up;
02: loading a state machine configuration template according to the user voice request and parsing it to obtain a parser;
03: performing logic calculation with the parser to obtain a matching state;
04: and updating the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction.
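Steps 01-04 can be sketched end to end as below. All function bodies are illustrative stand-ins under assumed names; for instance, the stand-in parser simply matches front-row wake-up zones (1 and 2), which is not the patent's actual matching logic.

```python
# Minimal sketch of the 01-04 pipeline (hypothetical names and logic).

def load_and_parse_template(request):
    # Steps 01-02: load a template for the request and "parse" it into
    # a parser object (here reduced to a predicate over request fields).
    return {"matches": lambda r: r.get("soundLocation") in (1, 2)}

def update_rejection(matched):
    # Step 04: switch to the strict mode on a match, else stay lenient.
    return "first_rejection" if matched else "second_rejection"

def handle_voice_request(request):
    parser = load_and_parse_template(request)
    matched = parser["matches"](request)  # step 03: logic calculation
    return update_rejection(matched)

print(handle_voice_request({"soundLocation": 1}))  # first_rejection
```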
The invention also provides a server comprising a memory and a processor, by which the voice interaction method of the invention can be implemented. Specifically, the memory stores a computer program, and the processor is configured to receive a user voice request forwarded by the vehicle after the vehicle's voice function is woken up, load a state machine configuration template according to the user voice request and parse it to obtain a parser, perform logic calculation with the parser to obtain a matching state, and update the rejection processing of each voice zone in the vehicle cabin according to the matching state to complete the voice interaction.
In particular, the voice assistant of the on-board system provides convenience for users in the cabin, who can control software or vehicle components through voice interaction. For ease of interaction, the voice assistant may support continuous conversation. Because the in-vehicle space is a shared environment, the voice assistant may receive both conversations between different users and the assistant and conversations among the users themselves. By setting semantic rejection rules, the voice assistant can give the same feedback to the same voice request each time it appears; at the same time, users want to be able to modify, as conveniently as possible, the feedback rules the assistant applies to specific voice requests, so that the assistant serves them better and the voice interaction experience improves.
It will be appreciated that in a multi-voice-zone continuous conversation scenario, i.e. after the voice assistant has been woken up, users at different positions in the cabin jointly engage in a multi-turn conversation with it. Multiple users may interact around the same topic with a high degree of freedom, so the semantic rejection rules must be set more carefully than in the single-voice-zone case.
Waking up the vehicle voice function means waking up the vehicle's voice assistant; the wake-up voice request may be a wake-up word set by the manufacturer or customized by the user. After the voice assistant wakes up, users in the cabin can hold several consecutive rounds of conversation with it. The conversation ends when it reaches a set round threshold, or when no user voice request is received within a predetermined time.
The wake-up voice zone is the voice zone where the user who issued the wake-up voice request is located. If the main driver wakes up the voice assistant, the wake-up voice zone is the main driver zone. The wake-up voice zone information is the zone position information corresponding to the wake-up voice zone.
The dialogue voice zone is the voice zone of a user whose voice the assistant acquires during the interaction; a zone in which a conversation is in progress is a dialogue voice zone. For example, in a scene where, after the assistant is woken up, the main driver and the front passenger interact with it in succession, the voice requests of both users are acquired in turn, and the zones where they sit are both dialogue voice zones. A dialogue voice zone may be the same as, or different from, the wake-up voice zone.
Rejection processing is used to discriminate, during the interaction, which of the users' utterances are spoken to the voice assistant, which are recalled and executed, and which are not, which are filtered out as noise. The invention provides two rejection processes with different degrees of strictness: the first rejection processing recalls only voice requests with high relevance, while the second rejection processing has a lower degree of rejection.
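The two rejection levels above can be sketched as a recall decision over a confidence score. This is an illustrative assumption: the patent names the two levels but does not specify numeric thresholds, so the values below are placeholders.

```python
from enum import Enum

class RejectionMode(Enum):
    FIRST = "first"    # strict: recall only highly relevant requests
    SECOND = "second"  # lenient: lower degree of rejection

def recall(confidence, mode):
    """Decide whether a request is recalled or filtered out as noise.

    Thresholds are illustrative, not values from the patent.
    """
    threshold = 0.9 if mode is RejectionMode.FIRST else 0.5
    return confidence >= threshold

print(recall(0.7, RejectionMode.FIRST))   # False: filtered as noise
print(recall(0.7, RejectionMode.SECOND))  # True: recalled and executed
```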
The invention introduces a state machine that records the rejection mode of each voice zone during voice interaction and is continuously updated according to the zone information and user voice requests received during the interaction. In an actual in-vehicle scenario, the user's requirements on the assistant's rejection rules are not necessarily constant. While the voice assistant is awake, the rejection processing of each voice zone needs to be updated as the voice interaction progresses. Users can modify the assistant's rejection rules as their own needs change, and the modular state machine configuration template makes it convenient to add, delete or modify specific rejection rules.
In summary, in the invention, the vehicle cabin is divided into a plurality of voice zones, and a state machine configuration template is loaded for each voice request forwarded by the vehicle, so that the template can be parsed into a parser. The parser determines how the current state matches the rules of the state machine configuration template, and the state machine switches or keeps its state accordingly. The configurable templates in the state machine are easy for users to set up or change according to specific requirements, giving strong extensibility and a better user experience.
021: determining a target state machine configuration template from pre-written state machine configuration templates according to the user voice request;
022: loading the target state machine configuration template through the template parse class and parsing it to obtain the parser.
The processor is configured to determine a target state machine configuration template from pre-written state machine configuration templates according to the user voice request, and to load the target state machine configuration template through the template parse class and parse it to obtain the parser.
Specifically, the invention provides a state machine configuration template for the user to configure, comprising a state description template and a logic description template. After the state machine configuration template is completed, the template parse class loads the target state machine configuration template and parses it to obtain the parser. The corresponding logic module and state machine jump module can then be loaded from memory, so that the state machine can complete the subsequent logic judgment.
Taking the first rejection processing as an example, the configuration item "light_state_template" is a dictionary (dict) type condition set, i.e. the state template, into which various kinds of label information about the voice request can be filled, including business rules, response round number, rejection sub-label and its confidence. The configuration item "light_logical_template" is a dictionary (dict) type condition set, i.e. the logic template, containing condition judgment rule statements that can be filled with part of the relevant information about the voice request. The filled state template and logic template are assigned to the self.state_template and self.logic_template members that the template parse class can parse; the template parse class defines the functions load_state_template and load_logic_template to load the state template and the logic template, handles the correspondence between states and logic, and defines a processing function process_logic_template as the output parser.
It will be appreciated that the parser formed by the template parse class facilitates further calculation by the logic processing class. Besides the state template and the logic template, the parser can also package and process two or more further types of templates, facilitating state and logic parsing.
Thus, the template parse class can fill each piece of specific information of the voice request into the state machine configuration template and define the loading and processing methods, obtaining a parser under the corresponding state and logic configuration to facilitate subsequent logic calculation or the introduction of more templates.
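The template parse class described above can be sketched as follows. The method names mirror those mentioned in the text (`load_state_template`, `load_logic_template`, `process_logic_template`); the matching rule, where a `None` condition means "not set, matches anything", is an assumption drawn from the null-valued fields in the later example, not published code.

```python
class TemplateParser:
    """Hypothetical reconstruction of the template parse class."""

    def __init__(self, state_template, logic_template):
        self.state_template = state_template
        self.logic_template = logic_template

    def load_state_template(self):
        return dict(self.state_template)

    def load_logic_template(self):
        return dict(self.logic_template)

    def process_logic_template(self, actual):
        # A condition matches when it is unset (None) or equal to the
        # actual value; all conditions must match.
        logic = self.load_logic_template()
        return all(v is None or actual.get(k) == v for k, v in logic.items())

parser = TemplateParser({"turn": None}, {"soundLocation": 1, "turn": None})
print(parser.process_logic_template({"soundLocation": 1, "turn": 2}))  # True
```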
Step 021 comprises:
0211: determining the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information of the state machine corresponding to the user voice request;
0212: matching in the pre-written state machine configuration templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state machine configuration template.
The processor is configured to determine the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information of the state machine corresponding to the user voice request, and to match in the pre-written state machine configuration templates according to this information to determine the target state machine configuration template.
Referring to FIG. 2, for example, the vehicle cabin may be divided into five voice zones: the main driver zone, the front passenger zone, the rear-left zone, the rear-middle zone and the rear-right zone. When the state machine template is configured, one or more voice zones can be selected as state condition content. Several voice pickup devices can be arranged in the cabin, so that the zone position of the user who issued a voice request is judged from the acquired state information of the request.
Specifically, the condition variables in the state machine configuration template need to be filled with a static description of each specific variable to form a state trigger. The state trigger name may be set to "triggerName", of string (str) type. A dictionary (dict) type condition set can also be established, its name set to "triggerDetail", and unordered, parallel state variable information can be filled into it.
The matching round information characterizes the number of voice requests the user has issued to the voice assistant since it woke up. The variable name may be set to "turn", with integer (int) data type.
The wake-up voice zone information is the zone position information corresponding to the wake-up voice zone, i.e. the zone of the user who issued the wake-up voice request. The variable name may be set to "soundLocation", of integer (int) type.
The dialogue voice zone is the zone of a user whose voice the assistant acquires during the interaction; a zone in which a conversation is in progress is a dialogue voice zone. The variable name may be set to "soundArea", of string (str) type.
The rejection sub-label information distinguishes valid voice requests from invalid ones; whether a voice request is valid is determined by the rejection mode of the state machine. The variable name may be set to "rejSublabel", of string (str) type.
The rejection sub-label confidence information characterizes the confidence of the rejection sub-label. The variable name may be set to "rejConf", of floating point (float) type.
The rejection mode state information represents the rejection processing state of the state machine for any voice request and includes a current state and a target state. The variable names may be set to "source" and "dest" respectively, both of string (str) type.
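The state variables listed above can be collected into a typed record. The field names and types follow the patent's description; the `dataclass` packaging itself and the `None`-as-unset convention are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TriggerState:
    """Typed view of the trigger's state variables (None = not set)."""
    turn: Optional[int] = None            # matching round
    soundLocation: Optional[int] = None   # wake-up zone (1-5)
    soundArea: Optional[str] = None       # dialogue zone(s)
    rejSublabel: Optional[str] = None     # rejection sub-label
    rejConf: Optional[float] = None       # sub-label confidence
    source: str = ""                      # current rejection mode state
    dest: str = ""                        # target rejection mode state

state = TriggerState(turn=1, soundLocation=1, soundArea="LF")
print(state.rejConf)  # None
```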
The acquired voice request is matched against the pre-written state machine configuration templates to determine the target state machine configuration template corresponding to the current voice request.
In one example, the user requirement is "on front-row wake-up, the rear rows enter the first rejection processing". The variables the state machine template must specifically configure are the wake-up zone information "soundLocation", the dialogue zone information "soundArea" and the target rejection state "dest"; the rejection sub-label, rejection sub-label confidence, matching round and current rejection processing may be left unset or set to any state. Specifically: { "source": "", "triggerDetail": { "turn": null, "rejSublabel": null, "rejConf": null } }. Here "source": "" represents not restricting the current rejection mode state, while "turn": null, "rejSublabel": null and "rejConf": null represent that no rules are set for the matching round, rejection sub-label and rejection sub-label confidence.
In this way, the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information of the state machine corresponding to the user's voice request are determined, and matching is performed in the pre-written state machine configuration templates according to this information, so that the state machine configuration template conforming to the current state information is determined.
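A fuller version of the trigger from this example, with the three configured variables filled in, might look as follows. The field names follow the patent; the trigger name and the "dest" value are illustrative assumptions, and the "/"-separated multi-zone values anticipate the encoding described later.

```python
import json

# Hypothetical full trigger for "front-row wake-up, rear rows enter
# the first rejection processing"; None / "" mean "condition not set".
trigger = {
    "triggerName": "frontWakeRearStrict",  # illustrative name
    "source": "",                          # current rejection mode: any
    "dest": "first_rejection",             # target state (illustrative)
    "triggerDetail": {
        "soundLocation": "1/2",            # wake-up zone: front row
        "soundArea": "LR/MR/RR",           # dialogue zones: rear row
        "turn": None,                      # no matching-round rule
        "rejSublabel": None,               # no sub-label rule
        "rejConf": None,                   # no confidence rule
    },
}
print(json.dumps(trigger, indent=2))  # None serializes as null
```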
Step 0212 comprises:
02121: matching against the pre-written state description templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state description template.
The processor is configured to match against the pre-written state description templates according to the matching round information, wake-up voice zone information, dialogue voice zone information, rejection sub-label confidence information and current rejection mode state information to determine the target state description template.
Specifically, the state machine configuration template needs to fill in specific static description of each state variable in the current scene to form a state trigger, that is, after the state trigger is filled with the static description condition of the specific state variable, whether the current scene state meets the state machine jump condition can be judged. The name of the state trigger may be set to "triggerName" and the type is string (str) class. A key-value pair list (direct) type data set can also be established, the name can be set as triggerDetail, and unordered parallel state variable information can be filled in the list.
The matching round information characterizes the number of voice requests sent by a user to the voice assistant after the voice assistant wakes up. The variable name may be set to "turn", and the data type is integer (int), i.e. the variable can take any natural number.
In particular, in order to distinguish the wake-up voice zone from the dialogue voice zone, different identification methods can be used for the two. In the present invention, if the voice zone is a wake-up voice zone, the five voice zones of main driver, co-driver, left rear, middle and right rear can be represented by the integers (int) 1, 2, 3, 4 and 5 respectively; if the voice zone is a dialogue voice zone, the zones can be represented by the strings (str) LF, RF, LR, MR and RR respectively.
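The two encodings just described can be sketched as lookup tables; the integer and string keys follow the document, while the human-readable zone names attached to them are assumptions for illustration.

```python
# Voice-zone encodings as described above: integers (int) identify wake-up
# zones, strings (str) identify dialogue zones. The readable names are
# illustrative assumptions, not part of the patented scheme.
WAKEUP_ZONES = {1: "main driver", 2: "co-driver",
                3: "left rear", 4: "middle", 5: "right rear"}
DIALOGUE_ZONES = {"LF": "left front", "RF": "right front",
                  "LR": "left rear", "MR": "middle rear", "RR": "right rear"}

print(WAKEUP_ZONES[1])       # main driver
print(DIALOGUE_ZONES["RR"])  # right rear
```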
The wake-up voice zone information is the voice zone position information corresponding to the wake-up voice zone, i.e. the voice zone position of the user who sent the wake-up voice request. The variable name may be set to "soundLocation", and as described above, the type is integer (int). Specifically, if the main driver wakes up the voice assistant, the wake-up voice zone is the main driver voice zone, which can be expressed in the state machine as "soundLocation": "1". It can be understood that when configuring the same state machine template, several voice zones can also be selected together as the wake-up voice zone condition; for example, if the condition to be set is that either the main driver or the co-driver is the wake-up voice zone, i.e. any front-row wake-up meets the user requirement, it can be expressed in the state machine as "soundLocation": "1/2".
The voice zone information is the voice zone position of the user from whom the voice assistant acquires the voice interaction; the voice zone in which a conversation is in progress is the dialogue voice zone. The variable name may be set to "soundArea", and as described above, the type is string (str). Specifically, if the left rear, middle and right rear voice zones are in conversation simultaneously, the dialogue voice zones are all the back-row voice zones, which can be expressed in the state machine as "soundArea": "LR/MR/RR".
The rejection sub-label information distinguishes valid voice requests from invalid voice requests, and the validity of a voice request is determined by the rejection processing of the state machine. The variable name may be set to "rejSublabel", and the type is string (str). In the present invention, there are two values: "clear" for a valid voice request and "noise" for an invalid voice request.
The rejection sub-label confidence information characterizes the confidence level of the rejection sub-label. The variable name may be set to "rejConf", of floating point (float) type. In the present invention, it can take floating point values from 0.00 to 1.00.
The rejection mode state information represents the rejection processing state of the state machine for any voice request, and includes a current state and a target state. The variable names may be set to "source" and "dest" respectively, both of string (str) type. For example, "dest": "light" may represent that the target state is the first rejection processing.
Matching the acquired voice request with a pre-written state machine configuration template, and determining a target state machine configuration template corresponding to the current voice request.
In one example, the user requirement is "front-row wake-up, back row enters first rejection processing", specifically configured as: { "triggerName": "front_wakeup", "source": "*", "triggerDetail": { "soundLocation": "1/2", "soundArea": "LR/RR/MR", "turn": null, "rejSublabel": null, "rejConf": null }, "dest": "light" }. Here "soundLocation": "1/2" represents front-row wake-up; "soundArea": "LR/RR/MR" represents that the current speaker is in the back row; "turn": null, "rejSublabel": null and "rejConf": null represent that these rules are not set; "source": "*" represents that the current state may be any state; "dest": "light" represents that the target state is the first rejection processing.
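The trigger above can be sketched as a plain Python dict. The field names and values follow the document's example; the helper that evaluates a slash-separated zone rule is a hypothetical illustration, not the patented code.

```python
# Sketch of the "front-row wake-up, back row enters first rejection
# processing" state trigger described above.
front_wakeup_template = {
    "triggerName": "front_wakeup",
    "source": "*",                 # "*": any current rejection state
    "triggerDetail": {
        "soundLocation": "1/2",    # wake-up zone: main driver (1) or co-driver (2)
        "soundArea": "LR/RR/MR",   # dialogue zone: any back-row seat
        "turn": None,              # matching round: rule not set
        "rejSublabel": None,       # rejection sub-label: rule not set
        "rejConf": None,           # sub-label confidence: rule not set
    },
    "dest": "light",               # target state: first rejection processing
}

def zone_matches(rule, value):
    """A rule such as "1/2" or "LR/RR/MR" matches when the value is one of
    the slash-separated alternatives; a None rule imposes no constraint."""
    return rule is None or str(value) in rule.split("/")

# A main-driver wake-up (zone 1) with a back-row speaker (LR) satisfies the trigger.
detail = front_wakeup_template["triggerDetail"]
matched = (zone_matches(detail["soundLocation"], 1)
           and zone_matches(detail["soundArea"], "LR"))
print(matched)  # True
```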
In the interaction process, according to the acquired front-row wake-up voice request, the template written for the requirement that the front row wakes up and the back row enters the first rejection processing is matched as the target state machine configuration template.
In this way, the relevant information of the specific voice request is matched with the pre-written state description template to determine the state description template conforming to the current state information.
Step 02121 includes:
021211: and matching the matching round information, the wake-up voice zone information, the voice zone information, the rejection sub-label confidence information and the current rejection mode state information in a pre-written logic description template, so as to determine a target logic description template.
The processor is used for matching the matching round information, the wake-up voice zone information, the voice zone information, the rejection sub-label confidence information and the current rejection mode state information in a pre-written logic description template, so as to determine a target logic description template.
Specifically, the logic description template needs to be filled with the static descriptions of the specific logic rule variables; the static descriptions of the rule variables correspond one-to-one with the state variable items and form a state trigger, i.e. after the state trigger is filled with the static description conditions of the specific state variables, whether the current scene state meets the state machine jump condition can be judged. The state trigger name may be set to "triggerName", of string (str) type. A key-value pair dictionary (dict) type data set can also be established, whose name may be set to "triggerDetail", and unordered, parallel logic rule information can be filled in the dictionary.
The variable names of the matching round information, the wake-up voice zone information, the voice zone information, the rejection sub-label confidence information and the current rejection mode state information have been disclosed in step 02121 and are not repeated here.
In particular, all rules contained in the key-value pair dictionary (dict) "triggerDetail" of the logic description template should be logic judgment statements, so all logic rule judgment types for a voice request are set as string (str) type variables. Four logic judgment types can be set: "exist", "less_than", "more_than" and "equal", where "less_than" and "more_than" only support numeric judgments, including integer (int) and floating point (float), while "exist" and "equal" support both numeric and string judgments.
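A minimal sketch of the four judgment types can look as follows. The function names mirror the document; the implementations are assumptions, in particular the slash-separated-alternatives reading of "exist" and the "no less than" reading of "more_than" taken from the examples later in the text.

```python
def exist(value, allowed):
    # Supports numeric and string judgments: the actual value must appear
    # among the slash-separated alternatives given by the state template.
    return str(value) in str(allowed).split("/")

def less_than(value, limit):
    # Numeric-only judgment (int or float); strict comparison assumed.
    return float(value) < float(limit)

def more_than(value, limit):
    # Numeric-only judgment (int or float). The document describes this as
    # a "no less than" match, so >= is used here.
    return float(value) >= float(limit)

def equal(value, expected):
    # Supports numeric and string judgments; exact match.
    return value == expected

print(exist("LR", "LR/RR/MR"))  # True
print(more_than(2, 2))          # True
print(equal("clear", "clear"))  # True
```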
Matching the acquired voice request with a pre-written state machine configuration template, and determining a target state machine configuration template corresponding to the current voice request.
In one example, the user requirement is "front-row wake-up, back row enters first rejection processing", specifically configured as: { "triggerName": "front_wakeup", "source": null, "triggerDetail": { "soundLocation": "exist", "soundArea": "exist", "turn": null, "rejSublabel": null, "rejConf": null }, "dest": null }. Here "soundLocation": "exist" represents that the current wake-up voice zone must exist in the "soundLocation": "1/2" of the state template; "soundArea": "exist" represents that the current speaker's voice zone must exist in the "soundArea": "LR/RR/MR" of the state template; "turn": null, "rejSublabel": null and "rejConf": null represent that these rules are not set; "source": null represents that the rule is not set, i.e. the current state may be any state; "dest" corresponds to the target state, i.e. the first rejection processing given in the state description template.
In the interaction process, according to the acquired front-row wake-up voice request, the template written for the requirement that the front row wakes up and the back row enters the first rejection processing is matched as the target state machine configuration template.
In this way, the relevant information of the specific voice request is matched with the pre-written logic description templates to determine the logic description template conforming to the current state information.
031: and mapping the state description template and the logic description template analyzed by the analyzer through the logic calculation class, and calculating to obtain a matching state.
The processor is used for carrying out mapping processing on the state description template and the logic description template analyzed by the analyzer through the logic calculation class and calculating to obtain a matching state.
Specifically, in the present invention, the logic calculation class performs mapping processing on the state description template and the logic description template analyzed by the analyzer according to the principle of one-to-one correspondence, and performs logic calculation to obtain the matching state.
Taking the first rejection processing jump required by "front-row wake-up, back row enters first rejection processing" as an example: first, the rules that are not "null" in the "triggerDetail" table of the logic description template "light_logical_template" are acquired, namely the wake-up voice zone variable "soundLocation" and the voice zone variable "soundArea". The logic calculation class can define the functions "exist", "less_than", "more_than" and "equal" to perform logic judgments, and maps the state template and the logic template according to the one-to-one correspondence principle. In this example, it is judged whether the actual values of the "soundLocation" and "soundArea" variables of the current system exist within the value ranges limited by the logic description template "light_logical_template". If both are satisfied, the output result "match" can be stored in the string (str) type data "self.result"; if not, another result can be output, or the processing flow can be exited directly without any output.
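The mapping-and-calculation step just described can be sketched as below. The "exist" rule handling and the "self.result" convention follow the document's example; the class name, constructor shape and everything else are assumptions for illustration.

```python
class LogicCalc:
    """Sketch of the logic calculation class described above: every non-null
    rule in the logic template is applied to the corresponding field of the
    state template (one-to-one mapping), and "match" is stored in
    self.result only if all rules hold."""

    def __init__(self, state_template, logic_template):
        self.state = state_template["triggerDetail"]
        self.logic = logic_template["triggerDetail"]
        self.result = None

    def exist(self, actual, allowed):
        # The actual value must be one of the slash-separated alternatives.
        return str(actual) in str(allowed).split("/")

    def compute(self, actual_values):
        for key, rule in self.logic.items():
            if rule is None:      # rule not set: no constraint on this field
                continue
            if rule == "exist" and not self.exist(actual_values[key], self.state[key]):
                return None       # exit without storing an output result
        self.result = "match"
        return self.result

state_tpl = {"triggerDetail": {"soundLocation": "1/2", "soundArea": "LR/RR/MR",
                               "turn": None, "rejSublabel": None, "rejConf": None}}
logic_tpl = {"triggerDetail": {"soundLocation": "exist", "soundArea": "exist",
                               "turn": None, "rejSublabel": None, "rejConf": None}}

calc = LogicCalc(state_tpl, logic_tpl)
print(calc.compute({"soundLocation": 1, "soundArea": "LR"}))  # match
```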
Further, the computing methods of the logical computing classes increase as the number of processing items increases.
Thus, the logic calculation class module compares the current actual state description template analyzed by the analyzer with the constructed logic description template and performs the calculation to obtain the matching state, so as to facilitate the subsequent jump of the state machine.
041: under the condition that the matching state is successful, updating the refusal processing of each voice zone in the vehicle cabin through the state machine action class so as to complete voice interaction.
The processor is used for updating, through the state machine action class, the rejection processing of each sound zone in the vehicle cabin under the condition that the matching is successful, so as to complete the voice interaction.
Specifically, under the condition that the matching state is successful, the state machine action class updates rejection processing of each sound zone in the vehicle seat cabin to complete voice interaction.
Taking the first rejection processing jump required by "front-row wake-up, back row enters the first rejection processing" as an example, the state machine action class can define the functions "get_parameter", "get_transition" and "get_trigger" to obtain the parser, the current jump action and the jump state respectively; under the condition that the matching is successful, i.e. the logic calculation class output result "self.result" is "match", the state machine action class updates the rejection processing of each sound zone in the vehicle cabin through the "get_transition" function to complete the voice interaction.
Further, the state machine jump performed by the "get_transition" function may be implemented using the Machine class of the Python transitions toolkit.
Thus, when the state machine action class determines from the logic calculation class output that the current state information matches the logic rule, the state machine state can be switched and the rejection processing of each sound zone in the vehicle cabin can be updated, completing the voice interaction process.
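The hold-or-jump behaviour described above can be sketched without dependencies. The patent itself points at the Machine class of the Python transitions toolkit for the actual jump, so this stand-in only mimics that behaviour; the class and method names here are assumptions.

```python
class StateMachineAction:
    """Minimal stand-in for the state machine action class described above:
    jump to the destination state only when the logic calculation output is
    "match"; otherwise hold the current rejection-processing state."""

    def __init__(self, initial_state):
        self.state = initial_state

    def get_transition(self, match_result, dest):
        if match_result == "match":
            self.state = dest   # update rejection processing of the zones
        return self.state       # otherwise: state held, no jump

machine = StateMachineAction(initial_state="idle")  # "idle" is a made-up start state
machine.get_transition("match", dest="light")       # front-row wake-up matched
print(machine.state)  # light: back row now under first rejection processing

machine.get_transition("no_match", dest="heavy")    # hypothetical failed match
print(machine.state)  # still light: current state maintained
```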
042: under the condition that the matching state is unsuccessful, the action class of the state machine keeps the rejection processing of each voice zone in the vehicle cabin so as to complete voice interaction.
The processor is used for maintaining, through the state machine action class, the rejection processing of each sound zone in the vehicle cabin under the condition that the matching is unsuccessful, so as to complete the voice interaction.
Specifically, if the matching state is unsuccessful, the state machine action class does not update the rejection processing of each voice zone, and the state machine keeps the current state, so that voice interaction is completed.
Taking the first rejection processing jump required by "front-row wake-up, back row enters the first rejection processing" as an example, the state machine action class can define the functions "get_parameter", "get_transition" and "get_trigger" to obtain the parser, the current jump action and the jump state respectively; under the condition that the matching is unsuccessful, i.e. the logic calculation class output result "self.result" is not "match", the rejection processing of each sound zone is not updated, the state machine keeps its current state, and the voice interaction is completed.
Further, when the matching is unsuccessful, i.e. the logic calculation class output result "self.result" is not "match", another matching result can be output so that the state machine does not jump, or the jump flow can be exited directly without any output result, thereby completing the voice interaction.
Thus, when the state machine action class determines from the logic calculation class output that the current state information does not match the logic rule, the state machine state is not switched and the rejection processing of each sound zone in the vehicle cabin is maintained, completing the voice interaction process.
The computer readable storage medium of the present invention stores a computer program which, when executed by one or more processors, implements the method described above.
The configuration of the state templates and the logical templates is illustrated below with two scene examples:
example one: the user requirements and specific configurations are shown in table 1. In the setting of the state template, "soundLocation" is that "1/2" represents front-row wake-up; "soundtrack" means "LR/RR/MR" that the current speaker is in the back row, i.e. the voice zones are left back, middle and right back voice zones; "turn" 2 means 2 rounds of matching, i.e. the number of times the back-row voice zone makes a voice request is 2 times; "rejSublabel": clear "represents that only valid voice requests are included in the voice assistant count; "Source": x "stands for the current voice assistant can be in any refused mode state; "dest" means that the current voice assistant's target state is the second rejection process, i.e., no matter what rejection process the current voice assistant is under, the current state needs to be maintained or jumped to the second rejection process if it meets the template requirements. In the logic template configuration, "soundLocation" means "exist" representing that the current wake-up sound zone exists in "soundLocation" of the state template, "1/2", namely the current wake-up sound zone is in the front row; "soundtrack" means "exist" representing the current speaker voice zone presence status template in "soundtrack" means "LR/RR/MR", i.e. the current pair voice zone is in the back row; "turn". More_than represents no less than match, i.e. 
the back-row dialogue rounds in the current state must reach the 2 or more set in the state template; "rejSublabel": "equal" represents an exact match, i.e. only valid voice requests are counted; "source": null represents that the rule is not set, i.e. no restriction is placed on the existing rejection processing; "dest" indicates that the current target state is the second rejection processing, i.e. whatever rejection processing the current voice assistant is under, if the template requirements are met, the current state is held or jumped to the second rejection processing.
TABLE 1
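Since the body of Table 1 is not reproduced in this text, the Example One configuration described above can be captured, under the naming conventions already used, as a pair of dicts; the "dest" state name is a placeholder assumption.

```python
# Hedged reconstruction of the Example One configuration (Table 1) from the
# surrounding prose; "second_rejection" is a placeholder name.
example_one = {
    "state_template": {
        "soundLocation": "1/2",      # front-row wake-up
        "soundArea": "LR/RR/MR",     # back-row speaker
        "turn": 2,                   # two back-row voice requests
        "rejSublabel": "clear",      # count only valid voice requests
        "source": "*",               # any current rejection state
        "dest": "second_rejection",  # placeholder for the second rejection processing
    },
    "logic_template": {
        "soundLocation": "exist",    # wake-up zone must be among "1/2"
        "soundArea": "exist",        # dialogue zone must be among "LR/RR/MR"
        "turn": "more_than",         # rounds must reach the threshold ("no less than")
        "rejSublabel": "equal",      # exact match on the sub-label
        "source": None,              # rule not set
        "dest": "second_rejection",  # placeholder target state
    },
}

# The rule variables correspond one-to-one with the state variables.
print(set(example_one["state_template"]) == set(example_one["logic_template"]))  # True
```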
Example two: the user requirements and specific configuration are shown in Table 2. In the state template settings, "soundLocation": "1/2" represents front-row wake-up; "soundArea": "LR/RR/MR" represents that the current speaker is in the back row, i.e. the dialogue voice zones are the left rear, middle and right rear voice zones; "turn": 3 represents 3 matching rounds, i.e. the back-row voice zones have made voice requests 3 times; "rejSublabel": "noise" represents that the voice assistant counts only invalid voice requests; "source": "*" represents that the current voice assistant may be in any rejection processing state; "dest" indicates that the target state of the current voice assistant is the first rejection processing, i.e. whatever rejection processing the current voice assistant is under, if the template requirements are met, the current state is held or jumped to the first rejection processing. In the logic template configuration, "soundLocation": "exist" represents that the current wake-up voice zone must exist in the state template's "soundLocation": "1/2", i.e. the current wake-up voice zone is in the front row; "soundArea": "exist" represents that the current speaker's voice zone must exist in the state template's "soundArea": "LR/RR/MR", i.e. the current dialogue voice zone is in the back row; "turn": "more_than" represents a no-less-than match, i.e. the back-row dialogue rounds in the current state must reach the 3 or more set in the state template; "rejSublabel": "equal" represents an exact match, i.e. only invalid voice requests are counted; "source": null represents that the rule is not set, i.e. no restriction is placed on the existing rejection processing; "dest" indicates that the current target state is the first rejection processing, i.e. whatever rejection processing the current voice assistant is under, if the template requirements are met, the current state is held or jumped to the first rejection processing.
TABLE 2
In the description of this specification, reference to the terms "above", "specifically", "understandably", "further", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided they do not contradict each other.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the invention.
Claims (10)
1. A method of voice interaction, comprising:
receiving a user voice request forwarded by a vehicle after a vehicle voice function is awakened;
loading a state machine configuration template according to the user voice request and parsing the state machine configuration template to obtain an analyzer, wherein the state machine configuration template has fillable condition judgment rule statements comprising various kinds of label information about the voice request and part of the related information about the voice request, wherein the various kinds of label information about the voice request comprise a business rule, a number of response rounds, a wake-up voice zone, a voice zone, a rejection sub-label and a confidence level of the voice request;
performing logic calculation according to the analyzer to obtain a matching state;
and updating the refusal processing of each sound zone in the vehicle seat cabin according to the matching state so as to complete voice interaction.
2. The voice interaction method of claim 1, wherein loading a state machine configuration template according to the user voice request and parsing the state machine configuration template to obtain an analyzer comprises:
determining a target state machine configuration template from pre-written state machine configuration templates according to the user voice request;
and loading the target state machine configuration template through a template analysis class and analyzing the target state machine configuration template to obtain the analyzer.
3. The voice interaction method of claim 2, wherein determining a target state machine configuration template from among pre-written state machine configuration templates according to the user voice request comprises:
determining matching round information, wake-up voice zone information, rejection sub-tag confidence information and current rejection mode state information of a state machine corresponding to the user voice request;
and matching the matching round information, the wake-up sound zone information, the voice zone information, the rejection sub-label confidence information and the current rejection mode state information in the pre-written state machine configuration template to determine the target state machine configuration template.
4. The voice interaction method of claim 3, wherein the matching in the pre-written state machine configuration templates according to the matching round information, the wake-up zone information, the dialogue zone information, the rejection sub-tag confidence information, and the current rejection mode state information to determine the target state machine configuration template comprises:
and matching the state description template written in advance according to the matching round information, the wake-up sound zone information, the dialogue sound zone information, the rejection sub-label confidence information and the current rejection mode state information to determine a target state description template.
5. The voice interaction method of claim 4, wherein the matching in the pre-written state machine configuration templates according to the matching round information, the wake-up zone information, the dialogue zone information, the rejection sub-tag confidence information, and the current rejection mode state information to determine the target state machine configuration template comprises:
and matching the logic description template written in advance according to the matching round information, the wake-up sound zone information, the voice zone information, the rejection sub-label confidence information and the current rejection mode state information to determine a target logic description template.
6. The voice interaction method according to claim 5, wherein the performing logic calculation according to the analyzer to obtain a matching state comprises:
and mapping the state description template and the logic description template analyzed by the analyzer through a logic calculation class, and calculating to obtain the matching state.
7. The voice interaction method according to claim 6, wherein updating the rejection process of each sound zone in the vehicle cabin according to the matching status to complete voice interaction comprises:
and updating refusal processing of each sound zone in the vehicle cabin through the state machine action class under the condition that the matching state is successful, so as to complete voice interaction.
8. The voice interaction method according to claim 6, wherein updating the rejection process of each sound zone in the vehicle cabin according to the matching status to complete voice interaction comprises:
and under the condition that the matching state is unsuccessful, the action class of the state machine keeps the refusal processing of each sound zone in the vehicle cabin so as to complete voice interaction.
9. A server comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the method of any of claims 1-8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by one or more processors, implements the method according to any of claims 1-8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211276398.2A CN115376513B (en) | 2022-10-19 | 2022-10-19 | Voice interaction method, server and computer readable storage medium |
PCT/CN2023/125013 WO2024083128A1 (en) | 2022-10-19 | 2023-10-17 | Voice interaction method, server, and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211276398.2A CN115376513B (en) | 2022-10-19 | 2022-10-19 | Voice interaction method, server and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115376513A CN115376513A (en) | 2022-11-22 |
CN115376513B true CN115376513B (en) | 2023-05-12 |
Family
ID=84072707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211276398.2A Active CN115376513B (en) | 2022-10-19 | 2022-10-19 | Voice interaction method, server and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115376513B (en) |
WO (1) | WO2024083128A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115376513B (en) * | 2022-10-19 | 2023-05-12 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6480823B1 (en) * | 1998-03-24 | 2002-11-12 | Matsushita Electric Industrial Co., Ltd. | Speech detection for noisy conditions |
US6671669B1 (en) * | 2000-07-18 | 2003-12-30 | Qualcomm Incorporated | combined engine system and method for voice recognition |
CN103186416B (en) * | 2011-12-29 | 2016-06-22 | 比亚迪股份有限公司 | Build the method for multi-task multi-branch process, state machine and the method for execution |
US10462619B2 (en) * | 2016-06-08 | 2019-10-29 | Google Llc | Providing a personal assistant module with a selectively-traversable state machine |
CN107665708B (en) * | 2016-07-29 | 2021-06-08 | 科大讯飞股份有限公司 | Intelligent voice interaction method and system |
CN107316643B (en) * | 2017-07-04 | 2021-08-17 | 科大讯飞股份有限公司 | Voice interaction method and device |
CN111008532B (en) * | 2019-12-12 | 2023-09-12 | 广州小鹏汽车科技有限公司 | Voice interaction method, vehicle and computer readable storage medium |
CN111063350B (en) * | 2019-12-17 | 2022-10-21 | 思必驰科技股份有限公司 | Voice interaction state machine based on task stack and implementation method thereof |
CN112164401B (en) * | 2020-09-18 | 2022-03-18 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer-readable storage medium |
CN112927692B (en) * | 2021-02-24 | 2023-06-16 | 福建升腾资讯有限公司 | Automatic language interaction method, device, equipment and medium |
WO2022222045A1 (en) * | 2021-04-20 | 2022-10-27 | 华为技术有限公司 | Speech information processing method, and device |
CN114267347A (en) * | 2021-11-01 | 2022-04-01 | 惠州市德赛西威汽车电子股份有限公司 | A Multimodal Rejection Method and System Based on Intelligent Voice Interaction |
CN114155853A (en) * | 2021-12-08 | 2022-03-08 | 斑马网络技术有限公司 | Rejection method, device, equipment and storage medium |
CN113990300B (en) * | 2021-12-27 | 2022-05-10 | 广州小鹏汽车科技有限公司 | Voice interaction method, vehicle, server and computer-readable storage medium |
CN115376513B (en) * | 2022-10-19 | 2023-05-12 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
- 2022-10-19: CN application CN202211276398.2A filed (granted as CN115376513B, active)
- 2023-10-17: PCT application PCT/CN2023/125013 filed (published as WO2024083128A1)
Also Published As
Publication number | Publication date |
---|---|
WO2024083128A1 (en) | 2024-04-25 |
CN115376513A (en) | 2022-11-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||