CN111223485A - Intelligent interaction method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN111223485A
- Application number: CN201911319401.2A
- Authority
- CN
- China
- Prior art keywords: user, information, intention, intelligent, service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/24 — Pattern recognition; Classification techniques
- G06N3/044 — Neural networks; Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/08 — Neural networks; Learning methods
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L17/22 — Speaker identification or verification; Interactive procedures; Man-machine interfaces
- G10L2015/223 — Execution procedure of a spoken command
Abstract
The invention provides an intelligent interaction method comprising the following steps: an intelligent voice assistant acquires the user's voice information; the user's identity is verified from the voice information; after identity verification passes, the assistant opens an open-domain dialog and recognizes the user's intention from it; a service level is determined from the intention; a closed-domain dialog is conducted according to the service level, and key information in that dialog is identified; slot values are obtained from the key information and used to fill the slots; and when the filled slots meet the threshold, the operation corresponding to the user's intention is executed. The invention also provides an intelligent interaction device, an electronic device, and a storage medium. With the invention, the intelligent voice assistant can converse securely with the user and execute operations after recognizing the intention behind the dialog.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to an intelligent interaction method and device, an electronic device, and a storage medium.
Background
With the development of the artificial intelligence industry, the intelligent voice assistant has become a relatively mature application of artificial intelligence systems. In the prior art, an intelligent voice assistant typically runs on a mobile terminal: the user interacts with it by voice, and the assistant performs various operations on the terminal under the user's voice control. However, existing intelligent voice assistants recognize intentions with low accuracy, so human-computer interaction is not fluent.
Disclosure of Invention
In view of the foregoing, there is a need for an intelligent interaction method, device, electronic device, and storage medium with which an intelligent voice assistant can converse securely with a user and execute operations after accurately recognizing the intention of the conversation.
A first aspect of the present invention provides an intelligent interaction method, including:
the intelligent voice assistant acquiring the user's voice information;
verifying the user's identity according to the voice information;
after identity verification passes, the intelligent voice assistant opening an open-domain dialog and recognizing the user's intention from the open-domain dialog;
determining a service level according to the user's intention;
conducting a closed-domain dialog according to the service level and identifying key information in the closed-domain dialog;
obtaining slot values according to the key information and filling the slots; and
when the filled slots meet a threshold, executing the operation corresponding to the user's intention.
Preferably, the step of verifying the user's identity according to the voice information comprises:
extracting voiceprint features from the voice information;
matching the extracted voiceprint features against a pre-built voiceprint model;
when the extracted voiceprint features match the pre-built voiceprint model, confirming that identity verification has passed; and
when the extracted voiceprint features do not match the built voiceprint model, confirming that identity verification has failed.
Preferably, the service level is determined by querying a pre-established intention/service-level association table, which records the correspondence between intentions and service levels established according to the business logic of the application domain and that domain's knowledge base.
Preferably, the method further comprises:
receiving information authorized by the user and storing it, the authorized information including account information;
after determining the service level according to the user's intention, conducting a closed-domain dialog according to the service level and identifying key information in the closed-domain dialog;
obtaining slot values according to the authorized information and the key information, and filling the slots; and
when the filled slots meet a threshold, executing the operation corresponding to the user's intention.
Preferably, when the voice instruction corresponding to the user's intention contains multiple services at the same level, an execution order of those services is determined through the closed-domain dialog, and the corresponding operations are executed in that order.
Preferably, the method further comprises:
when the voice instruction corresponding to the user's intention contains multiple services at different levels, identifying the lowest-level service among them according to the intention/service-level association table;
querying the upper-level service to which that lowest-level service belongs; and
offering all lower-level services contained in that upper-level service for the user to choose from.
Preferably, the method further comprises:
when the filled slots do not meet the threshold, the intelligent voice assistant issuing a voice prompt for each missing slot value;
when several slot values are missing, the intelligent voice assistant prompting for them in sequence and filling them in sequence from the user's replies; and
starting the task corresponding to the filled slots to execute the operation corresponding to the user's intention.
A second aspect of the invention provides an intelligent interaction device, the device comprising:
an acquisition module for acquiring the user's voice information through the intelligent voice assistant;
a verification module for verifying the user's identity according to the voice information;
a recognition module for opening an open-domain dialog through the intelligent voice assistant after identity verification passes, and recognizing the user's intention from the open-domain dialog;
a determination module for determining a service level according to the user's intention;
the recognition module further being used to conduct a closed-domain dialog according to the service level and identify key information in the closed-domain dialog;
the acquisition module further being used to obtain slot values according to the key information and fill the slots; and
an execution module for executing the operation corresponding to the user's intention when the filled slots meet the threshold.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the intelligent interaction method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the intelligent interaction method.
In the intelligent interaction method, device, electronic device, and storage medium of the invention, after identity verification passes, the intelligent voice assistant opens an open-domain dialog, recognizes the user's intention from it, determines the service level from the intention, conducts a closed-domain dialog according to the service level, identifies key information in that dialog, obtains slot values from the key information and fills the slots, and executes the operation corresponding to the user's intention when the filled slots meet a threshold. The method accurately recognizes the user's intention; once the closed-domain dialog begins, the user enters the primary service interface matching that intention and communicates there in question-and-answer form, making task execution more intelligent and human-computer interaction more fluent.
In addition, the invention can handle voice instructions that contain multiple services at the same level or at different levels, and can guide the user when an utterance is ambiguous, until the whole closed-loop operation is complete.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described here are only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an intelligent interaction method according to an embodiment of the present invention.
Fig. 2 is a functional block diagram of an intelligent interaction device according to a second embodiment of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to a third embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The terms "first," "second," "third," etc. in the description, claims, and drawings are used to distinguish between different objects, not to describe a particular order. Furthermore, the term "comprises" and any variation thereof is intended to cover non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not listed or inherent to such a process, method, article, or apparatus.
The intelligent interaction method provided by the embodiments of the invention is applied to an electronic device. For an electronic device that needs intelligent interaction, the intelligent interaction function provided by the method can be integrated directly on the device, or a client implementing the method can be installed on it. Alternatively, the method may run on a device such as a server in the form of a Software Development Kit (SDK), with the intelligent interaction function exposed through an SDK interface; the electronic device or other devices then implement the function through the provided interface.
Example one
Fig. 1 is a flowchart of an intelligent interaction method according to an embodiment of the present invention. The execution sequence in the flow chart can be changed and some steps can be omitted according to different requirements.
In step S1, the intelligent voice assistant acquires the user's voice information.
In this embodiment, the intelligent interaction method is applied to an intelligent voice assistant, which may be a bank's intelligent voice assistant. When handling business at the bank, the user can interact directly with the bank's intelligent voice assistant. The assistant receives the user's speech through a microphone, so it can identify the user and handle banking business according to the user's intention.
For example, when the user needs to query an account balance, the intelligent voice assistant is woken up first, and the user's voice information is acquired while the assistant is being woken.
In step S2, the user's identity is verified according to the voice information.
In this embodiment, voiceprint features are extracted from the voice information and matched against a pre-built voiceprint model. When the extracted voiceprint features match the pre-built model, identity verification is confirmed as passed; when they do not match, it is confirmed as failed.
Specifically, verifying the user's identity from the voice information works as follows. In the voiceprint enrollment stage, a sample of the user's voice is entered into the system, Mel-frequency cepstral coefficients (MFCC) of the voice information are extracted, and end-to-end training with a ResNet + GhostVLAD network yields the voiceprint features from which the user's voiceprint model is built. In the voiceprint verification stage, when the user wakes the intelligent voice assistant with the wake-up word in the near field, the assistant captures the user's voice information, extracts its voiceprint features, and matches them against the built voiceprint model to verify identity. When the features match the model, the user is confirmed as legitimate; when they do not match, the user is confirmed as illegitimate.
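As an illustrative, non-limiting sketch of the matching step only (the fixed-length embedding vectors, the cosine metric, the 0.75 threshold, and the function names below are assumptions for illustration, not part of the claimed invention; the embeddings would in practice come from the trained network):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two voiceprint embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def verify_speaker(query_emb, enrolled_emb, threshold=0.75):
    """Identity verification passes when the query embedding is close enough
    to the enrolled voiceprint model; otherwise it fails."""
    return cosine_similarity(query_emb, enrolled_emb) >= threshold
```

A match confirms a legitimate user; a score below the threshold confirms an illegitimate one.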
In another embodiment, during the voiceprint enrollment stage the user's voice sample may instead be processed with a short-time Fourier transform to extract the spectrum of the voice signal, followed by the same end-to-end ResNet + GhostVLAD training to obtain the voiceprint features and build the user's voiceprint model.
For example, when the user records the wake-up word and the ten digits 0-9 in the near field as a voice sample, voiceprint features are extracted from that sample and the user's voiceprint model is built during the enrollment stage. During verification, the user can then be asked to pronounce digits specified by the intelligent voice assistant, confirming whether the user is legitimate. This effectively improves verification accuracy and prevents spoofing with a recording made in advance, improving security.
Preferably, the method further comprises: when the extracted voiceprint features fail to match the built voiceprint model a preset number of times or more (e.g. 3 times), a password verification function is started.
In step S3, after identity verification passes, the intelligent voice assistant opens an open-domain dialog and recognizes the user's intention from the open-domain dialog.
In this embodiment, after identity verification passes, the intelligent voice assistant opens an open-domain dialog, converts the speech in the open-domain dialog into text, and then performs intent recognition.
In this embodiment, a joint intent-recognition and slot-filling model may be used to recognize the user's intention in the open-domain dialog. The model has three layers: the first is a one-hot encoding of the question text; the second is a network combining a BLSTM and a CNN, whose representations learn shared semantic and intention information; the third is a CRF layer that decodes the shared representation, and a single loss function is used to learn the intent-recognition and slot-filling tasks jointly. The model one-hot encodes the question from the open-domain dialog into a sentence vector, feeds it to the BLSTM to obtain a new sequence representation, processes that with the CNN to obtain a feature vector, and concatenates the feature vector with the sequence vector to produce an output vector. The output vector is fed to the CRF layer, which jointly decodes the best tag sequence by associating each character w_t of question u with a BIO tag, where B, I, and O denote begin, inside, and outside respectively. The input sequence X is denoted w1, w2, ..., wn and the output sequence Y is denoted s1, s2, ..., sn. For the joint model, an extra label is appended to the end of the input question, and an intention marker is attached at the end of the corresponding output sequence, yielding new input and output sequences. The model's final hidden layer contains a latent semantic representation of the whole input question, which is used for intent recognition of the question.
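The decoding side of BIO tagging can be sketched as follows; this is only the step that turns a predicted tag sequence into slot values, and the tag names (svc, item) are illustrative assumptions, not the patent's actual tag set:

```python
def decode_bio(tokens, tags):
    """Collect slot values from per-character BIO tags: B- starts a slot,
    I- continues it, O is outside any slot. Returns {slot_type: [values]}."""
    slots, current_type, current_toks = {}, None, []

    def flush():
        nonlocal current_type, current_toks
        if current_type:
            slots.setdefault(current_type, []).append("".join(current_toks))
        current_type, current_toks = None, []

    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_toks = tag[2:], [tok]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_toks.append(tok)
        else:
            flush()
    flush()
    return slots
```

For a character-level question such as "查信用卡账单", the tags would mark "信用卡" as a service slot and "账单" as an item slot.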
In other embodiments, the user intention in the open-domain dialog may be identified by one or more combinations of an intention identification method based on a rule template, an intention identification method based on statistical feature classification, an intention identification method based on a word vector, an intention identification method based on a convolutional neural network, and the like, which will not be described herein again.
In step S4, a service level is determined according to the user's intention.
In this embodiment, the service level is determined by querying the pre-established intention/service-level association table. The table records the correspondence between intentions and service levels, established according to the business logic of the application domain and that domain's knowledge base. In the banking domain, for example, the correspondence is established according to banking business logic and the banking knowledge base.
For example, the first-level services include the credit card service, the payment service, the loan service, and so on. The second-level services under the credit card service include the consumption bill, repayment amount, repayment date, etc.; those under the payment service include paying the electricity fee, gas fee, and telephone fee; those under the loan service include the quick loan, cash loan, and intelligent loan.
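One minimal way to realize such an association table is a mapping from intention to first-level service and its second-level options; the sketch below uses the examples above, with the function name and table structure as illustrative assumptions:

```python
# Illustrative intention/service-level association table built from the
# examples in the text; real systems would derive it from the domain's
# business logic and knowledge base.
SERVICE_TABLE = {
    "credit card": ["consumption bill", "repayment amount", "repayment date"],
    "payment": ["electricity fee", "gas fee", "telephone fee"],
    "loan": ["quick loan", "cash loan", "intelligent loan"],
}

def lookup_service_level(intent):
    """Map a recognized intention to (first-level service, second-level
    options); (None, None) when the intention is not in the table."""
    secondary = SERVICE_TABLE.get(intent)
    return (intent, secondary) if secondary else (None, None)
```

The second-level options returned here are what the closed-domain dialog later offers the user.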
In this embodiment, the primary service is strongly related to the user intention.
In step S5, a closed-domain dialog is conducted according to the service level, and key information in the closed-domain dialog is identified.
In this embodiment, a closed-domain dialog is a dialog conducted to clarify the user's purpose (or task details) after the user's intention has been recognized. The key information is information extracted from lower-level services in the closed-domain dialog. For example, if only first-level service information has been received, the voice assistant announces the corresponding second-level services, prompting the user to state which second-level service should be executed.
For example, when the voice assistant receives the first-level service "credit card" but no corresponding second-level service message, it issues voice prompts such as "Would you like to check your consumption bill?", "Would you like to query your repayment amount?", or "Would you like to query your repayment date?". When the user replies to the prompt, the intelligent voice assistant determines the second-level service from the reply. The user thus enters the first-level service interface matching their intention and communicates there in question-and-answer form, so the assistant learns which second-level service to execute, making task execution more intelligent.
In step S6, slot values are obtained according to the key information and the slots are filled.
Slot filling is the process of completing information so that the user's intention can be converted into an explicit instruction. In this embodiment, slot values are obtained from the key information and the slots are then filled with them. For example, when the text corresponding to the voice information collected by the voice assistant is "view the consumption bill of my credit card", the key information obtained is: "my", "credit card", and "consumption bill". The intelligent voice assistant obtains slot values from this key information and fills the slots.
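The filling step can be sketched as matching extracted key information against the slots each intention requires; the intention names, slot names, and table below are illustrative assumptions, not the invention's actual schema:

```python
# Required slots per intention (illustrative).
REQUIRED_SLOTS = {
    "transfer": ["payee_account", "amount"],
    "view_bill": ["card_type"],
}

def fill_slots(intent, key_info):
    """Fill each slot required by the intention from the extracted key
    information; a slot with no matching key information stays None."""
    return {slot: key_info.get(slot) for slot in REQUIRED_SLOTS.get(intent, [])}
```

A slot left at None is a gap that the later clarification step must fill.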
In step S7, when the filled slots meet the threshold, the operation corresponding to the user's intention is executed.
In this embodiment, when the filled slots meet a threshold, the user's intention is converted into a voice instruction and the intelligent voice assistant executes it. The threshold depends on the intention. For example, a transfer transaction requires two parameters, the payee account and the transfer amount, so the corresponding threshold is two; if either of the two slots is unfilled, the operation corresponding to the intention cannot be executed.
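The threshold check amounts to verifying that every required slot has a value; a minimal sketch (function names are assumptions):

```python
def slots_satisfied(slots):
    """The threshold is met when every required slot has a value."""
    return all(v is not None for v in slots.values())

def missing_slots(slots):
    """Names of slots still lacking a value, in declaration order; these
    drive the clarification prompts issued later."""
    return [k for k, v in slots.items() if v is None]
```

For the transfer example, both payee_account and amount must be non-None before the operation can run.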
For example, when the text corresponding to the voice information collected by the intelligent voice assistant is "view my credit card", the recognized intention is: credit card, whose first-level service is the credit card service. The intelligent voice assistant enters the closed domain of the credit card service for dialog, extracts the slot information, and calls the target interface. For instance, it issues the prompt "Would you like to check your credit-card consumption bill, repayment amount, repayment date, or overdue status?". When the user replies "repayment amount", the assistant queries the user's credit-card consumption and replies with the result, e.g. announcing "You should repay 2033 yuan this month."
In addition, the intelligent voice assistant can directly call the target service interface to obtain information or perform an operation. For example, when the received voice message is "check how much I need to repay on my credit card this month", the repayment-balance function of the second-level service interface is called directly to obtain the balance, and the assistant then announces "You should repay 2033 yuan this month."
Preferably, before executing the operation corresponding to the user's intention, the intelligent voice assistant issues a prompt message for the user to confirm. For example, it plays the task back to the user before execution and executes the operation only after receiving the user's confirmation. Whether execution succeeds or fails, the assistant feeds the result back to the user.
Preferably, the intelligent voice assistant stores information authorized by the user and performs the corresponding operations according to the authorized information and the recognized intention. Specifically: user-authorized information is received and stored, the authorized information including account information (e.g. a gas account); after the service level is determined from the user's intention, a closed-domain dialog is conducted according to the service level and its key information identified; slot values are obtained from the authorized information and the key information, and the slots are filled; and when the filled slots meet the threshold, the operation corresponding to the intention is executed.
For example, when the intelligent voice assistant has stored the user's authorization to pay the gas fee for the user's grandmother, the assistant remembers this authorized information and need not ask for it repeatedly. When the user says "pay the gas fee for my grandmother", the assistant recognizes the intention: payment, and determines the second-level service from it: pay gas fee. The key information identified in the closed-domain dialog is "my grandmother" (the person whose fee is to be paid) and "pay the gas fee"; the grandmother's gas account is retrieved from memory, and the fee can be paid for her directly in the application.
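The memory for authorized information can be sketched as a per-user store consulted during slot filling; the structure and key names below are illustrative assumptions:

```python
# Per-user store of authorized information (e.g. a relative's utility
# account); keys and structure are illustrative.
AUTHORIZED_INFO = {}

def authorize(user, key, value):
    """Store a piece of authorized information so it need not be asked again."""
    AUTHORIZED_INFO.setdefault(user, {})[key] = value

def recall(user, key):
    """Look up previously authorized information; None if never authorized."""
    return AUTHORIZED_INFO.get(user, {}).get(key)
```

In the gas-fee example, the grandmother's account stored at authorization time fills the account slot without another question.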
Preferably, when the voice instruction corresponding to the user's intention contains multiple services at the same level, their execution order is determined through the closed-domain dialog, and the corresponding operations are executed in that order.
When the voice instruction contains two services at the same level, it must be clarified which service interface the user wants first. For example, when the user says "help me check the credit card service and the loan service", the intelligent voice assistant prompts "Would you like to check the credit card service or the loan service first?"; after the user replies that the credit card service should be queried first and the loan service after, the assistant performs the credit card query first and then the loan query.
Preferably, when the voice instruction corresponding to the user intention includes a plurality of services at different levels, the user is prompted with the upper-level service corresponding to the lowest-level service among them, and then all lower-level services included in that upper-level service are provided for the user to select. Specifically, when the voice instruction corresponding to the user intention contains a plurality of services at different levels, the lowest-level service among them is identified according to the intent and service level association table; the upper-level service corresponding to that lowest-level service is queried; and all lower-level services contained in the upper-level service are provided for the user to select.
For example, when the voice instruction corresponding to the user intention contains two services at upper and lower levels, the upper-level service of the two is identified, a prompt voice is issued to clarify for the user which secondary services that upper-level service contains, and the user's service requirement is confirmed again through the closed-domain dialog. When the user says "help me check intelligent credit under the credit card service", the intelligent voice assistant prompts the user "Do you want to inquire about intelligent credit under the loan service? There is no intelligent credit service under the credit card service." When the user confirms inquiring the loan service, the intelligent voice assistant lists all subordinate services under the loan service (such as quick credit, intelligent credit and cash credit) for the user to select.
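The hierarchy correction described above can be sketched as a simple table lookup. The table contents follow the banking examples in this text; the function names and prompt wording are illustrative assumptions:

```python
# Illustrative sketch of correcting a mismatched service level via the
# service hierarchy. Table contents mirror the examples in the text.

SERVICE_TREE = {
    "credit card service": ["consumption bill", "repayment amount", "repayment date"],
    "payment service": ["pay electric fee", "pay gas fee", "pay telephone fee"],
    "loan service": ["quick credit", "intelligent credit", "cash credit"],
}


def parent_of(secondary):
    """Query the upper-level service that actually contains a secondary service."""
    for primary, children in SERVICE_TREE.items():
        if secondary in children:
            return primary
    return None


def clarify_level(requested_parent, secondary):
    """If the user named the wrong parent service, correct it and offer the
    real parent's subordinate services for selection."""
    actual = parent_of(secondary)
    if actual != requested_parent:
        prompt = (f"Do you want to inquire about {secondary} under the {actual}? "
                  f"There is no {secondary} under the {requested_parent}.")
        return prompt, SERVICE_TREE[actual]
    return None, SERVICE_TREE[actual]


prompt, options = clarify_level("credit card service", "intelligent credit")
```

Here `prompt` carries the correction ("intelligent credit" actually sits under the loan service), and `options` lists all loan sub-services for the user to choose from.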
Preferably, when the filled slots do not satisfy the threshold, the intelligent voice assistant issues a voice prompt for the missing slot value; when several slot values are missing, the intelligent voice assistant prompts for them in sequence and fills the missing slot values in sequence according to the user's replies; and then starts the task corresponding to the filled slots to execute the operation corresponding to the user intention. In this way, when the filled slots do not satisfy the threshold, a targeted question can be asked for each missing slot value. When several slots need to be clarified, the questions are asked in sequence to ensure that the user's true slot values are obtained, which makes it convenient for the intelligent voice assistant to start the task corresponding to the slots.
For example, when the user says "I want to pay", it is not clear what fee is to be paid or for whom. The filled slots therefore do not satisfy the threshold. At this point, the intelligent voice assistant issues voice prompts for the missing slot values, for example "What fee would you like to pay?" and "For whom would you like to pay?". Since two slot values are currently missing, the intelligent voice assistant prompts in sequence: it first asks "What fee would you like to pay?", and when the user replies "pay the gas fee", the gas fee is filled into the missing slot; it then asks "For whom would you like to pay?", and when the user replies "for my grandma", grandma is filled into the missing slot. The task corresponding to the filled slots (i.e., paying the gas fee for the user's grandmother) is then started: the grandmother's gas account is retrieved and the gas fee is paid for her directly in the application.
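The sequential prompting loop above can be sketched as follows. The slot names, prompt strings, and the "all required slots filled" threshold rule are illustrative assumptions:

```python
# Illustrative sketch of sequential prompting for missing slot values
# ("I want to pay"). Slot names and prompts are assumptions.

REQUIRED_SLOTS = {
    "fee_type": "What fee would you like to pay?",
    "beneficiary": "For whom would you like to pay?",
}


def fill_slots(slots, answer_fn):
    """Prompt in order for each missing slot; the threshold is satisfied once
    every required slot holds a value from the user's replies."""
    for name, prompt in REQUIRED_SLOTS.items():
        if name not in slots:
            slots[name] = answer_fn(prompt)
    return slots


# Simulated user replies keyed by prompt text.
answers = {
    "What fee would you like to pay?": "gas fee",
    "For whom would you like to pay?": "grandma",
}
filled = fill_slots({}, answers.get)
# filled now satisfies the threshold: both required slot values are present
```

In a real dialog, `answer_fn` would issue the voice prompt and return the recognized reply; any slot already extracted from the original utterance is skipped, so only the genuinely missing values are asked for, and in sequence.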
In summary, the intelligent interaction method provided by the present invention includes: the intelligent voice assistant acquires the user's voice information; the user identity is verified according to the voice information; after the user identity authentication passes, the intelligent voice assistant starts an open-domain dialog and identifies the user intention according to the open-domain dialog; a service level is determined according to the user intention; a closed-domain dialog is carried out according to the service level, and key information in the closed-domain dialog is identified; slot values are acquired according to the key information and the slots are filled; and when the filled slots satisfy the threshold, the operation corresponding to the user intention is executed. With the addition of user voiceprint recognition, the bank's intelligent voice assistant acquires the user's voice information while being awakened and determines the user's identity after extracting the voiceprint features from the voice; when the user controls operations by voice, only voice verification is needed and no extra verification operation is required, which simplifies the operation flow and improves security. The invention can accurately identify the user intention; after entering the closed-domain dialog, the assistant enters the primary service interface according to the user intention and carries out question-and-answer communication there, so that task execution is more intelligent and the degree of human-computer interaction is higher. In addition, the invention can handle the cases where the voice instruction corresponding to the user intention includes multiple services at the same level or multiple services at different levels, and can guide the user's operation when the user's expression is ambiguous until the whole closed-loop operation is completed.
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.
With reference to fig. 2 and fig. 3, a functional module and a hardware structure of an electronic device for implementing the intelligent interaction method are respectively described below.
Example two
FIG. 2 is a functional block diagram of an intelligent interactive device according to a preferred embodiment of the present invention.
In some embodiments, the intelligent interaction device 20 operates in an electronic device. The intelligent interaction device 20 may comprise a plurality of functional modules consisting of program code segments. Program code for various program segments in the intelligent interaction device 20 may be stored in the memory and executed by the at least one processor to perform intelligent interaction functions.
In this embodiment, the intelligent interactive device 20 may be divided into a plurality of functional modules according to the functions performed by the intelligent interactive device. The functional module may include: an acquisition module 201, a verification module 202, an identification module 203, a determination module 204, and an execution module 205. The module referred to herein is a series of computer program segments capable of being executed by at least one processor and capable of performing a fixed function and is stored in memory. In some embodiments, the functionality of the modules will be described in greater detail in subsequent embodiments.
The acquiring module 201 is used for acquiring the voice information of the user through the intelligent voice assistant.
In this embodiment, the intelligent voice assistant may be a bank intelligent voice assistant. When the user processes related business at the bank, the user can directly interact with the intelligent voice assistant of the bank. The intelligent voice assistant receives the voice of the user through the microphone, so that the user can be identified and banking business processing can be carried out according to the intention of the user.
For example, when a user needs to perform an operation of querying an account balance, the intelligent voice assistant may be awakened first, and sound information of the user is acquired while the intelligent voice assistant is awakened.
The verification module 202 is configured to verify the user identity according to the voice message.
In this embodiment, the verification module 202 is configured to extract a voiceprint feature in the sound information; matching the extracted voiceprint features with a pre-constructed voiceprint model; when the extracted voiceprint features are matched with a pre-constructed voiceprint model, confirming that the user identity authentication is passed; and when the extracted voiceprint features are not matched with the constructed voiceprint model, confirming that the user identity authentication is not passed.
Specifically, verifying the user identity according to the sound information includes: in the voiceprint registration stage, a user voice sample is input into the system, the Mel-Frequency Cepstral Coefficients (MFCC) of the user's voice information are extracted, and end-to-end training is performed using a ResNet + GhostVLAD network to obtain the voiceprint features in the user's voice information and construct a voiceprint model of the user; in the voiceprint authentication stage, when the user wakes up the intelligent voice assistant with the wake-up word in the near field, the intelligent voice assistant acquires the user's voice information, extracts the voiceprint features from it, and matches the extracted voiceprint features against the constructed voiceprint model to verify the user's identity. When the extracted voiceprint features match the constructed voiceprint model, the user is confirmed to be a legitimate user; when they do not match, the user is confirmed not to be a legitimate user.
In another embodiment, in the voiceprint registration stage, a user voice sample may be input into the system, the spectrum of the user's voice signal extracted through a short-time Fourier transform, and end-to-end training then performed using a ResNet + GhostVLAD network to obtain the voiceprint features in the user's voice signal and construct the user's voiceprint model.
For example, when a user records the wake-up word and the ten digits 0-9 in the near field as a voice sample, a voiceprint model of the user can be built in the voiceprint registration stage by extracting the voiceprint features of that sample. During authentication, the user can then be asked to pronounce digits specified by the intelligent voice assistant, so that whether the user is legitimate is determined from the specified pronunciation. This effectively improves authentication accuracy, prevents forgery by someone who has recorded the user's voice in advance, and improves security.
Preferably, the intelligent interaction device may further be configured to: when the number of mismatches between the extracted voiceprint features and the constructed voiceprint model is greater than or equal to a preset number (e.g., 3), start the password verification function.
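The verification-with-fallback behavior can be sketched as follows. The cosine-similarity comparison of voiceprint embeddings and the 0.8 threshold are assumptions for illustration; in the described system, the ResNet + GhostVLAD model would produce the embedding vectors being compared:

```python
# Illustrative sketch of voiceprint matching with a password fallback after
# repeated mismatches. Similarity metric and threshold are assumptions.

import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class VoiceprintVerifier:
    def __init__(self, enrolled_embedding, threshold=0.8, max_failures=3):
        self.enrolled = enrolled_embedding
        self.threshold = threshold
        self.max_failures = max_failures
        self.failures = 0

    def verify(self, embedding):
        """Return 'pass', 'fail', or 'password' once failures reach the limit."""
        if cosine_similarity(embedding, self.enrolled) >= self.threshold:
            self.failures = 0
            return "pass"
        self.failures += 1
        if self.failures >= self.max_failures:
            return "password"  # fall back to password verification
        return "fail"
```

After three consecutive mismatches the verifier stops retrying voiceprint matching and signals that password verification should start, matching the preset-number behavior described above.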
The identification module 203 is used for starting an open domain dialog by the intelligent voice assistant after the user identity authentication is passed, and identifying the user intention according to the open domain dialog.
In this embodiment, after the user identity authentication is passed, the intelligent voice assistant starts an open domain conversation, converts voice information in the open domain conversation into characters, and then performs intent recognition.
In this embodiment, a joint model of intent recognition and slot filling may be employed to recognize the user intention in the open-domain dialog. Specifically, the joint model comprises three layers: the first layer is the one-hot encoding of the question text; the second layer is a network structure combining a BLSTM and a CNN, which learns a shared representation of semantic information and intention information; the third layer is a CRF layer, which decodes the shared representation, and a unified loss function is used to learn the intent recognition task and the slot filling task jointly. The joint model one-hot encodes the question in the open-domain dialog to obtain a sentence vector, inputs the sentence vector into the BLSTM model to obtain a new sequence-vector representation, processes it with the CNN model to obtain a feature vector, and concatenates the feature vector with the sequence vector to obtain an output vector. The output vector is fed to the CRF layer, which jointly decodes the best tag sequence; the sequence is represented by associating each character wt in the question u with a BIO tag, where B, I and O denote begin, inside and outside, respectively. The input sequence X is denoted w1, w2 … wn and the output tag sequence Y is denoted s1, s2 … sn. For the joint model, an additional label is appended at the end of the input question, and an intention-information mark is attached at the end of the corresponding output label, yielding new input and output labels. The final hidden layer of the model contains the latent semantic representation of the entire input question, which is used for the intent recognition of the question.
In other embodiments, the user intention in the open-domain dialog may be identified by one or more combinations of an intention identification method based on a rule template, an intention identification method based on statistical feature classification, an intention identification method based on a word vector, an intention identification method based on a convolutional neural network, and the like, which will not be described herein again.
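The joint model described above emits one BIO tag per token plus an intention mark at the end of the sequence. The sketch below shows only the final decoding step, turning a BIO tag sequence into slot spans; the tag set and example sentence are illustrative assumptions, not actual model output:

```python
# Illustrative sketch of BIO-tag decoding into slot spans. Tags of the form
# B-x / I-x open and continue a slot of type x; "O" tokens carry no slot.

def decode_bio(tokens, tags):
    """Collect B-/I- runs into (slot_type, text) pairs; 'O' tokens are skipped."""
    slots, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                slots.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        slots.append((current_type, " ".join(current_tokens)))
    return slots


tokens = ["pay", "gas", "fee", "for", "grandma"]
tags = ["O", "B-fee", "I-fee", "O", "B-beneficiary"]
slots = decode_bio(tokens, tags)
```

The decoded pairs then become the slot values that are filled in the later steps of the method, while the sequence-final intention mark drives the service-level lookup.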
The determining module 204 is configured to determine a service level according to the user intention.
In the present embodiment, the service level is determined by referring to a table of pre-established association of intent and service level. The association table may establish a correspondence between the intention and the service level according to the service logic of the application field and the knowledge base of the field. For example, in the field of banking applications, the corresponding relationship between the intention and the business level may be established according to the business logic of the banking field and the knowledge base of the banking field.
For example, the primary service includes a credit card service, a payment service, a loan service, and the like. The second-level business corresponding to the credit card business comprises a consumption bill, a repayment amount, a repayment date and the like; the second-level service corresponding to the payment service comprises the payment of electric charge, the payment of gas charge and the payment of telephone charge; the secondary services corresponding to the loan service include quick credit, cash credit and intelligent credit.
In this embodiment, the primary service is strongly related to the user intention.
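The pre-established intent and service-level association table can be sketched as a pair of mappings. The entries follow the banking examples given above; the intent labels themselves are illustrative assumptions:

```python
# Illustrative sketch of the intent / service-level association table for the
# banking field. Entries mirror the examples in the text.

INTENT_TO_PRIMARY = {
    "payment": "payment service",
    "credit card": "credit card service",
    "loan": "loan service",
}

PRIMARY_TO_SECONDARY = {
    "payment service": ["pay electric fee", "pay gas fee", "pay telephone fee"],
    "credit card service": ["consumption bill", "repayment amount", "repayment date"],
    "loan service": ["quick credit", "cash credit", "intelligent credit"],
}


def determine_service_level(intent):
    """The primary service is strongly related to the intent; its secondary
    services are then offered in the closed-domain dialog."""
    primary = INTENT_TO_PRIMARY[intent]
    return primary, PRIMARY_TO_SECONDARY[primary]


primary, secondaries = determine_service_level("payment")
```

A recognized "payment" intent thus resolves to the payment service, and the secondary list supplies the candidate prompts for the closed-domain dialog that follows.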
The identification module 203 is further configured to perform a closed domain dialogue according to the service level, and identify key information in the closed domain dialogue.
In the present embodiment, a closed domain dialog refers to a dialog that is performed to clarify a user's purpose (or to clarify task details) after recognizing a user's intention. The key information is information extracted from lower level services in the closed domain dialog. For example, if only the first-level service information is received, the voice assistant broadcasts information according to the second-level service information corresponding to the received first-level service information, so as to prompt the user what the second-level service to be executed is.
For example, when the voice assistant receives the information that the primary service is "credit card" but no corresponding secondary service message, it issues a voice prompt such as "Do you need to check the consumption bill?", "Do you need to inquire about the repayment amount?" or "Do you need to inquire about the repayment date?". When the user replies to the voice prompt, the intelligent voice assistant can determine the secondary service information from the reply. In this way, the assistant enters the primary service interface according to the user's intention and carries out question-and-answer communication there to obtain the secondary service the user needs to execute, which makes task execution more intelligent.
The obtaining module 201 is further configured to obtain a slot position value according to the key information and fill the slot position.
Slot filling refers to the process of completing information so as to convert the user's intention into an explicit user instruction. In this embodiment, the slot values are obtained according to the key information, and the slots are then filled with those values. For example, when the text corresponding to the voice information collected by the voice assistant is "view the consumption bill of my credit card", the key information may be obtained as: I, credit card, and consumption bill. The intelligent voice assistant can obtain the slot values from this key information and fill the slots.
The executing module 205 is configured to execute an operation corresponding to the user intention when the filled slot satisfies a threshold.
In this embodiment, when the filled slot position satisfies a threshold, the user intention is converted into a voice instruction, and the intelligent voice assistant performs an operation according to the voice instruction. The threshold is related to user intent. For example, when the user's intent is to perform a transfer transaction, two parameters are required, a transfer account number and a transfer amount. Then the corresponding threshold is also two. And if any one of the two thresholds is not completed, the operation corresponding to the user intention cannot be executed.
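The threshold check described above can be sketched as follows: each intent requires a fixed set of slot values (the transfer example needs a transfer account and a transfer amount). The intent names and slot lists are assumptions drawn from the text's examples:

```python
# Illustrative sketch of the slot threshold check. The required-slot lists
# are assumptions based on the examples in the text.

REQUIRED = {
    "transfer": ["transfer_account", "transfer_amount"],
    "pay gas fee": ["beneficiary", "gas_account"],
}


def meets_threshold(intent, slots):
    """True only when every slot the intent requires has been filled; if any
    required value is missing, the corresponding operation cannot execute."""
    return all(slots.get(name) is not None for name in REQUIRED[intent])


# A transfer with an account but no amount does not satisfy its threshold of 2.
incomplete = meets_threshold("transfer", {"transfer_account": "ACCT-1"})
```

Only once `meets_threshold` returns true is the user intention converted into a voice instruction and executed; otherwise the assistant falls back to prompting for the missing values as described later.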
For example, when the text corresponding to the voice information collected by the intelligent voice assistant is "view my credit card", the user intention may be recognized as: credit card. The primary service corresponding to the credit card is the credit card service. The intelligent voice assistant enters the closed domain of the credit card service for a dialog, extracts the slot information according to the slots and calls the target interface. For example, it issues the prompt voice "Would you like the consumption bill, the repayment amount, the repayment date, or to check for overdue payments on your credit card?". When the voice assistant receives the user's reply "repayment amount", it inquires about the user's credit card consumption and replies according to the query result, for example by broadcasting "You should repay 2033 yuan this month".
In addition, the intelligent voice assistant can also directly call the target service interface to acquire information or perform an operation. For example, when the received user voice message is "check how much I need to repay on my credit card this month", the repayment-balance interface in the secondary service is called directly to obtain the balance information, and the assistant then broadcasts "You should repay 2033 yuan this month".
Preferably, before the intelligent voice assistant performs the operation corresponding to the user's intention, a prompt message is issued for the user to confirm. For example, the intelligent voice assistant plays the task back to the user by voice for confirmation before execution, and executes the operation corresponding to the user intention after receiving the user's confirmation. Whether the execution succeeds or fails, the intelligent voice assistant feeds the result back to the user.
Preferably, the intelligent voice assistant stores information authorized by the user and performs the corresponding operation according to the authorized information and the identified user intention. Specifically, the assistant receives the user's authorization information and stores it, wherein the authorization information includes account information (e.g., a gas account); after determining the service level according to the user intention, it carries out a closed-domain dialog according to the service level and identifies the key information in the dialog; it acquires slot values according to the authorized information and the key information and fills the slots; and when the filled slots satisfy the threshold, it executes the operation corresponding to the user intention.
For example, when the intelligent voice assistant stores information that the user has authorized paying the gas fee for his grandmother, the assistant has a memory function for the authorized information and can remember the user's defined authorization without asking repeatedly. When the user says "help me pay my grandma's gas fee", the intelligent assistant recognizes the intention: payment. According to the user intention, the secondary service is determined as: paying the gas fee. The key information in the closed-domain dialog is identified as "my grandma" (i.e., the person whose fee is to be paid) and "pay the gas fee"; the grandmother's gas account can then be retrieved from the stored authorization, and the gas fee paid for her directly in the application.
Preferably, when the voice instruction corresponding to the user intention contains a plurality of services of a plurality of levels, determining an execution sequence of the services of the plurality of levels according to the closed domain dialog, and executing corresponding operations according to the execution sequence.
When the voice instruction corresponding to the user intention contains two services at the same level, it is necessary to clarify which service the user wants to execute first. For example, when the user says "help me inquire about the credit card service and the loan service", the intelligent voice assistant prompts the user "Do you want to inquire about the credit card service or the loan service first?"; after receiving the user's reply that the credit card service should be inquired first and then the loan service, the intelligent voice assistant executes the credit card service inquiry first and then the loan service inquiry.
Preferably, when the voice instruction corresponding to the user intention includes a plurality of services at different levels, the user is prompted with the upper-level service corresponding to the lowest-level service among them, and then all lower-level services included in that upper-level service are provided for the user to select. Specifically, when the voice instruction corresponding to the user intention contains a plurality of services at different levels, the lowest-level service among them is identified according to the intent and service level association table; the upper-level service corresponding to that lowest-level service is queried; and all lower-level services contained in the upper-level service are provided for the user to select.
For example, when the voice instruction corresponding to the user intention contains two services at upper and lower levels, the upper-level service of the two is identified, a prompt voice is issued to clarify for the user which secondary services that upper-level service contains, and the user's service requirement is confirmed again through the closed-domain dialog. When the user says "help me check intelligent credit under the credit card service", the intelligent voice assistant prompts the user "Do you want to inquire about intelligent credit under the loan service? There is no intelligent credit service under the credit card service." When the user confirms inquiring the loan service, the intelligent voice assistant lists all subordinate services under the loan service (such as quick credit, intelligent credit and cash credit) for the user to select.
Preferably, when the filled slots do not satisfy the threshold, the intelligent voice assistant issues a voice prompt for the missing slot value; when several slot values are missing, the intelligent voice assistant prompts for them in sequence and fills the missing slot values in sequence according to the user's replies; and then starts the task corresponding to the filled slots to execute the operation corresponding to the user intention. In this way, when the filled slots do not satisfy the threshold, a targeted question can be asked for each missing slot value. When several slots need to be clarified, the questions are asked in sequence to ensure that the user's true slot values are obtained, which makes it convenient for the intelligent voice assistant to start the task corresponding to the slots.
For example, when the user says "I want to pay", it is not clear what fee is to be paid or for whom. The filled slots therefore do not satisfy the threshold. At this point, the intelligent voice assistant issues voice prompts for the missing slot values, for example "What fee would you like to pay?" and "For whom would you like to pay?". Since two slot values are currently missing, the intelligent voice assistant prompts in sequence: it first asks "What fee would you like to pay?", and when the user replies "pay the gas fee", the gas fee is filled into the missing slot; it then asks "For whom would you like to pay?", and when the user replies "for my grandma", grandma is filled into the missing slot. The task corresponding to the filled slots (i.e., paying the gas fee for the user's grandmother) is then started: the grandmother's gas account is retrieved and the gas fee is paid for her directly in the application.
In summary, the intelligent interaction device 20 provided by the present invention includes an obtaining module 201, a verifying module 202, an identifying module 203, a determining module 204, and an executing module 205. The obtaining module 201 is used for acquiring the user's voice information through the intelligent voice assistant; the verifying module 202 is configured to verify the user identity according to the voice information; the identifying module 203 is used for starting an open-domain dialog with the intelligent voice assistant after the user identity authentication passes and identifying the user intention according to the open-domain dialog; the determining module 204 is configured to determine a service level according to the user intention; the identifying module 203 is further configured to carry out a closed-domain dialog according to the service level and identify the key information in the closed-domain dialog; the obtaining module 201 is further configured to obtain the slot values according to the key information and fill the slots; and the executing module 205 is configured to execute the operation corresponding to the user intention when the filled slots satisfy the threshold. With the addition of user voiceprint recognition, the bank's intelligent voice assistant acquires the user's voice information while being awakened and determines the user's identity after extracting the voiceprint features from the voice; when the user controls operations by voice, only voice verification is needed and no extra verification operation is required, which simplifies the operation flow and improves security.
The invention can accurately identify the user intention; after entering the closed-domain dialog, the assistant enters the primary service interface according to the user intention and carries out question-and-answer communication there, so that task execution is more intelligent and the degree of human-computer interaction is higher. In addition, the invention can handle the cases where the voice instruction corresponding to the user intention includes multiple services at the same level or multiple services at different levels, and can guide the user's operation when the user's expression is ambiguous until the whole closed-loop operation is completed.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a dual-screen device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to a third embodiment of the present invention.
The electronic device 3 includes: a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, at least one communication bus 34, and a database 35.
The at least one processor 32 implements the steps in the above-described intelligent interaction method embodiments when executing the computer program 33.
Illustratively, the computer program 33 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the at least one processor 32 to carry out the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 33 in the electronic device 3.
The electronic device 3 may be a device such as a mobile phone, a tablet computer, or a Personal Digital Assistant (PDA) on which an application program is installed. It will be understood by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or use different components. For example, the electronic device 3 may further include an input-output device, a network access device, a bus, etc.
The at least one processor 32 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. The processor 32 may be a microprocessor or any conventional processor; it is the control center of the electronic device 3 and connects the various parts of the whole electronic device 3 through various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the module/unit, and the processor 32 may implement various functions of the electronic device 3 by running or executing the computer program and/or the module/unit stored in the memory 31 and calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data) created according to the use of the electronic device 3, and the like. In addition, the memory 31 may include a high speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid state storage device.
The memory 31 stores program code, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the modules illustrated in fig. 2 (the obtaining module 201, the verifying module 202, the identifying module 203, the determining module 204, and the executing module 205) are program code stored in the memory 31 and executed by the at least one processor 32 to realize intelligent interaction.
The obtaining module 201 is configured to obtain the user's voice information through the intelligent voice assistant;
the verifying module 202 is configured to verify the user's identity according to the voice information;
the identifying module 203 is configured to have the intelligent voice assistant start an open-domain dialog after the user's identity is verified, and to identify the user's intention according to the open-domain dialog;
the determining module 204 is configured to determine a service level according to the user's intention;
the identifying module 203 is further configured to conduct a closed-domain dialog according to the service level and to identify key information in the closed-domain dialog;
the obtaining module 201 is further configured to obtain slot values according to the key information and to fill the slots; and
the executing module 205 is configured to execute the operation corresponding to the user's intention when the filled slots satisfy a threshold.
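The cooperation of the five modules can be sketched as follows. Every class, method, and data-structure name here is an assumption made for the example; the embodiment describes the modules' roles, not their code.

```python
# Illustrative sketch of how the five modules cooperate; all names are
# assumptions, since the patent specifies roles, not an implementation.

class IntelligentAssistant:
    def __init__(self, voiceprints, intent_levels, required_slots):
        self.voiceprints = voiceprints        # user -> enrolled voiceprint model
        self.intent_levels = intent_levels    # intention -> service level
        self.required_slots = required_slots  # intention -> slot names

    def verify(self, user, voiceprint):
        # verifying module 202: match against the pre-constructed model
        return self.voiceprints.get(user) == voiceprint

    def determine_level(self, intention):
        # determining module 204: service level from the user's intention
        return self.intent_levels.get(intention)

    def fill_slots(self, intention, key_info):
        # obtaining module 201: slot values from the key information
        return {s: key_info.get(s) for s in self.required_slots[intention]}

    def execute(self, intention, slots):
        # executing module 205: run only once every slot is filled
        if all(v is not None for v in slots.values()):
            return f"executing '{intention}' with {slots}"
        missing = [s for s, v in slots.items() if v is None]
        return f"missing slot values: {missing}"
```

For instance, a "transfer" intention requiring `payee` and `amount` slots would be executed only after both values had been filled from the closed-domain dialog.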
The database 35 is a repository built on the electronic device 3 that organizes, stores, and manages data according to a data structure. Databases are generally classified as hierarchical, network, or relational. In the present embodiment, the database 35 is used to store the user's voice information and the like.
If the integrated modules/units of the electronic device 3 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
In the embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the embodiments of the electronic device described above are merely illustrative; the division into units is only one kind of logical functional division, and other divisions are possible in actual implementation.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be realized in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other elements, and the singular does not exclude the plural. A plurality of units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware. The terms first, second, and the like are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.
Claims (10)
1. An intelligent interaction method, the method comprising:
acquiring, by an intelligent voice assistant, voice information of a user;
verifying the user's identity according to the voice information;
after the user's identity is verified, starting, by the intelligent voice assistant, an open-domain dialog, and identifying the user's intention according to the open-domain dialog;
determining a service level according to the user's intention;
conducting a closed-domain dialog according to the service level, and identifying key information in the closed-domain dialog;
acquiring slot values according to the key information and filling slots; and
when the filled slots satisfy a threshold, executing an operation corresponding to the user's intention.
2. The intelligent interaction method of claim 1, wherein the step of verifying the user's identity according to the voice information comprises:
extracting voiceprint features from the voice information;
matching the extracted voiceprint features against a pre-constructed voiceprint model;
when the extracted voiceprint features match the pre-constructed voiceprint model, confirming that the user's identity verification has passed; and
when the extracted voiceprint features do not match the pre-constructed voiceprint model, confirming that the user's identity verification has not passed.
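A minimal sketch of this matching step, assuming the voiceprint model is a feature vector and using cosine similarity with an illustrative 0.8 threshold; the claim names neither the representation nor the similarity measure, so both are assumptions.

```python
# Voiceprint matching sketch: compare extracted features to a pre-built
# model vector. The cosine measure and 0.8 threshold are assumptions.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identity_verified(extracted_features, voiceprint_model, threshold=0.8):
    """True when the extracted voiceprint matches the pre-constructed model."""
    return cosine_similarity(extracted_features, voiceprint_model) >= threshold
```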
3. The intelligent interaction method of claim 1, wherein the service level is determined by querying a pre-established intent/service-level association table, the intent/service-level association table recording the correspondence between intents and service levels established according to the business logic of an application domain and the knowledge base of the application domain.
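One possible shape for such an association table; the intents and levels below are invented examples of how an application domain's business logic might rank its services, not values from the claim.

```python
# Invented intent/service-level association table and the lookup that
# realizes the "determining" step; all entries are illustrative.
INTENT_SERVICE_LEVELS = {
    "check_balance": 1,   # low-sensitivity query
    "update_profile": 2,
    "transfer_funds": 3,  # high-sensitivity transaction
}

def determine_service_level(intention):
    # querying the association table; None means the intention is unknown
    return INTENT_SERVICE_LEVELS.get(intention)
```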
4. The intelligent interaction method of claim 1, wherein the method further comprises:
receiving information authorized by the user and storing the authorized information, wherein the authorized information comprises account information;
after determining the service level according to the user's intention, conducting a closed-domain dialog according to the service level, and identifying key information in the closed-domain dialog;
acquiring slot values according to the authorized information and the key information, and filling the slots; and
when the filled slots satisfy a threshold, executing the operation corresponding to the user's intention.
5. The intelligent interaction method of claim 1, wherein, when the voice instruction corresponding to the user's intention contains a plurality of services of a plurality of levels, an execution sequence of those services is determined according to the closed-domain dialog, and the corresponding operations are executed according to the execution sequence.
6. The intelligent interaction method of claim 3, wherein the method further comprises:
when the voice instruction corresponding to the user's intention contains a plurality of services of different levels, identifying the lowest-level service among them according to the intent/service-level association table;
querying the upper-level service corresponding to the lowest-level service; and
providing all lower-level services contained in the upper-level service for the user to select from.
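The three steps of this claim might be sketched as follows over an invented two-level service hierarchy; the tree, service names, and levels are all assumptions made for illustration.

```python
# Claim 6 sketch: find the lowest-level service, query its upper-level
# service, and offer every lower-level service under it. Data is invented.
SERVICE_TREE = {  # upper-level service -> lower-level services it contains
    "accounts": ["check_balance", "view_statement"],
    "payments": ["transfer_funds", "pay_bill"],
}
SERVICE_LEVELS = {"check_balance": 1, "view_statement": 1,
                  "pay_bill": 2, "transfer_funds": 3}

def services_to_offer(services):
    lowest = min(services, key=SERVICE_LEVELS.get)  # lowest-level service
    upper = next(p for p, lower in SERVICE_TREE.items() if lowest in lower)
    return SERVICE_TREE[upper]  # all lower-level services for the user to pick
```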
7. The intelligent interaction method of claim 1, wherein the method further comprises:
when the filled slots do not satisfy the threshold, issuing, by the intelligent voice assistant, a voice prompt for the missing slot values;
when a plurality of slot values are missing, issuing the voice prompts in sequence, by the intelligent voice assistant, and filling the missing slot values in sequence according to the user's replies; and
starting the task corresponding to the filled slots to execute the operation corresponding to the user's intention.
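The prompting loop can be sketched as below, where `ask` stands in for the assistant's voice prompt and the user's transcribed reply; both the callback and the dict-of-slots representation are assumptions.

```python
# Sketch of the sequential slot-prompting loop of claim 7; `ask` is an
# assumed stand-in for "voice prompt, then transcribe the user's reply".
def fill_missing_slots(slots, ask):
    """Prompt for each missing slot value in sequence and fill it."""
    for name in list(slots):
        if slots[name] is None:      # this slot value is missing
            slots[name] = ask(name)  # voice prompt; fill from the reply
    return slots
```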
8. An intelligent interaction device, the device comprising:
an obtaining module, configured to obtain voice information of a user through an intelligent voice assistant;
a verifying module, configured to verify the user's identity according to the voice information;
an identifying module, configured to have the intelligent voice assistant start an open-domain dialog after the user's identity is verified, and to identify the user's intention according to the open-domain dialog;
a determining module, configured to determine a service level according to the user's intention;
the identifying module being further configured to conduct a closed-domain dialog according to the service level and to identify key information in the closed-domain dialog;
the obtaining module being further configured to obtain slot values according to the key information and to fill the slots; and
an executing module, configured to execute the operation corresponding to the user's intention when the filled slots satisfy a threshold.
9. An electronic device, comprising a processor and a memory, wherein the processor is configured to implement the intelligent interaction method of any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the intelligent interaction method of any one of claims 1 to 7.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911319401.2A CN111223485A (en) | 2019-12-19 | 2019-12-19 | Intelligent interaction method and device, electronic equipment and storage medium |
| PCT/CN2020/105636 WO2021120631A1 (en) | 2019-12-19 | 2020-07-29 | Intelligent interaction method and apparatus, and electronic device and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911319401.2A CN111223485A (en) | 2019-12-19 | 2019-12-19 | Intelligent interaction method and device, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111223485A true CN111223485A (en) | 2020-06-02 |
Family
ID=70827894
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911319401.2A Pending CN111223485A (en) | 2019-12-19 | 2019-12-19 | Intelligent interaction method and device, electronic equipment and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN111223485A (en) |
| WO (1) | WO2021120631A1 (en) |
Cited By (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111767384A (en) * | 2020-07-08 | 2020-10-13 | 上海风秩科技有限公司 | Man-machine conversation processing method, device, equipment and storage medium |
| CN111933151A (en) * | 2020-08-16 | 2020-11-13 | 云知声智能科技股份有限公司 | Method, device, device and storage medium for processing call data |
| CN111986024A (en) * | 2020-08-25 | 2020-11-24 | 北京文思海辉金信软件有限公司 | Transaction processing method and device and electronic terminal |
| CN112035623A (en) * | 2020-09-11 | 2020-12-04 | 杭州海康威视数字技术股份有限公司 | Intelligent question answering method, device, electronic device and storage medium |
| CN112331185A (en) * | 2020-11-10 | 2021-02-05 | 珠海格力电器股份有限公司 | Voice interaction method, system, storage medium and electronic equipment |
| CN112740323A (en) * | 2020-12-26 | 2021-04-30 | 华为技术有限公司 | Voice understanding method and device |
| CN112820285A (en) * | 2020-12-29 | 2021-05-18 | 北京搜狗科技发展有限公司 | Interaction method and earphone equipment |
| WO2021120631A1 (en) * | 2019-12-19 | 2021-06-24 | 深圳壹账通智能科技有限公司 | Intelligent interaction method and apparatus, and electronic device and storage medium |
| CN113113012A (en) * | 2021-04-15 | 2021-07-13 | 北京蓦然认知科技有限公司 | Method and device for interaction based on collaborative voice interaction engine cluster |
| WO2022017152A1 (en) * | 2020-07-24 | 2022-01-27 | 深圳市声扬科技有限公司 | Resource transfer method and apparatus, computer device, and storage medium |
| CN114255752A (en) * | 2021-12-17 | 2022-03-29 | 中国电信股份有限公司 | Method, device and storage medium for invoking application open capability through voice assistant |
| CN114329398A (en) * | 2021-11-25 | 2022-04-12 | 泰康保险集团股份有限公司 | Data processing method and device and physical robot |
| CN115064167A (en) * | 2022-08-17 | 2022-09-16 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
| CN115146038A (en) * | 2021-03-31 | 2022-10-04 | 辉达公司 | Conversational AI platform with closed domain and open domain conversation integration |
| CN117059095A (en) * | 2023-07-21 | 2023-11-14 | 广州市睿翔通信科技有限公司 | IVR-based service providing method and device, computer equipment and storage medium |
| WO2023246609A1 (en) * | 2022-06-24 | 2023-12-28 | 华为技术有限公司 | Speech interaction method, electronic device and speech assistant development platform |
Families Citing this family (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113782035A (en) * | 2021-09-10 | 2021-12-10 | 中国银行股份有限公司 | Service processing method and device, electronic equipment and storage medium |
| WO2023042988A1 (en) * | 2021-09-14 | 2023-03-23 | Samsung Electronics Co., Ltd. | Methods and systems for determining missing slots associated with a voice command for an advanced voice interaction |
| CN114356276B (en) * | 2021-12-22 | 2024-08-23 | 科大讯飞股份有限公司 | Voice interaction method and related device |
| CN114663176A (en) * | 2022-02-15 | 2022-06-24 | 北京元年科技股份有限公司 | Business operation execution method, device, equipment and computer readable storage medium |
| CN114974236A (en) * | 2022-05-07 | 2022-08-30 | 阳光保险集团股份有限公司 | Method and device for identifying user intention, storage medium and electronic equipment |
| CN115527534B (en) * | 2022-09-07 | 2025-09-02 | 广州小鹏汽车科技有限公司 | Vehicle voice interaction method, vehicle and storage medium |
| CN115862622A (en) * | 2022-11-30 | 2023-03-28 | 航天信息股份有限公司 | Warehouse management information processing method, device, electronic device and storage medium |
| CN116312521A (en) * | 2023-03-20 | 2023-06-23 | 长城汽车股份有限公司 | Speech recognition method, device, speech recognition device and vehicle |
| CN116564302A (en) * | 2023-03-22 | 2023-08-08 | 南京视通天下数字科技有限公司 | Intelligent voice interaction system based on AI autonomous learning |
| CN116662555B (en) * | 2023-07-28 | 2023-10-20 | 成都赛力斯科技有限公司 | Request text processing method and device, electronic equipment and storage medium |
| CN117312523A (en) * | 2023-10-17 | 2023-12-29 | 抖音视界有限公司 | Prompt information generation method and device, computer equipment and storage medium |
| CN117556864B (en) * | 2024-01-12 | 2024-04-16 | 阿里云计算有限公司 | Information processing method, electronic device, and storage medium |
| CN117725185B (en) * | 2024-02-06 | 2024-05-07 | 河北神玥软件科技股份有限公司 | Intelligent dialogue generation method and system |
| CN118964688B (en) * | 2024-10-15 | 2025-08-08 | 北京字跳网络技术有限公司 | Interaction method and device and computer readable storage medium |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1321295A (en) * | 1998-10-02 | 2001-11-07 | 国际商业机器公司 | System for efficient voice navigation through generic hierarchical objects |
| CN103139404A (en) * | 2013-01-25 | 2013-06-05 | 西安电子科技大学 | System and method for generating interactive voice response display menu based on voice recognition |
| WO2018006489A1 (en) * | 2016-07-06 | 2018-01-11 | 深圳Tcl数字技术有限公司 | Terminal voice interaction method and device |
| CN109671438A (en) * | 2019-01-28 | 2019-04-23 | 武汉恩特拉信息技术有限公司 | It is a kind of to provide the device and method of ancillary service using voice |
| CN109922213A (en) * | 2019-01-17 | 2019-06-21 | 深圳壹账通智能科技有限公司 | Data processing method, device, storage medium and terminal device when voice is seeked advice from |
| CN110377720A (en) * | 2019-07-26 | 2019-10-25 | 中国工商银行股份有限公司 | The more wheel exchange methods of intelligence and system |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10332513B1 (en) * | 2016-06-27 | 2019-06-25 | Amazon Technologies, Inc. | Voice enablement and disablement of speech processing functionality |
| CN106776936B (en) * | 2016-12-01 | 2020-02-18 | 上海智臻智能网络科技股份有限公司 | Intelligent interaction method and system |
| CN107886948A (en) * | 2017-11-16 | 2018-04-06 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device, terminal, server and readable storage medium storing program for executing |
| CN108763568A (en) * | 2018-06-05 | 2018-11-06 | 北京玄科技有限公司 | The management method of intelligent robot interaction flow, more wheel dialogue methods and device |
| CN109473108A (en) * | 2018-12-15 | 2019-03-15 | 深圳壹账通智能科技有限公司 | Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition |
| CN111223485A (en) * | 2019-12-19 | 2020-06-02 | 深圳壹账通智能科技有限公司 | Intelligent interaction method and device, electronic equipment and storage medium |
- 2019-12-19: CN CN201911319401.2A patent/CN111223485A/en active Pending
- 2020-07-29: WO PCT/CN2020/105636 patent/WO2021120631A1/en not_active Ceased
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1321295A (en) * | 1998-10-02 | 2001-11-07 | 国际商业机器公司 | System for efficient voice navigation through generic hierarchical objects |
| CN103139404A (en) * | 2013-01-25 | 2013-06-05 | 西安电子科技大学 | System and method for generating interactive voice response display menu based on voice recognition |
| WO2018006489A1 (en) * | 2016-07-06 | 2018-01-11 | 深圳Tcl数字技术有限公司 | Terminal voice interaction method and device |
| CN109922213A (en) * | 2019-01-17 | 2019-06-21 | 深圳壹账通智能科技有限公司 | Data processing method, device, storage medium and terminal device when voice is seeked advice from |
| CN109671438A (en) * | 2019-01-28 | 2019-04-23 | 武汉恩特拉信息技术有限公司 | It is a kind of to provide the device and method of ancillary service using voice |
| CN110377720A (en) * | 2019-07-26 | 2019-10-25 | 中国工商银行股份有限公司 | The more wheel exchange methods of intelligence and system |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021120631A1 (en) * | 2019-12-19 | 2021-06-24 | 深圳壹账通智能科技有限公司 | Intelligent interaction method and apparatus, and electronic device and storage medium |
| CN111767384A (en) * | 2020-07-08 | 2020-10-13 | 上海风秩科技有限公司 | Man-machine conversation processing method, device, equipment and storage medium |
| WO2022017152A1 (en) * | 2020-07-24 | 2022-01-27 | 深圳市声扬科技有限公司 | Resource transfer method and apparatus, computer device, and storage medium |
| CN111933151A (en) * | 2020-08-16 | 2020-11-13 | 云知声智能科技股份有限公司 | Method, device, device and storage medium for processing call data |
| CN111986024A (en) * | 2020-08-25 | 2020-11-24 | 北京文思海辉金信软件有限公司 | Transaction processing method and device and electronic terminal |
| CN112035623A (en) * | 2020-09-11 | 2020-12-04 | 杭州海康威视数字技术股份有限公司 | Intelligent question answering method, device, electronic device and storage medium |
| CN112331185A (en) * | 2020-11-10 | 2021-02-05 | 珠海格力电器股份有限公司 | Voice interaction method, system, storage medium and electronic equipment |
| CN112331185B (en) * | 2020-11-10 | 2023-08-11 | 珠海格力电器股份有限公司 | Voice interaction method, system, storage medium and electronic equipment |
| CN112740323A (en) * | 2020-12-26 | 2021-04-30 | 华为技术有限公司 | Voice understanding method and device |
| CN112820285A (en) * | 2020-12-29 | 2021-05-18 | 北京搜狗科技发展有限公司 | Interaction method and earphone equipment |
| CN112820285B (en) * | 2020-12-29 | 2024-09-20 | 北京搜狗科技发展有限公司 | Interaction method and earphone device |
| CN115146038A (en) * | 2021-03-31 | 2022-10-04 | 辉达公司 | Conversational AI platform with closed domain and open domain conversation integration |
| CN115146038B (en) * | 2021-03-31 | 2025-08-19 | 辉达公司 | Conversational AI platform with closed domain and open domain conversation integration |
| CN113113012A (en) * | 2021-04-15 | 2021-07-13 | 北京蓦然认知科技有限公司 | Method and device for interaction based on collaborative voice interaction engine cluster |
| CN114329398A (en) * | 2021-11-25 | 2022-04-12 | 泰康保险集团股份有限公司 | Data processing method and device and physical robot |
| CN114255752A (en) * | 2021-12-17 | 2022-03-29 | 中国电信股份有限公司 | Method, device and storage medium for invoking application open capability through voice assistant |
| WO2023246609A1 (en) * | 2022-06-24 | 2023-12-28 | 华为技术有限公司 | Speech interaction method, electronic device and speech assistant development platform |
| CN115064167A (en) * | 2022-08-17 | 2022-09-16 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
| CN115064167B (en) * | 2022-08-17 | 2022-12-13 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
| CN117059095A (en) * | 2023-07-21 | 2023-11-14 | 广州市睿翔通信科技有限公司 | IVR-based service providing method and device, computer equipment and storage medium |
| CN117059095B (en) * | 2023-07-21 | 2024-04-30 | 广州市睿翔通信科技有限公司 | IVR-based service providing method and device, computer equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2021120631A1 (en) | 2021-06-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111223485A (en) | Intelligent interaction method and device, electronic equipment and storage medium | |
| CN112434501B (en) | Method, device, electronic equipment and medium for intelligent generation of worksheet | |
| CN111696558A (en) | Intelligent outbound method, device, computer equipment and storage medium | |
| CN110399609B (en) | Intention recognition method, device, equipment and computer readable storage medium | |
| CN111128182B (en) | Intelligent voice recording method and device | |
| CN109727041A (en) | Intelligent customer service takes turns answering method, equipment, storage medium and device more | |
| CN113132214B (en) | Dialogue method, dialogue device, dialogue server and dialogue storage medium | |
| CN109087639B (en) | Method, apparatus, electronic device and computer readable medium for speech recognition | |
| CN113159901A (en) | Method and device for realizing financing lease service session | |
| CN112434677B (en) | Contract auditing method, device, equipment and storage medium | |
| CN112529585A (en) | Interactive awakening method, device, equipment and system for risk transaction | |
| CN112561535A (en) | Transaction dispute data processing method, device, equipment and storage medium | |
| CN110277098A (en) | A kind of wisdom scenic spot information service system | |
| CN118450053A (en) | Call service processing method, system, device, storage medium and program product | |
| CN107590374A (en) | Control method, intelligent terminal and the storage device of voice assistant authority | |
| CN110728984A (en) | Database operation and maintenance method and device based on voice interaction | |
| CN112435127A (en) | Contract signing method, device, equipment and storage medium based on block chain | |
| CN109871129B (en) | Human-computer interaction method and device, customer service equipment and storage medium | |
| CN111554296B (en) | Client information modification method, device, server and storage medium | |
| CN109801169A (en) | Financing lease application method, device, computer equipment and storage medium | |
| CN110163617B (en) | Television shopping payment method supporting voiceprint-based | |
| CN118571223A (en) | Voice-controlled transfer method, device, equipment, medium and program product | |
| CN115602160A (en) | Service handling method and device based on voice recognition and electronic equipment | |
| CN119416762B (en) | Collection dialogue generation method and system based on AI semantic understanding | |
| CN119204219B (en) | Online mediation scheme generation method and system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200602 |