Disclosure of Invention
To solve, or at least partially solve, the technical problem, embodiments of the present application provide a voice control method and apparatus.
In a first aspect, an embodiment of the present application provides a voice control method, where the method includes:
acquiring voice instruction information;
extracting target application avatar program information and operation behavior information from the voice instruction information;
generating a publish message according to the target application avatar program information and the operation behavior information;
searching for a subscription message matching the target application avatar program information;
taking the application avatar program corresponding to the subscription message that matches the target application avatar program information as a target application avatar program;
and sending the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
Optionally, before searching for the subscription message matching the target application avatar program information, the method further includes:
acquiring a message subscription request sent by an application avatar program, wherein the message subscription request includes source application avatar program information;
extracting the source application avatar program information from the message subscription request to generate a subscription message of the application avatar program;
wherein searching for the subscription message matching the target application avatar program information includes:
searching for a subscription message whose source application avatar program information matches the target application avatar program information.
Optionally, the message subscription request is generated when the user logs into the application avatar program.
Optionally, extracting the target application avatar program information and the operation behavior information from the voice instruction information includes:
performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information.
Optionally, performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information includes:
performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and operation content information;
and looking up a control instruction number corresponding to the operation content information, and taking the control instruction number as the operation behavior information.
Optionally, the target application avatar program information includes a target application program name and a target account identifier; the source application avatar program information includes a source application program name and a source account identifier.
Optionally, the target account identifier includes a target user account and a corresponding target user password; the source account identifier includes a source user account and a corresponding source user password.
Optionally, the method is based on the MQTT protocol, the target application avatar program information is the topic of the publish message, the operation behavior information is the content of the publish message, and the source application avatar program information is the topic of the subscription message.
Optionally, acquiring the voice instruction information includes:
acquiring the voice instruction information collected by a voice collection device.
In a second aspect, an embodiment of the present application provides a voice control apparatus, including:
the acquisition module is used for acquiring voice instruction information;
the extraction module is used for extracting the target application avatar program information and the operation behavior information from the voice instruction information;
the first generation module is used for generating a publish message according to the target application avatar program information and the operation behavior information;
the matching module is used for searching for the subscription message matching the target application avatar program information;
the target determination module is used for taking the application avatar program corresponding to the subscription message that matches the target application avatar program information as a target application avatar program;
and the control module is used for sending the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
In a third aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, causing the processor to perform the steps of the method according to any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the steps of the method according to any implementation of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
According to the technical scheme provided by the embodiments of the application, a publish message is generated from the voice instruction information by means of voice collection, voice recognition and semantic analysis, and the publish message is pushed to the target application avatar program that has subscribed to it. Application avatar programs can thereby be controlled remotely by voice in an accurate and targeted manner, safely and reliably. In addition, because an application avatar program can interface with hardware, controlling the application avatar program remotely by voice also controls the hardware connected to it.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the application perform voice control based on a subscribe/publish mode. The subscribe/publish mode involves a subscribing end, a proxy server end and a publishing end. Each subscribing end subscribes to messages at the proxy server end according to its own service logic, and the publishing end publishes messages to the proxy server end according to the services it generates. When a published message has been subscribed to by one or more subscribing ends, the proxy server end pushes the published message to every subscribing end that subscribed to it.
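The subscribe/publish mode described above can be illustrated with a minimal in-memory sketch. The class name `Broker`, its methods and the topic strings are hypothetical and chosen only for illustration; a real proxy server end would deliver messages over a network rather than through direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (topic, payload); it stands in for a subscribing end.
Handler = Callable[[str, str], None]


class Broker:
    """In-memory stand-in for the proxy server end (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscribing end registers interest in a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        # Push the message to every subscriber of the topic and
        # return how many subscribers received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


received = []
broker = Broker()
broker.subscribe("news", lambda t, p: received.append(("A", p)))
broker.subscribe("news", lambda t, p: received.append(("B", p)))
delivered = broker.publish("news", "hello")      # two subscribers receive it
undelivered = broker.publish("sports", "score")  # nobody subscribed
```

A message published to a topic without subscribers is simply not delivered, which mirrors the condition that a published message is pushed only when it has been subscribed to.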
FIG. 1 is a diagram of an exemplary implementation of a voice control method; referring to fig. 1, the voice control method is applied to a voice control system. The voice control system includes a terminal group 10, a server 20 and a voice collection device 30. Each terminal in the terminal group 10 (for example a terminal 11, a terminal 12 and a terminal 13, without being limited thereto) and the voice collection device 30 are each connected to the server 20 through a network. The server 20 receives the voice signal collected by the voice collection device 30 to obtain voice instruction information; extracts target application avatar program information and operation behavior information from the voice instruction information; generates a publish message according to the target application avatar program information and the operation behavior information; searches for a subscription message matching the target application avatar program information; takes the application avatar program corresponding to the matching subscription message as a target application avatar program; and sends the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction. The target application avatar program is logged in on a terminal in the terminal group 10.
In the present embodiment, the voice collection device 30 acts as the publishing end, the application avatar programs in the terminals of the terminal group 10 act as subscribing ends, and the server 20 acts as the proxy server end.
The application avatar program may be an application program logged in to through an account, or an application Web page logged in to through an account, but is not limited thereto.
The terminal may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. Each terminal is provided with an application program, and logging in to the application program with different accounts yields different application avatar programs that are distinguished from one another. A variety of application programs may be installed on each terminal. The server 20 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
The voice collection device 30 may be an ordinary smart speaker, and is configured to receive a voice command of a user and transmit the voice command to the server for voice recognition. The smart speaker may be mounted on the terminal.
Different types of terminal applications can be registered in the server 20, and each terminal application can be logged in to with a plurality of different accounts. That is, the same terminal application can simultaneously run a plurality of application avatar programs logged in with different accounts, so that the server 20 can remotely control the same terminal application under different accounts and can also control different types of terminal applications.
FIG. 2 is a flow diagram illustrating a voice control method according to one embodiment; referring to fig. 2, the voice control method includes the steps of:
S100: acquiring voice instruction information.
Specifically, the server 20 acquires voice instruction information of the user. The voice instruction information includes voice information of the target application avatar program, voice information of the target account, voice information of the operation behavior, and the like.
S200: extracting the target application avatar program information and the operation behavior information from the voice instruction information.
Specifically, the server 20 has a voice recognition function. The server 20 processes the received voice instruction information to extract the target application avatar program information and the operation behavior information in non-voice form, for example by converting the information from voice form into text form.
S300: generating a publish message according to the target application avatar program information and the operation behavior information.
Specifically, the server 20 processes the received voice instruction information to generate a publish message. The publish message includes information on the target application avatar program to be controlled by voice and information on the operation behavior that the target application avatar program is to execute.
S400: searching for the subscription message matching the target application avatar program information.
Specifically, subscription messages are subscribed to the server 20 by a plurality of different application avatar programs, and may be held in a subscription message list stored in the server 20 in advance. Each subscription message corresponds to one application avatar program.
S500: taking the application avatar program corresponding to the subscription message that matches the target application avatar program information as the target application avatar program.
Specifically, the server 20 may generate a plurality of publish messages and may hold a plurality of subscription messages. A publish message is sent to an application avatar program only when it matches the subscription message subscribed to by that application avatar program.
S600: sending the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
Specifically, the subscription message corresponding to the target application avatar program matches the publish message. The publish message carries instruction information; after receiving the publish message, the target application avatar program parses the instruction information and executes the corresponding action. This completes remote voice control of the terminal's application avatar program.
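Steps S200 to S600 can be strung together in a short sketch. Everything here is a simplifying assumption made for illustration: the voice instruction is taken to be already recognized text in a hypothetical "<application> <account> <operation>" format, the class and method names are invented, and delivery is a direct callback rather than a network push.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (application program name, account identifier) identifies one avatar program.
AvatarInfo = Tuple[str, str]


@dataclass(frozen=True)
class PublishMessage:
    target: AvatarInfo   # target application avatar program information
    operation: str       # operation behavior information


class VoiceControlServer:
    def __init__(self) -> None:
        # Subscription list: avatar info -> delivery callback of that avatar program.
        self.subscriptions: Dict[AvatarInfo, Callable[[PublishMessage], None]] = {}

    def extract(self, instruction: str) -> Tuple[AvatarInfo, str]:
        # S200: placeholder for voice recognition and semantic analysis.
        # Assumes the hypothetical text format "<app> <account> <operation...>".
        app, account, operation = instruction.split(" ", 2)
        return (app, account), operation

    def handle(self, instruction: str) -> bool:
        target, operation = self.extract(instruction)   # S200
        message = PublishMessage(target, operation)     # S300
        deliver = self.subscriptions.get(target)        # S400 and S500
        if deliver is None:
            return False                                # no matching subscription
        deliver(message)                                # S600
        return True


server = VoiceControlServer()
log: List[str] = []
server.subscriptions[("thermostat", "alice")] = lambda m: log.append(m.operation)
ok = server.handle("thermostat alice turn on")      # delivered to alice's avatar program
missed = server.handle("thermostat bob turn on")    # bob has no subscription
```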
In one embodiment, before step S400, the method further comprises:
acquiring a message subscription request sent by an application avatar program, wherein the message subscription request includes source application avatar program information;
and extracting the source application avatar program information from the message subscription request to generate the subscription message of the application avatar program.
Specifically, the application avatar program may initiate a message subscription request to the server, where the message subscription request includes the information of that application avatar program, that is, the source application avatar program information. After receiving the message subscription request, the server extracts the source application avatar program information and generates the subscription message of the application avatar program.
Step S400 specifically includes: searching for a subscription message whose source application avatar program information matches the target application avatar program information.
When the source application avatar program information of a subscription message matches the target application avatar program information of a publish message, the publish message is sent to the application avatar program that subscribed to that subscription message.
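The registration and matching just described might look as follows. The request is modeled as a plain dictionary, and the field names (`app_name`, `account_id`, `client_id`) are hypothetical; `client_id` in particular is an assumed handle for the connected avatar program and does not come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Subscription:
    app_name: str     # source application program name
    account_id: str   # source account identifier
    client_id: str    # hypothetical handle of the connected avatar program


def make_subscription(request: Dict[str, str]) -> Subscription:
    # Extract the source avatar program information from the subscription request.
    return Subscription(request["app_name"], request["account_id"], request["client_id"])


def find_matches(subs: List[Subscription], app_name: str, account_id: str) -> List[Subscription]:
    # A subscription matches when both the program name and the account agree.
    return [s for s in subs if (s.app_name, s.account_id) == (app_name, account_id)]


subscription_list = [
    make_subscription({"app_name": "thermostat", "account_id": "alice", "client_id": "c1"}),
    make_subscription({"app_name": "thermostat", "account_id": "bob", "client_id": "c2"}),
    make_subscription({"app_name": "lamp", "account_id": "alice", "client_id": "c3"}),
]
targets = find_matches(subscription_list, "thermostat", "alice")
```

Only the entry with `client_id` "c1" matches, because the other entries differ in either the account or the application program name.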
In one embodiment, the message subscription request is generated when the user logs into the application avatar program.
Specifically, when a user logs in to an application avatar program with an account on a terminal, a message subscription request is sent to the server, so that the publish message generated by the server from a voice control instruction is received promptly and the remote voice control is responded to in time.
In one embodiment, step S200 specifically includes:
performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information.
Because the subsequent matching can be carried out only after the voice information has been converted into text information, a cloud service platform for voice recognition and semantic analysis may be used to recognize the voice instruction information and convert it into the corresponding text information. The recognition result is the recognized text information, namely the target application avatar program information and the operation behavior information in text form. The operation behavior information is the action that the user wants the target application avatar program to perform, such as opening a home page, forecasting the weather, or controlling hardware that interfaces with the application program, for example turning the hardware on or off or switching its operating mode.
In one embodiment, performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information includes:
performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and operation content information;
and looking up a control instruction number corresponding to the operation content information, and taking the control instruction number as the operation behavior information.
Specifically, the operation content information may be a lengthy piece of text; converting it into a control instruction number makes the message easier to transmit.
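The lookup from operation content to a control instruction number can be sketched as a simple table. The table entries and numbers below are invented for illustration; the description does not specify any particular numbering.

```python
from typing import Dict, Optional

# Hypothetical mapping from operation content (text) to control instruction numbers.
CONTROL_INSTRUCTIONS: Dict[str, int] = {
    "open home page": 1,
    "turn on": 2,
    "turn off": 3,
}


def to_instruction_number(operation_content: str) -> Optional[int]:
    # Normalize case and whitespace so small recognition differences still match.
    key = " ".join(operation_content.lower().split())
    return CONTROL_INSTRUCTIONS.get(key)


number = to_instruction_number("  Turn   ON ")
unknown = to_instruction_number("make coffee")
```

An unrecognized operation yields `None` here, which leaves it to the caller to decide how to report an unsupported command.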
In one embodiment, the target application avatar program information includes a target application program name and a target account identifier; the source application avatar program information includes a source application program name and a source account identifier.
Because logging in to the same application program with different accounts produces different application avatar programs, using the application program name together with the account identifier as the unique identifier of an application avatar program distinguishes the application avatar programs effectively.
In one embodiment, the target account identifier includes a target user account and a corresponding target user password; the source account identifier includes a source user account and a corresponding source user password.
When the account identifier includes both the user account and the user password, security is improved: because the user password is relatively secret, the safety of the operation can be effectively ensured. However, an account identifier containing a user password has to be sent between the subscribing end and the proxy server end, and if it is not encrypted there is a risk of eavesdropping. The account identifier is therefore encrypted at the end where it is generated, at the time of generation, so that it cannot be eavesdropped on during data transmission. Specifically, when the source account identifier is generated at the subscribing end, the subscribing end encrypts it and sends the encrypted source account identifier to the proxy server end; when the target account identifier is generated at the proxy server end, it is encrypted by the same encryption method and then matched against the encrypted source account identifier.
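The idea of protecting the account identifier identically on both ends and then comparing the protected values can be sketched as follows. Note the substitution: the description speaks of encryption without naming a scheme, whereas this sketch uses a keyed digest (HMAC-SHA256 from the Python standard library), which is deterministic and therefore comparable but is not reversible encryption. The shared key, the function names and the way account and password are combined are all assumptions.

```python
import hashlib
import hmac

# Assumed to be provisioned on both the subscribing end and the proxy server end.
SHARED_KEY = b"demo-key-not-for-production"


def protect(account: str, password: str, key: bytes = SHARED_KEY) -> str:
    # Length-prefix the account so ("ab", "c") and ("a", "bc") cannot collide.
    material = f"{len(account)}:{account}{password}".encode("utf-8")
    return hmac.new(key, material, hashlib.sha256).hexdigest()


def matches(source_token: str, target_token: str) -> bool:
    # Constant-time comparison of the two protected identifiers.
    return hmac.compare_digest(source_token, target_token)


source_token = protect("alice", "s3cret")   # produced at the subscribing end
target_token = protect("alice", "s3cret")   # produced at the proxy server end
same = matches(source_token, target_token)
different = matches(source_token, protect("alice", "wrong"))
```

Because both ends apply the same transformation with the same key, equal inputs produce equal tokens, and the plaintext password never needs to travel between the ends.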
In one embodiment, step S100 specifically includes: acquiring the voice instruction information collected by the voice collection device.
Specifically, the voice collection device 30 may be a smart speaker, which is configured to receive the voice instruction information of the user and send it to the server 20.
The voice collection device 30 may be configured with a wake-up word, and collects the voice command only after receiving the wake-up word.
In one embodiment, the method is based on the MQTT protocol: the target application avatar program information is the topic of the publish message, the operation behavior information is the content of the publish message, and the source application avatar program information is the topic of the subscription message. Alternatively, the method may use a long-lived connection or polling.
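How the avatar program information and the operation behavior information might map onto an MQTT topic and payload is sketched below using only the standard library, without an MQTT client. The "<application name>/<account identifier>" topic layout and the JSON payload with an `instruction` field are assumptions made for illustration. The check on segments reflects the MQTT rule that "/" separates topic levels and that "+" and "#" are wildcard characters not allowed in a topic name used for publishing.

```python
import json
from typing import Tuple


def build_topic(app_name: str, account_id: str) -> str:
    # Reject empty segments and characters that have special meaning in MQTT topics.
    for part in (app_name, account_id):
        if not part or any(ch in part for ch in "/+#"):
            raise ValueError(f"invalid topic segment: {part!r}")
    return f"{app_name}/{account_id}"


def build_publish(app_name: str, account_id: str, instruction_number: int) -> Tuple[str, bytes]:
    # Topic carries the target avatar program information;
    # payload carries the operation behavior information.
    payload = json.dumps({"instruction": instruction_number}).encode("utf-8")
    return build_topic(app_name, account_id), payload


topic, payload = build_publish("thermostat", "alice", 2)
subscription_topic = build_topic("thermostat", "alice")  # what the avatar program subscribes to
```

Because the subscribing avatar program and the publishing server derive the topic from the same two fields, an exact topic match corresponds to the matching of source and target avatar program information.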
In one embodiment, the server 20 may include a voice recognition server, a message publishing server and an MQTT server, wherein the MQTT server uses the MQTT communication protocol. Of course, in one embodiment, the voice recognition server, the message publishing server and the MQTT server may be integrated into one server. The voice recognition server is used for acquiring and recognizing the voice instruction information and extracting the target application avatar program information and the operation behavior information from it. The message publishing server is used for generating a publish message according to the target application avatar program information and the operation behavior information. The MQTT server is used for receiving the subscription requests sent by application avatar programs to obtain subscription messages, matching the publish message against the subscription messages, and sending the publish message to the target application avatar program that subscribed to the matching subscription message.
MQTT is a message transmission protocol based on the publish/subscribe mode. It is lightweight, open, simple and easy to implement, and has low communication bandwidth requirements. These features make it a good choice for machine-to-machine communication and Internet of Things applications, and it is likewise suitable for mobile apps and Web applications.
In one embodiment, an application program can simultaneously run a plurality of application avatar programs, each logged in with a different account. With single-point login (that is, an account can be logged in on only one application avatar program at a time, and when the account logs in again on an application avatar program of the same type, the existing login can be forcibly logged out), the application avatar program on which an account is logged in can be uniquely determined from that account.
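A registry enforcing this single-point login could be sketched as follows. The class and method names are hypothetical, and the sketch assumes that it is the earlier login that gets logged out when an account logs in again on another avatar program of the same type.

```python
from typing import Dict, Optional, Tuple


class LoginRegistry:
    """Tracks which avatar program instance each account is logged in on."""

    def __init__(self) -> None:
        # (application type, account) -> identifier of the avatar program instance
        self._active: Dict[Tuple[str, str], str] = {}

    def login(self, app_type: str, account: str, instance_id: str) -> Optional[str]:
        # Record the new login and return the instance that must be logged out, if any.
        key = (app_type, account)
        previous = self._active.get(key)
        self._active[key] = instance_id
        return previous if previous != instance_id else None

    def locate(self, app_type: str, account: str) -> Optional[str]:
        # The account uniquely determines the avatar program it is logged in on.
        return self._active.get((app_type, account))


registry = LoginRegistry()
first = registry.login("thermostat", "alice", "instance-1")    # nothing to log out
second = registry.login("thermostat", "alice", "instance-2")   # instance-1 is logged out
other_type = registry.login("lamp", "alice", "instance-9")     # different type, no conflict
current = registry.locate("thermostat", "alice")
```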
In one embodiment, the application avatar program may be a local application program or a Web application, and may also be a login-free demonstration program or Web application. The hardware provides an interactive interface for connecting to the terminal application, and the execution instruction ultimately triggers a hardware operation; that is, the final target of the operation is the hardware.
In one embodiment, the voice collection device 30 can also serve as a result feedback end and report the control result by voice.
According to this technical scheme, different types of application programs can be controlled by voice, and because the application programs differ, the same account can exist in different types of application programs without conflict. When there is only one type of application program, the topic of the message may be identified by the account alone, without an application program name.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be performed at different times; nor are they necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
FIG. 3 is a block diagram of the voice control apparatus according to an embodiment; referring to fig. 3, the voice control apparatus 100 includes:
The acquisition module 110 is configured to acquire voice instruction information.
The extraction module 120 is configured to extract the target application avatar program information and the operation behavior information from the voice instruction information.
The first generation module 130 is configured to generate a publish message according to the target application avatar program information and the operation behavior information.
The matching module 140 is configured to search for the subscription message matching the target application avatar program information.
The target determination module 150 is configured to take the application avatar program corresponding to the subscription message that matches the target application avatar program information as the target application avatar program.
The control module 160 is configured to send the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
In one embodiment, the apparatus further comprises:
The subscription module is configured to acquire a message subscription request sent by an application avatar program, wherein the message subscription request includes source application avatar program information.
The second generation module is configured to extract the source application avatar program information from the message subscription request and generate the subscription message of the application avatar program.
The matching module 140 is specifically configured to search for a subscription message whose source application avatar program information matches the target application avatar program information.
In one embodiment, the message subscription request is generated when the user logs into the application avatar program.
In one embodiment, the extraction module 120 is specifically configured to perform voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information.
In one embodiment, performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and the operation behavior information specifically includes: performing voice recognition and semantic analysis on the voice instruction information to obtain the target application avatar program information and operation content information; and looking up a control instruction number corresponding to the operation content information, and taking the control instruction number as the operation behavior information.
In one embodiment, the target application avatar program information includes a target application program name and a target account identifier; the source application avatar program information includes a source application program name and a source account identifier.
In one embodiment, the target account identifier includes a target user account and a corresponding target user password; the source account identifier includes a source user account and a corresponding source user password.
In one embodiment, the acquisition module 110 is specifically configured to acquire the voice instruction information collected by the voice collection device.
The apparatus in this embodiment is used to execute the voice control method described above; its working principle and beneficial effects correspond one-to-one to those of the method and are not described again.
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment. Referring to fig. 4, the computer device may specifically be the server 20 in fig. 1. The computer device includes a processor, a memory, a network interface, an input device, a voice device and a display screen which are connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the voice control method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the voice control method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse. The voice device may be a smart speaker or the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the voice control apparatus 100 provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 4. The memory of the computer device may store the program modules constituting the voice control apparatus 100, such as the acquisition module 110, the extraction module 120, the first generation module 130, the matching module 140, the target determination module 150 and the control module 160 shown in fig. 3. The computer program constituted by these program modules causes the processor to execute the steps of the voice control method of the embodiments of the present application described in this specification.
For example, the computer device shown in fig. 4 can acquire the voice instruction information through the acquisition module 110 in the voice control apparatus 100 shown in fig. 3. The computer device may extract the target application avatar program information and the operation behavior information from the voice instruction information through the extraction module 120. The computer device may generate the publish message according to the target application avatar program information and the operation behavior information through the first generation module 130. The computer device may search for a subscription message matching the target application avatar program information through the matching module 140. The computer device may take the application avatar program corresponding to the subscription message that matches the target application avatar program information as the target application avatar program through the target determination module 150. The computer device may send the publish message to the target application avatar program through the control module 160, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring voice instruction information; extracting target application avatar program information and operation behavior information from the voice instruction information; generating a publish message according to the target application avatar program information and the operation behavior information; searching for a subscription message matching the target application avatar program information; taking the application avatar program corresponding to the subscription message matched with the target application avatar program information as a target application avatar program; and sending the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the following steps: acquiring voice instruction information; extracting target application avatar program information and operation behavior information from the voice instruction information; generating a publish message according to the target application avatar program information and the operation behavior information; searching for a subscription message matching the target application avatar program information; taking the application avatar program corresponding to the subscription message matched with the target application avatar program information as a target application avatar program; and sending the publish message to the target application avatar program, so that the target application avatar program obtains a corresponding execution instruction according to the operation behavior information and executes a corresponding action according to the execution instruction.
The technical solution of the present application is described below with reference to a specific application scenario. The user logs in to user account B in application A with the user password "pass". The application then automatically subscribes, at the MQTT server, to a subscription message with the topic "A/B/pass". This application avatar program is defined as application avatar program Z.
The user issues a voice instruction to the smart speaker. After receiving the voice instruction, the smart speaker sends a voice message to a voice recognition server on the server side.
The voice recognition server receives the voice message and performs voice recognition and semantic analysis to obtain a recognition result: "B" is the target account identifier; "application A" is the target terminal application identifier, and may also serve as a wake-up word for the server and the target terminal application; "pass" is the password of the target account; and "get.. home page open" is the operation content information, from which "Q1" is obtained as the operation behavior information by looking up a prestored comparison table of operation content information and operation behavior information. Based on the target application avatar program identification words, the voice recognition server sends the recognition result to the message publishing server to generate a publish message. The message publishing server sends the publish message to the MQTT server. The MQTT server is also used to receive subscription requests from application avatar programs and to generate corresponding subscription messages according to those requests. After receiving the publish message, the MQTT server matches the topic of the publish message, "A/B/pass", against the topics of the subscription messages, and sends the publish message to the target application avatar program Z, which corresponds to the subscription message whose topic matches "A/B/pass". After receiving the publish message, the target application avatar program Z matches the control instruction number "Q1" to the corresponding execution instruction "showHome" and executes it, so that the target application avatar program completes the operation of opening the home page.
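The scenario above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the described system: `InMemoryBroker` is a toy stand-in for the MQTT server that supports only exact topic matching, and `build_topic`, `AvatarProgram`, and the shape of the `recognition` dictionary are hypothetical names introduced for illustration. Only the topic "A/B/pass", the behavior "Q1", and the instruction "showHome" come from the scenario.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Toy stand-in for the MQTT server: exact-match topics only."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every subscriber of topic; return the count."""
        callbacks = self._subs.get(topic, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


def build_topic(app: str, account: str, password: str) -> str:
    # Topic layout taken from the scenario: application/account/password.
    return f"{app}/{account}/{password}"


class AvatarProgram:
    """Application avatar program Z: maps behavior codes to instructions."""

    # Assumed contents of the avatar program's instruction table.
    INSTRUCTIONS: Dict[str, str] = {"Q1": "showHome"}

    def __init__(self) -> None:
        self.executed: List[str] = []

    def on_message(self, behavior: str) -> None:
        instruction = self.INSTRUCTIONS.get(behavior)
        if instruction is not None:
            # A real program would perform the action; here it is recorded.
            self.executed.append(instruction)


broker = InMemoryBroker()
avatar_z = AvatarProgram()

# Login step: the avatar program subscribes to its own topic.
broker.subscribe(build_topic("A", "B", "pass"), avatar_z.on_message)

# Recognition result as it might arrive at the message publishing server.
recognition = {"app": "A", "account": "B", "password": "pass", "behavior": "Q1"}
topic = build_topic(recognition["app"], recognition["account"],
                    recognition["password"])
delivered = broker.publish(topic, recognition["behavior"])

print(delivered, avatar_z.executed)  # 1 ['showHome']
```

MQTT topic names are case-sensitive, and the toy broker mirrors that: publishing to "A/B/Pass" reaches no subscriber of "A/B/pass", so the published and subscribed topics must agree exactly.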
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.