
CN114121001A - Voice control method and device and electronic equipment - Google Patents

Voice control method and device and electronic equipment Download PDF

Info

Publication number
CN114121001A
Authority
CN
China
Prior art keywords
content
component
instruction
predicate
instruction text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111342166.8A
Other languages
Chinese (zh)
Inventor
曾理
张晓帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Hangzhou Douku Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Douku Software Technology Co Ltd filed Critical Hangzhou Douku Software Technology Co Ltd
Priority to CN202111342166.8A priority Critical patent/CN114121001A/en
Publication of CN114121001A publication Critical patent/CN114121001A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application discloses a voice control method and device and electronic equipment. The method comprises the following steps: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text; performing grammatical component analysis on the first instruction text to obtain the grammatical components of the first instruction text; replacing the content corresponding to the grammatical components in the first instruction text with the corresponding standard content based on the grammatical components to obtain a second instruction text; generating a control instruction based on the second instruction text; and executing the control instruction. In this way, after the grammatical components of the instruction text are analyzed, the content in the instruction text can be replaced with the corresponding standard content based on those grammatical components, so that the electronic equipment can more accurately determine the control intention of the user from the replaced standard content, improving the accuracy of the control process.

Description

Voice control method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a voice control method and apparatus, and an electronic device.
Background
The combination of artificial intelligence technology and a virtual personal assistant (voice assistant) enables an electronic device to receive voice commands issued by a user through the auditory modality and complete the corresponding interactive tasks. However, because of users' personal habits, the voice commands they trigger are diverse, which makes it difficult for the electronic device to accurately determine the user's actual control intention, so the accuracy of the control process needs to be improved.
Disclosure of Invention
In view of the foregoing, the present application provides a voice control method, an apparatus and an electronic device to address the foregoing problems.
In a first aspect, the present application provides a method for voice control, the method comprising: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text; analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text; replacing the content corresponding to the grammar component in the first instruction text with the corresponding standard content based on the grammar component to obtain a second instruction text; generating a control instruction based on the second instruction text; and executing the control instruction.
In a second aspect, the present application provides a voice-controlled apparatus, the apparatus comprising: the instruction conversion unit is used for responding to the received voice instruction and converting the voice instruction into a corresponding first instruction text; the syntactic component analysis unit is used for carrying out syntactic component analysis on the first instruction text to obtain syntactic components of the first instruction text; the instruction processing unit is used for replacing the content corresponding to the grammar component in the first instruction text with the corresponding standard content based on the grammar component to obtain a second instruction text; the instruction generating unit is used for generating a control instruction based on the second instruction text; and the control unit is used for executing the control instruction.
In a third aspect, the present application provides an electronic device comprising one or more processors and a memory; one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the methods described above.
In a fourth aspect, the present application provides a computer-readable storage medium having a program code stored therein, wherein the program code performs the above method when running.
According to the voice control method, the voice control device and the electronic equipment, after a voice instruction is received, the voice instruction is converted into a corresponding first instruction text, grammatical component analysis is carried out on the first instruction text to obtain grammatical components of the first instruction text, then based on the grammatical components, the content corresponding to the grammatical components in the first instruction text is replaced by corresponding standard content to obtain a second instruction text, finally, a control instruction is generated based on the second instruction text, and the control instruction is executed. Therefore, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating an application scenario of a speech control method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an application scenario of another speech control method proposed in an embodiment of the present application;
fig. 3 is a flowchart illustrating a voice control method according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating a voice control method according to another embodiment of the present application;
FIG. 5 is a flow chart illustrating a voice control method according to yet another embodiment of the present application;
FIG. 6 is a flow chart illustrating a voice control method according to another embodiment of the present application;
FIG. 7 is a flow chart illustrating a voice control method according to another embodiment of the present application;
fig. 8 is a block diagram illustrating a structure of a voice control apparatus according to an embodiment of the present application;
fig. 9 shows a block diagram of an electronic device proposed in the present application;
fig. 10 is a storage unit according to an embodiment of the present application, configured to store or carry program code for implementing a voice control method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The popularization of intelligent terminal equipment brings various conveniences to life. The combination of artificial intelligence technology and virtual personal assistant (voice assistant) can make the electronic device receive the voice command issued by the user through the hearing modality and complete the corresponding interactive task.
However, because of users' personal habits, the voice commands they trigger are diverse, which makes it difficult for the electronic device to accurately determine the user's actual control intention, so the accuracy of the control process needs to be improved. For example, in the process of playing a video, if a user wants to control the video to stop playing by voice, the user may issue a voice command such as "stop" or "do not play". For another example, if the user wants to control, by voice, the start of playing a video named "XXX", the voice command issued by the user may be "open XXX", "play XXX", "I want to see XXX", "put XXX", etc. Therefore, the content of the voice commands issued by the user may vary widely, so the electronic device may be unable to accurately determine the user's actual control intention.
Therefore, the inventor provides a voice control method, a voice control device and an electronic device in the present application, after receiving a voice instruction, the voice instruction is converted into a corresponding first instruction text, a grammatical component of the first instruction text is obtained by analyzing the grammatical component of the first instruction text, a content corresponding to the grammatical component in the first instruction text is replaced with a corresponding standard content based on the grammatical component to obtain a second instruction text, and finally a control instruction is generated based on the second instruction text and the control instruction is executed. Therefore, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved.
The following first introduces an application scenario related to the embodiment of the present application.
In the embodiment of the application, the provided voice control method can be executed by the electronic equipment. In this manner, all steps in the voice control method provided by the embodiment of the present application may be performed by the electronic device. For example, as shown in fig. 1, a voice acquisition device of the electronic device 100 may acquire a voice instruction and transmit the acquired voice instruction to a processor, so that the processor may acquire a first instruction text corresponding to the voice instruction and perform syntactic component analysis on the first instruction text to obtain the syntactic components of the first instruction text; replace the content corresponding to the syntactic components in the first instruction text with the corresponding standard content based on the syntactic components to obtain a second instruction text; and then generate and execute a control instruction.
Moreover, the voice control method provided by the embodiment of the application can also be executed by a server. Correspondingly, in this manner, the electronic device may collect the voice instruction and send it to the server; the server then executes the voice control method provided in the embodiment of the present application to generate the control instruction and returns the control instruction to the electronic device, and the electronic device executes it. In addition, the method can be executed by the electronic device and the server in cooperation. In this manner, some steps in the voice control method provided by the embodiment of the present application are performed by the electronic device, and the other steps are performed by the server.
For example, as shown in fig. 2, the electronic device 100 may perform a voice control method including: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text; and analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text. Then, the server 200 replaces the content corresponding to the grammar component in the first instruction text with the corresponding standard content based on the grammar component to obtain a second instruction text; and generating a control instruction based on the second instruction text. Then, the control command is returned to the electronic device 100, and the electronic device 100 is triggered to execute the control command.
It should be noted that, in this manner executed by the electronic device and the server cooperatively, the steps executed by the electronic device and the server respectively are not limited to the manner described in the above example, and in practical applications, the steps executed by the electronic device and the server respectively may be dynamically adjusted according to actual situations.
Embodiments of the present application will be described with reference to the accompanying drawings.
Referring to fig. 3, a voice control method provided in the present application includes:
s110: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text.
In the embodiment of the application, the user can express his or her control intention through voice. Correspondingly, the electronic device can treat the voice issued by the user as a voice instruction. After receiving the voice instruction, the electronic device may convert it into the corresponding text content based on a preconfigured automatic speech recognition (ASR) mode, thereby obtaining the first instruction text. For example, if the received voice instruction is "open album", the first instruction text obtained after converting the voice instruction includes "open album".
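The conversion step above can be sketched as follows. This is a minimal illustration of step S110, not the patent's actual implementation: the `recognizer` callable stands in for any real ASR engine, and its name and signature are assumptions made here for clarity.

```python
def voice_to_first_instruction_text(audio_bytes, recognizer):
    """Convert a received voice instruction into its first instruction text.

    `recognizer` is any callable that maps raw audio to a transcript,
    standing in for a preconfigured ASR engine.
    """
    if not audio_bytes:
        raise ValueError("no voice instruction received")
    # A real implementation would hand the audio to an ASR model here.
    return recognizer(audio_bytes).strip()
```

For example, with a recognizer that transcribes the audio as " open album ", the function returns the first instruction text "open album".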
S120: and analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text.
In the embodiment of the present application, a grammatical component refers to a component such as the subject, predicate, object, or adverbial contained in a piece of text. As one mode, the electronic device may obtain the relationships between the language units (words, named entities) in the first instruction text by using a Dependency Parsing (DP) technique from natural language processing, and obtain the component of each language unit through those relationships, thereby obtaining the grammatical components of the first instruction text. For example, the syntactic components in a sentence can be obtained by dependency parsing. The specific labeled relations may be as exemplified in the following table:
Relation type | Tag | Description | Example
Subject-verb relation | SBV | subject-verb | I send her a bunch of flowers (I ← send)
Verb-object relation | VOB | direct object, verb-object | I send her a bunch of flowers (send → flowers)
Indirect-object relation | IOB | indirect object | I send her a bunch of flowers (send → her)
Fronting-object relation | FOB | fronting object | What book does he read (book ← read)
Double relation | DBL | double | He asks me to eat (asks → me)
Attribute relation | ATT | attribute | Red apple (red ← apple)
Adverbial relation | ADV | adverbial | Very beautiful (very ← beautiful)
Complement relation | CMP | complement | Done the job (do → done)
Coordinate relation | COO | coordinate | Mountain and sea (mountain → sea)
Preposition-object relation | POB | preposition-object | In the trade area (in → area)
Left adjunct relation | LAD | left adjunct | Mountain and sea (and ← sea)
Right adjunct relation | RAD | right adjunct | Kids (kid → plural suffix)
Independent structure | IS | independent structure | Two clauses that are structurally independent of each other
Head relation | HED | head | Refers to the core of the whole sentence
Through dependency syntax analysis, the subject-verb relation can be obtained, yielding the subject and the predicate; the object and the like can be obtained through the verb-object relation and the predicate. The dependency parsing tool used by the electronic device in the embodiment of the present application may be any of several options, such as HanLP, StanfordNLP, DDParser, and LTP, and may be selected according to actual requirements.
S130: and replacing the content corresponding to the grammar component in the first instruction text with the corresponding standard content based on the grammar component to obtain a second instruction text.
It should be noted that multiple grammatical components may be present in the first instruction text, and the content replacement mode can differ for different grammatical components. In the embodiment of the present application, performing content replacement on the first instruction text based on the grammatical components may be understood as performing content replacement on the first instruction text based on the content replacement mode corresponding to each grammatical component. For example, if a predicate element and an object element are included in the first instruction text, the content corresponding to the predicate element is replaced based on the content replacement mode corresponding to the predicate element, and the content corresponding to the object element is replaced based on the content replacement mode corresponding to the object element. For another example, if a predicate element, an object element, and an adverbial element are included in the first instruction text, the content corresponding to each of these elements is replaced based on the content replacement mode corresponding to that element.
Illustratively, the following table shows content substitution for a predicate element:
[Table image: generalized predicates and their corresponding standard predicates]
The generalized predicate refers to the content of a predicate element in the first instruction text that needs to be replaced, and the standard predicate corresponding to the generalized predicate denotes the standard content after replacement.
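Such a generalized-predicate table can be realized as a simple mapping from generalized predicates to standard predicates. Because the patent's own table is only available as an image, the concrete entries below are illustrative assumptions:

```python
# Hypothetical generalized-predicate -> standard-predicate table
# (the patent's actual entries are in an unrendered figure).
STANDARD_PREDICATES = {
    "put": "play",
    "open": "play",
    "start": "play",
    "stop": "pause",
    "halt": "pause",
}

def replace_predicate(instruction_words, predicate_word):
    """Replace the predicate content with its standard predicate, if known.

    Words that are not the predicate, and predicates with no table entry,
    pass through unchanged.
    """
    standard = STANDARD_PREDICATES.get(predicate_word, predicate_word)
    return [standard if w == predicate_word else w for w in instruction_words]
```

Under this assumed table, the instruction ["put", "yulochun"] becomes ["play", "yulochun"] after predicate replacement.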
For another example, the following table shows the content substitution mode corresponding to an adverbial component:
[Table image: generalized adverbials and their corresponding standard adverbials]
after content replacement is performed on the first instruction text based on the grammar component, a second instruction text can be obtained. In the embodiment of the present application, the second instruction text is an instruction text for generating a control instruction. For example, the first instruction text obtained after converting the voice instruction into text may be "help me put yulochun", and then the second instruction text obtained after performing content replacement in the manner shown in the embodiment of the present application may be "click yulochun".
S140: and generating a control instruction based on the second instruction text.
After the second instruction text is obtained, semantic recognition can be performed on it based on a preconfigured mode to obtain a triple, and a control instruction is then generated based on the triple. Optionally, the intention, the control object, and the object auxiliary information in the text may be extracted based on Natural Language Understanding (NLU) and integrated into a triple of the form {action, object, information}, where action represents the intention (or control purpose), object represents the control object, and information represents the object auxiliary information. For example, if the second instruction text is "play Chen Qing Ling", natural language understanding yields the user intention "play", the control object "Chen Qing Ling", and null object auxiliary information, and the triple is recorded as: {play, Chen Qing Ling, Φ}. For another example, if the second instruction text is "help me search for the Antique Bureau", the intention is "search", the control object is "the Antique Bureau", the object auxiliary information is null, and the triple is: {search, the Antique Bureau, Φ}.
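The shape of the {action, object, information} triple described above can be illustrated with a minimal rule-based sketch. A production system would use an NLU model here; this token-based splitter and its action vocabulary are assumptions made only to show the triple's structure:

```python
# Hypothetical action vocabulary; a real NLU module would classify intent.
KNOWN_ACTIONS = {"play", "search", "open", "pause"}

def text_to_triple(words):
    """Build an (action, object, information) triple from tokenized text.

    action: the first token recognized as an intention
    object: the first remaining token, taken as the control object
    information: any further tokens, taken as object auxiliary info (or None)
    """
    action = next((w for w in words if w in KNOWN_ACTIONS), None)
    rest = [w for w in words if w != action]
    obj = rest[0] if rest else None
    info = " ".join(rest[1:]) if len(rest) > 1 else None
    return (action, obj, info)
```

For the tokenized instruction ["play", "videoA"], this yields ("play", "videoA", None), mirroring the {play, ..., Φ} triple above with null auxiliary information.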
S150: and executing the control instruction.
It should be noted that the purpose of the user triggering a voice instruction is generally to operate an operation object in the current user interface. The operation object may be the current user interface displayed by the electronic device itself, or a control in the displayed interface. For example, if the operation object is the displayed current user interface, the interface may be controlled to scroll by a voice instruction, triggered to exit by a voice instruction, or switched to another interface by a voice instruction. If the operation object is a control in the interface, the control can be clicked through a voice instruction.
The control instruction generated based on the second instruction text can be understood as an instruction that can be recognized and executed by the electronic device, and the control instruction functions to operate an operation object that the user desires to operate. Correspondingly, when the electronic device executes the control command generated based on the second command text, the electronic device operates the operation object desired by the user.
In the voice control method provided in this embodiment, after a voice instruction is received, the voice instruction is converted into a corresponding first instruction text, syntactic component analysis is performed on the first instruction text to obtain syntactic components of the first instruction text, and then, based on the syntactic components, content corresponding to the syntactic components in the first instruction text is replaced with corresponding standard content to obtain a second instruction text, and finally, a control instruction is generated based on the second instruction text, and the control instruction is executed. Therefore, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved.
Referring to fig. 4, a voice control method provided in the present application includes:
s210: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text.
S220: and analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text.
S230: and if the grammar component represents that the first instruction text comprises a predicate component, replacing the content corresponding to the predicate component with the corresponding standard predicate content.
S240: and if the grammar component represents that the first instruction text also comprises a non-predicate component, replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content.
As one mode, the replacing the content corresponding to the non-predicate element with the corresponding standard non-predicate content includes: if the non-predicate elements comprise an adverbial element, replacing the content corresponding to the adverbial element with the corresponding standard adverbial content to obtain the standard non-predicate content; if the non-predicate elements comprise an object element, replacing the content corresponding to the object element with the corresponding standard object content to obtain the standard non-predicate content; or, if the non-predicate elements comprise both an object element and an adverbial element, replacing the content corresponding to the object element with the corresponding standard object content, replacing the content corresponding to the adverbial element with the corresponding standard adverbial content, and obtaining the standard non-predicate content based on the standard object content and the standard adverbial content.
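The branching just described, where the components present decide which replacement tables apply, can be sketched as follows. The replacement dictionaries are placeholders standing in for the patent's object and adverbial mapping tables, which are not shown in text form:

```python
# Hypothetical mapping tables; the patent's actual tables are in figures.
STANDARD_OBJECTS = {"photos": "album"}
STANDARD_ADVERBIALS = {"quickly": "immediately"}

def replace_non_predicates(components):
    """Standardize whichever non-predicate components are present.

    `components` maps component names ("object", "adverbial") to their
    content. Each present component is replaced via its own table;
    unknown content passes through unchanged.
    """
    result = {}
    if "object" in components:
        obj = components["object"]
        result["object"] = STANDARD_OBJECTS.get(obj, obj)
    if "adverbial" in components:
        adv = components["adverbial"]
        result["adverbial"] = STANDARD_ADVERBIALS.get(adv, adv)
    return result
```

When both an object element and an adverbial element are present, both tables are applied and the results together form the standard non-predicate content, matching the third branch above.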
Similarly, objects with noun structures have many synonymous and near-synonymous expressions and need to be generalized to obtain the standard object content. The following table shows the content substitution mode corresponding to an object component:
[Table image: generalized objects and their corresponding standard objects]
the electronic device may also determine corresponding standard object content based on the similarity for the object components. It should be noted that, in the embodiments of the present application, the corresponding standard content may be determined based on the correspondence between the generalized content and the standard content. For example, the object component is replaced with a content according to the correspondence (for example, the table) between the generalized content corresponding to the object component and the standard object content. In one case, the content corresponding to the object component in the first instruction text may not yet exist in the correspondence relationship, and thus the content cannot be replaced with the corresponding standard object content. In this case, the electronic device may obtain the description information of the control included in the current user interface, and then perform similarity detection on the description information of the control and the content corresponding to the object component, and use the description information with the highest similarity as the standard object content.
In the embodiment of the present application, there may be a plurality of ways to calculate the similarity.
As one similarity calculation method, the electronic device may compute, based on the longest common subsequence, a first reference similarity between the description information of each control in the current user interface and the content corresponding to the object component; compute, based on the edit distance, a second reference similarity between the description information of each control and the content corresponding to the object component; and add the first reference similarity and the second reference similarity of each control to obtain the similarity between the description information of that control and the content corresponding to the object component.
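A concrete sketch of this combined similarity is given below. The patent only says the two reference similarities are added; the normalization used here (dividing by the longer string's length) is an assumption made so the two scores are on comparable scales.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def best_matching_control(object_content, control_descriptions):
    """Pick the control description most similar to the object content.

    Adds an LCS-based first reference similarity and an edit-distance-based
    second reference similarity per control, then takes the maximum.
    """
    def score(desc):
        denom = max(len(desc), len(object_content), 1)
        first = lcs_length(desc, object_content) / denom
        second = 1.0 - edit_distance(desc, object_content) / denom
        return first + second
    return max(control_descriptions, key=score)
```

For example, given the object content "album" and control descriptions "settings", "camera", and "album view", the combined score selects "album view".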
As another similarity calculation method, the electronic device may directly obtain a text vector of description information of a control in the current user interface based on the trained neural network model, obtain a text vector of content corresponding to an object component, and determine the similarity based on a distance between the text vectors.
Furthermore, in the embodiment of the present application, there may be multiple ways of obtaining the description information of the control in the current user interface.
Optionally, the electronic device may identify the current user interface through at least one of the following recognition manners to obtain the controls included in the current user interface and the description information corresponding to each control: identifying the current user interface based on code parsing; identifying the current user interface based on image-text recognition (e.g., optical character recognition); and identifying the current user interface based on an icon classification model.
As a mode, the electronic device may identify the current user interface by combining the three modes to obtain a control in the current user interface and description information corresponding to the control. For example, the electronic device may first identify the current user interface based on a code parsing manner, and if the identification is successful, may directly obtain a control included in the current user interface and description information corresponding to the control. If the identification is not successful, the electronic equipment can identify the current user interface by adopting an image-text identification mode or an icon classification model so as to obtain a control included in the current user interface and description information corresponding to the control.
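The cascaded recognition strategy above amounts to an ordered fallback chain; the following sketch is illustrative only (the recognizer interface and its return convention are assumptions):

```python
from typing import Callable, Dict, List, Optional

# Each recognizer returns a mapping of control id -> description information,
# or None when that recognition path fails for the given interface.
Recognizer = Callable[[object], Optional[Dict[str, str]]]


def recognize_controls(ui, recognizers: List[Recognizer]) -> Dict[str, str]:
    """Try each recognition path in order and return the first successful result.

    The order mirrors the text above: code parsing first, then image-text
    (OCR) recognition or an icon-classification model as fallbacks.
    """
    for recognize in recognizers:
        result = recognize(ui)
        if result is not None:  # this recognition path succeeded
            return result
    return {}
```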
As still another alternative, the replacing of the content corresponding to the non-predicate component with the corresponding standard non-predicate content includes: if the non-predicate components include neither a subject component nor an object component, acquiring a current task scene; and replacing the content corresponding to the non-predicate component based on the task scene to obtain standard non-predicate content corresponding to the task scene.
It should be noted that, when speaking, a user may phrase a command loosely due to speech habits, and a loosely phrased voice instruction may not allow the electronic device to accurately determine the user's control intention. For example, if the content corresponding to the voice instruction is "next one", it may mean either the next item or downloading one item. In an audio playback scenario, "next one" likely means the next item, for example, playing the next song; in a software download scenario, it likely means downloading one item, for example, downloading an application.
In order to determine the user's real intention more accurately, as one manner, after the first instruction text is obtained, the first instruction text is updated according to the task scene corresponding to the current user interface to obtain standard non-predicate content corresponding to that task scene, where the current user interface is the interface displayed when the voice instruction was acquired. For example, after obtaining a first instruction text whose content is "next one", if the task scene corresponding to the current user interface is determined to be an audio playback scene, the electronic device may update "next one" so that the updated standard non-predicate content is "the next song". If the task scene corresponding to the current user interface is determined to be an application download scene, "next one" may be updated so that the updated standard non-predicate content is "download an application".
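A minimal sketch of this scene-dependent replacement; the phrases and scene names are invented for illustration and do not come from the application:

```python
# Hypothetical mapping from (ambiguous phrase, task scene) to standard
# non-predicate content; every entry here is an illustrative placeholder.
SCENE_DICTIONARY = {
    ("next one", "audio_playback"): "the next song",
    ("next one", "app_download"): "download an application",
}


def disambiguate(phrase: str, task_scene: str) -> str:
    """Replace an ambiguous non-predicate phrase using the current task scene;
    phrases without a scene-specific entry pass through unchanged."""
    return SCENE_DICTIONARY.get((phrase, task_scene), phrase)
```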
As an embodiment, if the syntax element indicates that the first instruction text further includes a non-predicate element, before replacing the content corresponding to the non-predicate element with the corresponding standard non-predicate content, the method further includes:
S231: And if the grammar component represents that only the predicate component is included in the first instruction text, detecting whether the content corresponding to the predicate component represents that the overall operation is performed on the current user interface.
S232: and if the representation is to perform integral operation on the current user interface, generating a control instruction for executing the integral operation.
S233: and if the representation is not the integral operation of the current user interface and the representation is the operation of the control in the current user interface, generating a control instruction corresponding to the control.
It should be noted that, if only a predicate component exists in the first instruction text, that component may refer to a control or to an operation on the current user interface. If the user had expressed the content of such a first instruction text as a complete sentence, the lone predicate component would actually be the object component of that complete sentence. For example, if the first instruction text is "pause", the corresponding complete sentence should be "click the pause button"; that is, the lone predicate component "pause" in the first instruction text is actually part of the object component "pause button" of the corresponding complete sentence. Since such a first instruction text may not constitute a complete sentence, sentence-component analysis may fail, and the electronic device may optionally determine that the first instruction text includes only a predicate component based on part of speech.
As described above, when only a predicate component is present in the instruction text, the user may have one of two intentions. One possibility is referring to a control, such as "play", "pause", or "share"; the other is performing an overall operation on the page, such as swiping left, going back, or exiting. In the control case, the first instruction text may be converted into the corresponding complete sentence, a triple may then be generated based on the complete sentence, and a corresponding control instruction may be generated based on the triple. For example, if the first instruction text is "pause", the converted complete sentence is "click the pause button", and the triple may be generated based on "click the pause button".
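The two-branch decision for a predicate-only instruction can be sketched as follows; the operation list, sentence template, and triple layout are illustrative assumptions, not the application's actual data structures:

```python
# Whole-page operations (illustrative entries only).
PAGE_OPERATIONS = {"back", "exit", "swipe left"}


def handle_predicate_only(predicate: str):
    """Decide between a whole-page operation and a control operation.

    A lone predicate such as "pause" is treated as the object of an implied
    "click ... button" sentence, from which a (verb, object, sentence) triple
    is built.
    """
    if predicate in PAGE_OPERATIONS:
        return ("page_op", predicate, None)
    sentence = f"click {predicate} button"  # completed sentence
    return ("click", f"{predicate} button", sentence)
```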
S250: and obtaining a second instruction text based on the standard predicate content and the standard non-predicate content.
In the embodiment of the application, the process of obtaining the second instruction text may be understood as replacing nonstandard content in the first instruction text with standard content, without changing the user intention corresponding to the first instruction text. The obtained second instruction text may be understood as text that the electronic device can directly use for generating the control instruction. That is, in the embodiment of the present application, the electronic device cannot directly generate a control instruction from nonstandard content, but it can generate the corresponding control instruction after the nonstandard content (for example, the generalized content shown in the table in this document) is replaced with the corresponding standard content. Optionally, the electronic device may associate standard content with corresponding executable program code, so the electronic device may be unable to determine which program code corresponds to nonstandard content. For example, the standard content "click" may correspond to program code related to a click event. Even if the user intends a click, when the expressed content is nonstandard content such as "open" or "view", it cannot be directly associated with the program code, and the control instruction cannot be accurately generated.
After the replacement of the content corresponding to the predicate component and the content corresponding to the non-predicate component is completed, the second instruction text may be obtained from the standard predicate content and the standard non-predicate content. For example, suppose the first instruction text is "help me press the rightmost button above". The grammatical components obtained after analysis may indicate that the first instruction text includes a predicate component and a state component, where the content corresponding to the predicate component is "press" and the content corresponding to the state component is "rightmost above". The content "press" corresponding to the predicate component is replaced with the corresponding standard predicate content "click", the content "rightmost above" corresponding to the state component is replaced with the corresponding standard state content "upper right corner", and the second instruction text obtained from the standard predicate content and the standard non-predicate content is "click the upper right corner".
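A hedged sketch of assembling the second instruction text from standardized components; the dictionary entries echo the example in the text but are otherwise invented placeholders:

```python
# Illustrative replacement dictionaries; entries are placeholders.
PREDICATE_DICT = {"press": "click", "open": "click", "view": "click"}
STATE_DICT = {"rightmost upper side": "upper right corner"}


def build_second_text(components: dict) -> str:
    """Assemble the second instruction text from standardized components.

    `components` maps grammar-component names ("predicate", "state", "object")
    to their raw contents; contents without a dictionary entry pass through.
    """
    predicate = PREDICATE_DICT.get(components.get("predicate", ""), components.get("predicate", ""))
    state = STATE_DICT.get(components.get("state", ""), components.get("state", ""))
    obj = components.get("object", "")
    return " ".join(part for part in (predicate, state, obj) if part)
```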
S260: and generating a control instruction based on the second instruction text.
S270: and executing the control instruction.
Next, a flow according to an embodiment of the present application will be described with reference to fig. 5. As shown in fig. 5, after acquiring the voice instruction, the electronic device may perform voice recognition to obtain a first instruction text, and then parse the first instruction text. After obtaining the syntax components, it may further determine whether a predicate is present in the first instruction text. If a predicate is present, the content corresponding to the predicate component is generalized; the electronic device can generalize this content through the dictionary, so that it is replaced with standard predicate content. Further, the electronic device may determine whether only the content corresponding to the predicate component is present; if not, it may remove the content corresponding to the predicate component from the first instruction text and then perform non-predicate generalization on the remaining content corresponding to the non-predicate components. If only the content corresponding to the predicate component is present, the corresponding control instruction can be generated directly based on that content. When it is determined that no predicate component exists in the first instruction text, the first instruction text is directly subjected to non-predicate generalization.
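The fig. 5 flow can be outlined as a small driver function; the parser and generalizer interfaces are placeholders supplied by the caller, not APIs defined by the application:

```python
def process_instruction(first_text: str, parse, generalize_predicate, generalize_non_predicate):
    """Sketch of the fig. 5 flow: parse the first instruction text, then
    generalize predicate and/or non-predicate contents depending on which
    grammar components are present."""
    components = parse(first_text)  # e.g. {"predicate": "...", "object": "..."}
    if "predicate" in components:
        # Predicate present: generalize it via the dictionary.
        components["predicate"] = generalize_predicate(components["predicate"])
        rest = {k: v for k, v in components.items() if k != "predicate"}
        if rest:
            # Non-predicate components remain: generalize them too.
            components.update(generalize_non_predicate(rest))
    else:
        # No predicate: non-predicate generalization directly.
        components.update(generalize_non_predicate(components))
    return components
```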
According to the voice control method provided by the embodiment, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved. In addition, in this embodiment, different content replacement methods may be used according to whether a predicate element is included in a syntax element or not, and whether only a predicate element is included when a predicate element is included, so that flexibility and diversity of content replacement of syntax elements are improved, and finer granularity and more accurate content replacement are facilitated. In addition, the present embodiment can combine the application type and the state information (task scenario) to jointly decide and understand the instruction, and can solve the problem that the same instruction expresses different intentions in different contexts.
Referring to fig. 6, a voice control method provided in the present application includes:
S310: In response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text.
S320: and analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text.
S330: and if the grammar component represents that the first instruction text does not comprise a predicate component and comprises a non-predicate component, replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content.
As one mode, if the non-predicate component includes a state component, the content corresponding to the state component is replaced with the corresponding standard state content to obtain standard non-predicate content; if the non-predicate component includes an object component, the content corresponding to the object component is replaced with the corresponding standard object content to obtain standard non-predicate content; or, if the non-predicate component includes both an object component and a state component, the content corresponding to the object component is replaced with the corresponding standard object content, the content corresponding to the state component is replaced with the corresponding standard state content, and standard non-predicate content is obtained based on the standard object content and the standard state content.
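The three branches can be sketched as follows; the dictionary entries are illustrative placeholders, not content from the application:

```python
# Illustrative standard-content dictionaries.
STD_STATE = {"rightmost upper side": "upper right corner"}
STD_OBJECT = {"the button above": "top button"}


def replace_non_predicate(parts: dict) -> str:
    """Apply the three branches above: state only, object only, or both."""
    state = STD_STATE.get(parts["state"], parts["state"]) if "state" in parts else None
    obj = STD_OBJECT.get(parts["object"], parts["object"]) if "object" in parts else None
    if state and obj:
        # Both present: combine the standard state and object contents.
        return f"{state} {obj}"
    return state or obj or ""
```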
S340: and obtaining a second instruction text based on the default predicate content and the standard non-predicate content.
S350: and generating a control instruction based on the second instruction text.
S360: and executing the control instruction.
According to the voice control method provided by the embodiment, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved. In addition, in this embodiment, when it is detected that there is no predicate in the instruction text, targeted replacement can be performed on the non-predicate elements, thereby improving flexibility and diversity of syntax element content replacement.
Referring to fig. 7, a voice control method provided in the present application includes:
S410: In response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text.
S420: and analyzing grammatical components of the first instruction text to obtain grammatical components of the first instruction text.
S430: and if the grammar component comprises a tone word component, removing the content corresponding to the tone word component in the first instruction text to obtain the instruction text from which the tone word is removed.
S440: and replacing the content corresponding to the entity grammar component in the instruction text without the tone words with the corresponding standard content to obtain a second instruction text, wherein the entity grammar component is the component left after the tone word component is removed from the grammar component.
S450: and generating a control instruction based on the second instruction text.
S460: and executing the control instruction.
As one way, the replacing, based on the grammar component, the content corresponding to the grammar component in the first instruction text with the corresponding standard content to obtain a second instruction text includes: and replacing the content corresponding to the grammar component in the first instruction text with corresponding standard content based on the grammar component and a dictionary to obtain a second instruction text, wherein the dictionary records a content replacement relation. Optionally, the method further includes: acquiring a current position; and acquiring a dictionary corresponding to the current position.
It should be noted that the dictionary may store a correspondence between the generalized content and the standard content corresponding to each grammar component. Moreover, the accent or language expression habits of users in different regions are different, so that corresponding dictionaries can be established for different regions, and the dictionaries in different regions can be different, so that the method can better adapt to the users in different regions, and further improve the accuracy of obtaining the actual control intention of the users. Furthermore, the electronic device can update the dictionary, so that the latest expression habit of the user can be better adapted.
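A minimal sketch of selecting a region-specific replacement dictionary; the region names and entries are invented for illustration:

```python
# Per-region replacement dictionaries (illustrative entries only).
REGION_DICTS = {
    "region_a": {"press": "click"},
    "region_b": {"tap": "click", "press": "click"},
}
DEFAULT_DICT = {"press": "click"}


def dictionary_for(location: str) -> dict:
    """Pick the replacement dictionary matching the user's current location,
    falling back to a default dictionary for unknown regions."""
    return REGION_DICTS.get(location, DEFAULT_DICT)
```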
According to the voice control method provided by this embodiment, after the grammatical components of the instruction text are analyzed, the content in the instruction text can be replaced with corresponding standard content based on those grammatical components, so that the electronic device can more accurately determine the user's control intention from the replaced standard content, improving the accuracy of the control process. In addition, in this embodiment, the tone words in the instruction text may be removed first, which helps improve the accuracy of the generated control instruction. Moreover, in this embodiment, constructing the dictionary addresses the diversity of Chinese language expression, mainly nonstandard voice commands such as special sentence patterns (elliptical sentences, inverted sentences, and rhetorical questions), synonyms and near-synonyms, different names for the same entity in different regions, and users referring to an entity by an abbreviation or alternative name. Because a dictionary is used, no large-scale pre-trained model is needed and the method does not impose an excessive computational load, so it can be deployed directly on end-side devices. Meanwhile, for object entities that are updated in real time (the content in the dictionary), configuration can be performed by extending the dictionary from time to time, without repeatedly retraining and fine-tuning a model on large corpora, which shortens the iteration cycle.
Referring to fig. 8, the present application provides a voice control apparatus, where the apparatus 500 includes:
an instruction converting unit 510, configured to, in response to receiving a voice instruction, convert the voice instruction into a corresponding first instruction text;
a syntactic component analyzing unit 520, configured to perform syntactic component analysis on the first instruction text to obtain a syntactic component of the first instruction text;
the instruction processing unit 530 is configured to replace, based on the syntax component, content in the first instruction text corresponding to the syntax component with corresponding standard content to obtain a second instruction text;
an instruction generating unit 540, configured to generate a control instruction based on the second instruction text;
a control unit 550, configured to execute the control instruction.
As one mode, the instruction processing unit 530 is specifically configured to, if the syntax component indicates that a predicate component is included in the first instruction text, replace a content corresponding to the predicate component with a corresponding standard predicate content; if the grammar component represents that the first instruction text also comprises a non-predicate component, replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content; and obtaining a second instruction text based on the standard predicate content and the standard non-predicate content.
Optionally, the instruction processing unit 530 is specifically configured to: if the non-predicate component includes a state component, replace the content corresponding to the state component with the corresponding standard state content to obtain standard non-predicate content; if the non-predicate component includes an object component, replace the content corresponding to the object component with the corresponding standard object content to obtain standard non-predicate content; or, if the non-predicate component includes both an object component and a state component, replace the content corresponding to the object component with the corresponding standard object content, replace the content corresponding to the state component with the corresponding standard state content, and obtain standard non-predicate content based on the standard object content and the standard state content.
Optionally, the instruction processing unit 530 is specifically configured to: if the non-predicate component includes neither a subject component nor an object component, obtain a current task scene; and replace the content corresponding to the non-predicate component based on the task scene to obtain standard non-predicate content corresponding to the task scene.
As an embodiment, the instruction generating unit 540 is specifically configured to, before replacing the content corresponding to the non-predicate element with the corresponding standard non-predicate content if the syntax element indicates that the first instruction text further includes a non-predicate element, detect whether the content corresponding to the predicate element is indicated as performing an overall operation on the current user interface if the syntax element indicates that only a predicate element is included in the first instruction text; if the representation is to carry out integral operation on the current user interface, generating a control instruction for executing the integral operation; and if the representation is not the integral operation of the current user interface and the representation is the operation of the control in the current user interface, generating a control instruction corresponding to the control.
As an embodiment, the instruction processing unit 530 is specifically configured to, if the syntax element indicates that no predicate element is included in the first instruction text and a non-predicate element is included in the first instruction text, replace a content corresponding to the non-predicate element with a corresponding standard non-predicate content; and obtaining a second instruction text based on the default predicate content and the standard non-predicate content.
As a mode, the instruction processing unit 530 is specifically configured to, if the grammar component includes a tone word component, remove content corresponding to the tone word component in the first instruction text, to obtain an instruction text from which the tone word is removed; and replacing the content corresponding to the entity grammar component in the instruction text without the tone words with the corresponding standard content to obtain a second instruction text, wherein the entity grammar component is the component left after the tone word component is removed from the grammar component.
As one manner, the instruction processing unit 530 is specifically configured to replace, based on the syntax component and a dictionary, a content in the first instruction text corresponding to the syntax component with a corresponding standard content to obtain a second instruction text, where a content replacement relationship is recorded in the dictionary. Optionally, the instruction processing unit 530 is further specifically configured to obtain the current location; and acquiring a dictionary corresponding to the current position.
In the voice control device provided in this embodiment, after a voice instruction is received, the voice instruction is converted into a corresponding first instruction text, syntactic component analysis is performed on the first instruction text to obtain syntactic components of the first instruction text, and then, based on the syntactic components, content corresponding to the syntactic components in the first instruction text is replaced with corresponding standard content to obtain a second instruction text, and finally, a control instruction is generated based on the second instruction text, and the control instruction is executed. Therefore, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In several embodiments provided herein, the coupling of modules to each other may be electrical. In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
An electronic device provided by the present application will be described below with reference to fig. 9.
Referring to fig. 9, based on the voice control method and apparatus, an electronic device 1000 capable of executing the voice control method is further provided in the embodiment of the present application. The electronic device 1000 includes one or more processors 102 (only one shown), a memory 104, a camera 106, and an audio capture device 108 coupled to each other. The memory 104 stores programs that can execute the content of the foregoing embodiments, and the processor 102 can execute the programs stored in the memory 104.
The processor 102 may include one or more processing cores. The processor 102 interfaces with various components throughout the electronic device 1000 using various interfaces and circuitry, and performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 104 and invoking data stored in the memory 104. Alternatively, the processor 102 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 102 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 102, but may instead be implemented by a communication chip. In one approach, the processor 102 may be a neural network chip, for example, an embedded neural-network processing unit (NPU).
The memory 104 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described herein, and the like.
Furthermore, the electronic device 1000 may further include a network module 110 and a sensor module 112 in addition to the aforementioned components.
The network module 110 is used for implementing information interaction between the electronic device 1000 and other devices, for example, transmitting device control instructions, manipulation request instructions, status information acquisition instructions, and the like. When the electronic device 1000 is embodied as a different device, the corresponding network module 110 may be different.
The sensor module 112 may include at least one sensor. Specifically, the sensor module 112 may include, but is not limited to: level sensors, light sensors, motion sensors, pressure sensors, infrared heat sensors, distance sensors, acceleration sensors, and other sensors.
Among other things, the pressure sensor may detect the pressure generated by pressing on the electronic device 1000. That is, the pressure sensor detects pressure generated by contact or pressing between the user and the electronic device, for example, contact or pressing between the user's ear and the mobile terminal. Thus, the pressure sensor may be used to determine whether contact or pressure has occurred between the user and the electronic device 1000, as well as the magnitude of the pressure.
The acceleration sensor may detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when stationary, and may be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) for recognizing the attitude of the electronic device 1000, and related functions (such as pedometer and tapping) for vibration recognition. In addition, the electronic device 1000 may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer and a thermometer, which are not described herein again.
The audio capture device 108 is used for capturing audio signals. Optionally, the audio capture device 108 may include a plurality of audio capture elements, and each audio capture element may be a microphone.
As one mode, the network module of the electronic device 1000 is a radio frequency module, and the radio frequency module is configured to receive and transmit electromagnetic waves, and implement interconversion between the electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices. The radio frequency module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and so forth. For example, the radio frequency module may interact with an external device through transmitted or received electromagnetic waves. For example, the radio frequency module may send instructions to the target device.
Referring to fig. 10, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 800 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 to perform any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 810 may be compressed, for example, in a suitable form.
In summary, according to the voice control method, the voice control device and the electronic device provided by the application, after a voice instruction is received, the voice instruction is converted into a corresponding first instruction text, grammatical component analysis is performed on the first instruction text to obtain grammatical components of the first instruction text, then based on the grammatical components, content corresponding to the grammatical components in the first instruction text is replaced by corresponding standard content to obtain a second instruction text, and finally, a control instruction is generated based on the second instruction text and executed. Therefore, after the grammatical component of the instruction text is analyzed, the content in the instruction text can be replaced by the corresponding standard content based on the grammatical component of the instruction text, so that the electronic equipment can more accurately determine the control intention of the user based on the replaced standard content, and the accuracy of the control process is improved.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. A voice control method, wherein the method comprises: in response to receiving a voice instruction, converting the voice instruction into a corresponding first instruction text; performing grammatical component analysis on the first instruction text to obtain grammatical components of the first instruction text; replacing, based on the grammatical components, content corresponding to the grammatical components in the first instruction text with corresponding standard content to obtain a second instruction text; generating a control instruction based on the second instruction text; and executing the control instruction.

2. The method according to claim 1, wherein replacing, based on the grammatical components, the content corresponding to the grammatical components in the first instruction text with the corresponding standard content to obtain the second instruction text comprises: if the grammatical components indicate that the first instruction text includes a predicate component, replacing content corresponding to the predicate component with corresponding standard predicate content; if the grammatical components indicate that the first instruction text further includes a non-predicate component, replacing content corresponding to the non-predicate component with corresponding standard non-predicate content; and obtaining the second instruction text based on the standard predicate content and the standard non-predicate content.

3. The method according to claim 2, wherein replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content comprises: if the non-predicate component includes an adverbial component, replacing content corresponding to the adverbial component with corresponding standard adverbial content to obtain the standard non-predicate content; if the non-predicate component includes an object component, replacing content corresponding to the object component with corresponding standard object content to obtain the standard non-predicate content; or, if the non-predicate component includes both an object component and an adverbial component, replacing content corresponding to the object component with corresponding standard object content, replacing content corresponding to the adverbial component with corresponding standard adverbial content, and obtaining the standard non-predicate content based on the standard object content and the standard adverbial content.

4. The method according to claim 2, wherein replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content comprises: if the non-predicate component includes neither an adverbial component nor an object component, obtaining a current task scenario; and replacing the content corresponding to the non-predicate component based on the task scenario to obtain standard non-predicate content corresponding to the task scenario.

5. The method according to claim 2, wherein, before replacing the content corresponding to the non-predicate component with the corresponding standard non-predicate content if the grammatical components indicate that the first instruction text further includes a non-predicate component, the method further comprises: if the grammatical components indicate that the first instruction text includes only a predicate component, detecting whether content corresponding to the predicate component represents an overall operation on a current user interface; if it represents an overall operation on the current user interface, generating a control instruction for performing the overall operation; and if it does not represent an overall operation on the current user interface but represents an operation on a control in the current user interface, generating a control instruction corresponding to the control.

6. The method according to claim 1, wherein replacing, based on the grammatical components, the content corresponding to the grammatical components in the first instruction text with the corresponding standard content to obtain the second instruction text comprises: if the grammatical components indicate that the first instruction text includes no predicate component and includes a non-predicate component, replacing content corresponding to the non-predicate component with corresponding standard non-predicate content; and obtaining the second instruction text based on default predicate content and the standard non-predicate content.

7. The method according to claim 1, wherein replacing, based on the grammatical components, the content corresponding to the grammatical components in the first instruction text with the corresponding standard content to obtain the second instruction text comprises: if the grammatical components include a modal particle component, removing content corresponding to the modal particle component from the first instruction text to obtain an instruction text with the modal particle removed; and replacing content corresponding to substantive grammatical components in the instruction text with the modal particle removed with corresponding standard content to obtain the second instruction text, wherein the substantive grammatical components are the components remaining after the modal particle component is removed from the grammatical components.

8. The method according to claim 1, wherein replacing, based on the grammatical components, the content corresponding to the grammatical components in the first instruction text with the corresponding standard content to obtain the second instruction text comprises: replacing, based on the grammatical components and a dictionary, the content corresponding to the grammatical components in the first instruction text with the corresponding standard content to obtain the second instruction text, wherein content-replacement relationships are recorded in the dictionary.

9. The method according to claim 8, wherein the method further comprises: obtaining a current location; and obtaining a dictionary corresponding to the current location.

10. A voice control apparatus, wherein the apparatus comprises: an instruction conversion unit configured to convert a voice instruction into a corresponding first instruction text in response to receiving the voice instruction; a grammatical component analysis unit configured to perform grammatical component analysis on the first instruction text to obtain grammatical components of the first instruction text; an instruction processing unit configured to replace, based on the grammatical components, content corresponding to the grammatical components in the first instruction text with corresponding standard content to obtain a second instruction text; an instruction generation unit configured to generate a control instruction based on the second instruction text; and a control unit configured to execute the control instruction.

11. An electronic device, comprising one or more processors and a memory, wherein one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method according to any one of claims 1-9.

12. A computer-readable storage medium, wherein program code is stored in the computer-readable storage medium, and the method according to any one of claims 1-9 is performed when the program code is run.

13. A computer program product, comprising a computer program/instructions, wherein, when the computer program/instructions are executed by a processor, the steps of the method according to any one of claims 1-9 are implemented.
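Claims 8 and 9 can be illustrated with a short sketch: a dictionary recording content-replacement relationships is selected according to the current location, so regional phrasings converge on the same standardized instruction text. The region names, dictionary entries, and function names below are invented for illustration and are not part of the claims.

```python
# Hypothetical per-location replacement dictionaries (claims 8-9).
# Each dictionary records content-replacement relationships:
# recognized content -> standard content.
DICTIONARIES = {
    "region_a": {"turn down": "decrease"},
    "region_b": {"dial back": "decrease"},
}
DEFAULT_DICTIONARY = {"lower": "decrease"}

def get_dictionary(current_location):
    """Obtain the dictionary corresponding to the current location."""
    return DICTIONARIES.get(current_location, DEFAULT_DICTIONARY)

def replace_with_standard(first_text, current_location):
    """Produce the second instruction text using the location's dictionary."""
    dictionary = get_dictionary(current_location)
    for content, standard in dictionary.items():
        first_text = first_text.replace(content, standard)
    return first_text

# Different regional phrasings map to the same standardized text.
print(replace_with_standard("dial back the volume", "region_b"))
print(replace_with_standard("turn down the volume", "region_a"))
```

Selecting the dictionary by location keeps each table small and lets the same downstream intent matching serve users whose colloquial phrasings differ by region.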
CN202111342166.8A 2021-11-12 2021-11-12 Voice control method and device and electronic equipment Pending CN114121001A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111342166.8A CN114121001A (en) 2021-11-12 2021-11-12 Voice control method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111342166.8A CN114121001A (en) 2021-11-12 2021-11-12 Voice control method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114121001A true CN114121001A (en) 2022-03-01

Family

ID=80379011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111342166.8A Pending CN114121001A (en) 2021-11-12 2021-11-12 Voice control method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114121001A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09146951A (en) * 1995-11-27 1997-06-06 Ishikura Hiroshi System and method for language analysis
CN109741737A (en) * 2018-05-14 2019-05-10 北京字节跳动网络技术有限公司 A kind of method and device of voice control
CN110070861A (en) * 2018-01-22 2019-07-30 丰田自动车株式会社 Information processing unit and information processing method
CN110675870A (en) * 2019-08-30 2020-01-10 深圳绿米联创科技有限公司 Voice recognition method and device, electronic equipment and storage medium
CN110767232A (en) * 2019-09-29 2020-02-07 深圳和而泰家居在线网络科技有限公司 Speech recognition control method and device, computer equipment and computer storage medium
CN110866090A (en) * 2019-11-14 2020-03-06 百度在线网络技术(北京)有限公司 Method, apparatus, electronic device and computer storage medium for voice interaction
CN112839261A (en) * 2021-01-14 2021-05-25 海信电子科技(深圳)有限公司 Method for improving voice instruction matching degree and display equipment

Similar Documents

Publication Publication Date Title
KR102380494B1 (en) Image processing apparatus and method
US11024300B2 (en) Electronic device and control method therefor
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
WO2023082703A1 (en) Voice control method and apparatus, electronic device, and readable storage medium
JP2019046468A (en) Interface smart interactive control method, apparatus, system and program
WO2017084185A1 (en) Intelligent terminal control method and system based on semantic analysis, and intelligent terminal
CN110765294B (en) Image searching method and device, terminal equipment and storage medium
CN109543021B (en) Intelligent robot-oriented story data processing method and system
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
KR20190115405A (en) Search method and electronic device using the method
KR20220102522A (en) Electronic device and method for generating summary image of electronic device
WO2023103918A1 (en) Speech control method and apparatus, and electronic device and storage medium
CN109725798A (en) The switching method and relevant apparatus of Autonomous role
CN113205569B (en) Image drawing method and device, computer readable medium and electronic device
CN118916443A (en) Information retrieval method and device and electronic equipment
CN116561294A (en) Sign language video generation method and device, computer equipment and storage medium
CN114049890A (en) Voice control method and device and electronic equipment
US20210224310A1 (en) Electronic device and story generation method thereof
CN109241331B (en) Intelligent robot-oriented story data processing method
US20240320519A1 (en) Systems and methods for providing a digital human in a virtual environment
CN114121001A (en) Voice control method and device and electronic equipment
US20220270604A1 (en) Electronic device and operation method thereof
CN118644751A (en) A model training method, model application method and related device
CN116052709A (en) Sign language generation method, device, electronic device and storage medium
CN110795581B (en) Image searching method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20250415
Address after: Changan town in Guangdong province Dongguan 523860 usha Beach Road No. 18
Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.
Country or region after: China
Address before: 311100 room 1001, building 9, Xixi bafangcheng, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province
Applicant before: Hangzhou douku Software Technology Co.,Ltd.
Country or region before: China