CN115373280B - Remote voice control method, device and system - Google Patents
- Publication number: CN115373280B
- Application number: CN202110549978.3A
- Authority: CN (China)
- Prior art keywords: voice, user, sound box, household equipment, local server
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G05B15/02 — Systems controlled by a computer, electric
- G05B19/418 — Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G05B2219/2642 — Domotique, domestic, home control, automation, smart house
- G10L2015/223 — Execution procedure of a spoken command
- Y02P90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The present application discloses a remote voice control method, device and system for performing identity verification on the voiceprint of voice commands, improving the security of remote voice control. The remote voice control method provided by the present application comprises: when a user successfully logs in to a local server through a terminal, receiving a user voice command sent by the terminal; performing voiceprint verification on the voice command and, when verification passes, remotely controlling the smart home device through the voice command.
Description
Technical Field
The present application relates to the technical field of smart home, and in particular to a remote voice control method, device and system.
Background
With the development of science and technology, remote voice control has gradually emerged: users can send voice commands from a mobile phone to remote smart home devices.
However, existing remote voice control merely transmits voice commands without verifying the identity of the speaker, leaving voice commands insecure.
Disclosure of Invention
The embodiments of the present application provide a remote voice control method, device and system, which verify the identity behind the voiceprint of voice commands and improve the security of remote voice control.
The remote voice control method provided by the embodiment of the application comprises the following steps:
When a user successfully logs in to a local server through a terminal, receiving a user voice command sent by the terminal;
Performing voiceprint verification on the voice command and, when verification passes, remotely controlling the smart home device through the voice command.
With this method, when a user successfully logs in to a local server through a terminal, a user voice command sent by the terminal is received and voiceprint-verified; when verification passes, the smart home device is remotely controlled through the voice command. The voiceprint of each voice command is thus identity-verified, improving the security of remote voice control.
Optionally, the method further comprises:
Receiving a face image and voice information collected from the user through the terminal;
Extracting face features from the face image and voiceprint features from the voice information;
Fusing the voiceprint features with the face features, and deciding whether the user may log in to the local server based on the fusion result.
Face-only verification can be defeated by a photo or a mask, and voiceprint-only verification is vulnerable to recorded or cloned voices; the embodiments of the present application therefore fuse face features with voiceprint features, increasing the reliability of the user login verification information.
Optionally, fusing the voiceprint features with the face features and deciding whether the user may log in to the local server based on the fusion result specifically comprises:
Constructing a fused feature vector from the face feature vector and the voiceprint feature vector;
Determining a weight matrix for the fused feature vector;
Summing the feature weights in the weight matrix to obtain a final score; if the score falls within a preset range, determining that the user has successfully logged in to the local server through the terminal.
Optionally, remotely controlling the smart home device through the voice command specifically comprises:
Transmitting the voice command to a coordinator and to a sound box module, respectively; controlling the power switch of the smart home device through the coordinator; and, through the sound box module, classifying and forwarding the command according to the smart home device keyword contained in the voice command, transmitting the voice command to the sound box paired with the corresponding smart home device, and playing the voice command to that device through the sound box.
To address the problem that commands for remotely voice-controlled household devices receive no timely feedback, the embodiments of the present application classify user commands and establish bidirectional communication between the user and the local server and between the local server and the smart home devices, enabling timely feedback and refinement of remote control information.
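The keyword-based classification and forwarding step can be sketched as a small lookup, assuming illustrative device keywords and sound box assignments that are not taken from the patent itself:

```python
# Minimal sketch of keyword-based command dispatch: the sound box module
# matches a device keyword in the voice command and routes the command to the
# sound box paired with that device. Keywords and box names are assumptions.

SOUND_BOX_FOR_DEVICE = {
    "air conditioner": "sound box 2",
    "refrigerator": "sound box 2",
    "light": "sound box 3",
    "curtain": "sound box 3",
}

def dispatch(command: str) -> str:
    """Return the sound box that should play this command to its device."""
    for keyword, box in SOUND_BOX_FOR_DEVICE.items():
        if keyword in command:
            return box
    return "unrecognized"

print(dispatch("turn on the air conditioner"))
```

In a real system the return value would trigger audio playback on the selected sound box rather than a string.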
Optionally, the method further comprises:
When the corresponding smart home device cannot execute the voice command, feeding back an invalid-command prompt to the user terminal.
Optionally, when the verification passes, the method further comprises:
and if the voice command cannot be processed offline at the local server, invoking a cloud server to process the voice command.
The embodiment of the application provides a remote voice control device, which comprises:
A memory for storing program instructions;
And the processor is used for calling the program instructions stored in the memory and executing any one of the methods according to the obtained program.
An embodiment of the present application provides a remote voice control system comprising a remote voice control device, a coordinator and a sound box module each connected to the remote voice control device, and at least one sound box connected to the sound box module, wherein:
The coordinator is configured to receive the voice command sent by the remote voice control device and control the power switch of the smart home device based on the voice command;
The sound box module is configured to determine the smart home device keyword contained in the voice command sent by the remote voice control device, send the voice command to the sound box paired with the corresponding smart home device, and play the voice command through that sound box.
Optionally, the system further comprises an intelligent home device corresponding to each sound box.
Another embodiment of the present application provides a computing device including a memory for storing program instructions and a processor for invoking program instructions stored in the memory to perform any of the methods described above in accordance with the obtained program.
Another embodiment of the present application provides a computer storage medium storing computer-executable instructions for causing the computer to perform any of the methods described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an overall system framework provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of voiceprint feature collection and recognition provided by an embodiment of the present application;
fig. 3 is a schematic diagram of face feature collection and recognition according to an embodiment of the present application;
Fig. 4 is a schematic diagram of face and voiceprint feature fusion provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a user identity registration and identity login process according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a frame for voice signal remote transmission and voice command analysis according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a remote voice control smart home device framework according to an embodiment of the present application;
fig. 8 is a flow chart of a remote voice control method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a remote voice control device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of another remote voice control device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a remote voice control method and a remote voice control device, which are used for verifying the identity of a user voice command sender, increasing the reliability of user login verification information and improving the safety of remote voice control.
The method and the device are based on the same application, and because the principles of solving the problems by the method and the device are similar, the implementation of the device and the method can be referred to each other, and the repetition is not repeated.
Various embodiments of the application are described in detail below with reference to the drawings attached to the specification. It should be noted that, the display sequence of the embodiments of the present application only represents the sequence of the embodiments, and does not represent the advantages or disadvantages of the technical solutions provided by the embodiments.
The embodiment of the application provides a remote voice control intelligent home system which is reliable for users, and the system specifically comprises the following steps:
1. To address the problems that existing remote voice control merely transmits voice commands without verifying the sender's identity, leaving voice commands insecure, the embodiments of the present application split the voice signal into a voiceprint signal and a voice command and verify the user's identity at every voice step;
2. To address the problem that face-only verification can be defeated by a photo or a mask while voiceprints are vulnerable to recorded and cloned voices, the embodiments of the present application fuse face features with voiceprint features, improving the reliability of user login verification information;
3. To address the problem that commands to remotely voice-controlled household devices receive no timely feedback, the embodiments of the present application classify user commands and establish bidirectional communication between the user and the local server and between the local server and the smart home devices, enabling timely feedback and refinement of remote control information.
In summary, the embodiment of the application has the following advantages:
1. By combining fused voiceprint and face features and building user registration and login pages, a fused-feature login verification function is realized, avoiding the vulnerabilities of face-only or voiceprint-only verification and improving the security of the login system;
2. After the user logs in successfully, each utterance is analyzed and its voiceprint is verified first, so every piece of information is checked; this enables real-time monitoring of the legitimacy of user behavior and further guarantees the security of remote voice control;
3. A smart home hardware system is built in which user information is played to the relevant device through the loudspeaker function of the smart sound box; the device performs the requested voice operation and information can be fed back to the user in time, realizing both the remote control function and its feedback.
The embodiments of the present application mainly concern a remote control system based on real-time monitoring. It offers users a new mode of remote command recognition and protects user privacy and property. The system framework provided by the embodiments is shown in fig. 1; the main contents include:
(1) Collect the user's voiceprint features and build a voiceprint database.
(2) Collect the user's face features and build a face database.
(3) Fuse the voiceprint features with the face features.
(4) Build the user identity registration and identity login systems.
(5) Remote voice signal transmission and voice command analysis system.
(6) Design of the remote voice control smart home device framework.
With respect to the above, specific embodiments are described as follows:
(1) Collecting voice print characteristics of a user and constructing a voice print database:
The voiceprint feature collection and recognition of the embodiments of the present application, shown in fig. 2, comprises five stages: voice input, preprocessing, voice feature extraction, classifier, and voiceprint database construction.
Voice input: the mobile terminal displays a designated text, and the user's voice is collected as the user reads the text aloud.
Preprocessing: sound segments are extracted from the raw speech by time-domain analysis; pre-emphasis through a high-pass filter improves recognition of the high-frequency components; a window function splits the signal into audio frames; and the short-time energy and short-time zero-crossing rate of each frame are compared against thresholds to determine the start and end points of the speech signal.
Wherein the short-time energy represents the magnitude of the energy of each frame of speech signal.
The short-time zero-crossing rate represents the number of times the waveform of each frame of voice signal passes through the zero axis.
Regarding threshold setting, the average short-time zero-crossing rate and average short-time energy serve as the thresholds for speech endpoint detection; this is the basic dual-threshold endpoint detection algorithm. Endpoint detection works best when the high energy threshold is set to 4 to 5 times the low threshold and the high zero-crossing-rate threshold to about 2 times the low threshold.
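A minimal sketch of such dual-threshold endpoint detection, run on a synthetic signal rather than real speech; frame and hop sizes and the threshold ratios are illustrative assumptions (only the energy thresholds are applied here, with the zero-crossing rate shown for completeness):

```python
# Dual-threshold endpoint detection sketch using short-time energy;
# the zero-crossing-rate helper illustrates the second feature in the text.
import math

def frames(signal, size=160, hop=80):
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_time_energy(frame):
    return sum(x * x for x in frame)

def zero_crossings(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def detect_endpoints(signal, low_ratio=0.25, high_ratio=4.0):
    """Return (start_frame, end_frame) of speech, or None if no speech found.

    The high threshold is high_ratio times the low one, following the
    4-5x guidance above; the low threshold is a fraction of the mean energy."""
    energies = [short_time_energy(f) for f in frames(signal)]
    low = low_ratio * sum(energies) / len(energies)
    high = high_ratio * low
    voiced = [i for i, e in enumerate(energies) if e > high]
    if not voiced:
        return None
    start, end = voiced[0], voiced[-1]
    # extend outward while the weaker (low) threshold is still exceeded
    while start > 0 and energies[start - 1] > low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > low:
        end += 1
    return start, end

# Synthetic example: silence, a sine burst, silence.
sig = [0.0] * 800 + [math.sin(0.3 * i) for i in range(1600)] + [0.0] * 800
print(detect_endpoints(sig))
```

The two-pass structure (find frames above the high threshold, then grow the region using the low threshold) is what makes the detector robust to weak onsets and trailing sounds.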
Voice feature extraction: mel-frequency cepstrum coefficients (MFCC) and first-order difference coefficients are extracted; the preprocessed voice signal is converted into feature vectors using a Gaussian mixture model together with a background model, and dimensionality reduction of the feature vectors yields low-dimensional, discriminative voiceprint features.
Here, the mel-frequency cepstrum coefficient (MFCC) is a frequency coefficient on a scale modeled on the human ear's perception of pitch change.
The Gaussian mixture model quantizes an object precisely using Gaussian probability density functions, decomposing it into several models based on those functions.
The background model refers to the background sound of the speech; differencing two adjacent frames of the voice signal removes the background sound and avoids interference.
The first-order difference coefficient is the difference between the feature parameters of two consecutive frames (next frame minus previous frame), reflecting the relation between the current voice frame and the preceding one.
The feature vector is a vector formed by a plurality of parameters describing the features of the voice signal.
The dimension reduction is to reduce the high-dimension feature vector to a low dimension by an algorithm (e.g., PCA).
Classifier: the speech database VoxForge is selected, and a voiceprint recognition classifier is built from the voice feature vectors using a support vector machine (SVM).
The speech database VoxForge is an open-source speech corpus and acoustic model library, widely used in academia; it is robust for testing speech models under different intonation and accent conditions.
The Support Vector Machine (SVM) is a classifier for binary classification of data according to a supervised learning mode.
A voiceprint database is built using the voiceprint recognition classifier from the previous step. When voice is input again, the user's voiceprint information is located in the voiceprint database through signal processing, feature extraction and recognition, completing the match.
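The feature pipeline above (per-frame features plus first-order difference coefficients, followed by dimensionality reduction and classification) can be sketched as follows. Random vectors stand in for real MFCC frames, and a nearest-centroid rule stands in for the SVM; both substitutions are assumptions made only to keep the sketch self-contained:

```python
# Pipeline sketch: delta (first-order difference) features, PCA reduction,
# and a toy two-speaker classifier on pseudo-MFCC frames.
import numpy as np

rng = np.random.default_rng(0)

def add_deltas(feats):
    """Append first-order differences (next frame minus current) to each frame."""
    deltas = np.diff(feats, axis=0, append=feats[-1:])  # last delta padded with 0
    return np.hstack([feats, deltas])

def pca_fit(X, k):
    """Return the mean and top-k principal directions of X (rows = samples)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

# Two "speakers": 13-dim pseudo-MFCC frames drawn around different centers.
spk_a = rng.normal(0.0, 0.3, (50, 13))
spk_b = rng.normal(2.0, 0.3, (50, 13))
X = np.vstack([add_deltas(spk_a), add_deltas(spk_b)])   # 100 x 26
y = np.array([0] * 50 + [1] * 50)

mu, components = pca_fit(X, k=4)            # reduce 26 dims to 4
Z = (X - mu) @ components.T

centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
def predict(z):
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

acc = np.mean([predict(z) == c for z, c in zip(Z, y)])
print(f"training accuracy: {acc:.2f}")
```

A production voiceprint system would extract real MFCCs from audio and train an actual SVM, but the data flow — features, deltas, PCA, classifier — is the same.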
(2) Collecting the face characteristics of a user and constructing a face database:
The face feature collection and recognition of the embodiment of the application are shown in fig. 3, and are divided into five stages of face collection, image preprocessing, image feature extraction, classifier and face database construction.
Face acquisition: face images are captured through the camera of the mobile terminal;
Image preprocessing: the collected face image data is converted to grayscale, filtered for noise reduction, and then histogram-equalized to enhance the image features.
Image feature extraction: the face image features are described with a histogram of oriented gradients (HOG); the face image is divided into 3×3 sub-blocks, the HOG feature of each sub-block is extracted, and the high-dimensional HOG features are then reduced with principal component analysis (PCA);
Classifier: the face database CAS-PEAL is selected, and a face recognition classifier is built through a support vector machine (SVM) using the image features extracted in the previous step;
A face recognition database is built from the face recognition classifier obtained in the previous step. When a face image is input again, the user's face information is located in the face database through image preprocessing, feature extraction and recognition, completing the match.
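The histogram equalization step in the preprocessing above can be sketched on a tiny synthetic grayscale "image" (a real system operates on camera frames; the classic CDF-based mapping is assumed here):

```python
# Histogram equalization sketch: stretch a low-contrast pixel list so its
# intensities span the full grayscale range via the cumulative distribution.

def equalize(pixels, levels=256):
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:               # constant image: nothing to equalize
        return pixels[:]
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]

# A low-contrast image crammed into [100, 110] stretches toward [0, 255].
img = [100, 102, 104, 106, 108, 110, 104, 106]
print(equalize(img))
```

After equalization the darkest input maps to 0 and the brightest to 255, which is exactly the feature-enhancing contrast stretch the preprocessing stage relies on.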
(3) And fusing the voiceprint features with the face features.
In the embodiments of the present application, the fusion of face and voiceprint features, shown in fig. 4, is a feature-level fusion divided into three parts: constructing the fused feature vector, setting a weight for each dimension, and recognition.
Constructing the fused feature vector: F = (f1, f2, f3, …, fm) denotes the face feature vector, where f1, f2, f3, …, fm are the values describing each face feature, and V = (v1, v2, v3, …, vn) denotes the voiceprint feature vector, where v1, v2, v3, …, vn are the values describing each voiceprint feature. After dimension normalization, F and V are fused into a new feature vector S = (s1, s2, s3, …, s(m+n)), where s1, s2, s3, …, s(m+n) are the values describing each fused face-and-voiceprint feature;
Weight determination: two sets of fused feature vectors S1 = (s11, s12, s13, …, s1(m+n)) and S2 = (s21, s22, s23, …, s2(m+n)) are selected, where s1i and s2i are the values of the i-th fused face-and-voiceprint feature in the first and second set respectively. The average distance between S1 and S2 is calculated according to the following formula:
The weight of each fused face-and-voiceprint feature is then calculated according to the following formula:
A weight matrix W = (w1, w2, w3, …, w(m+n)) is obtained; the calculation is repeated several times to produce several different weight matrices, whose average determines the final weights;
Recognition: the previous step yields a weight matrix whose elements are the feature weights; the fused feature values are weighted by these weights and summed to obtain a final score, and the recognition result is determined by the range this score falls in. A match of at least eighty percent is considered successful.
Specifically, for example, if the fused feature vector is S = (0.8, 0.7, 0.5) and the weight matrix of the features is W = (0.5, 0.4, 0.1), the weighted sum gives a matching degree P = 0.8×0.5 + 0.7×0.4 + 0.5×0.1 = 0.73. If the acceptance criterion is P ≥ 0.8, then 0.73 < 0.8 and recognition fails.
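The worked example above can be expressed as a minimal sketch; the values come from the text, while the function name and threshold constant are ours:

```python
# Weighted-sum matching of fused face + voiceprint features.

def matching_degree(features, weights):
    return sum(f * w for f, w in zip(features, weights))

S = (0.8, 0.7, 0.5)        # fused face + voiceprint feature values
W = (0.5, 0.4, 0.1)        # averaged weight matrix
P = matching_degree(S, W)  # 0.8*0.5 + 0.7*0.4 + 0.5*0.1 = 0.73

THRESHOLD = 0.8            # acceptance criterion from the example
print(f"P = {P:.2f} -> {'accepted' if P >= THRESHOLD else 'rejected'}")
```

With these numbers P = 0.73 falls below the 0.8 threshold, so recognition fails, matching the conclusion in the text.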
(4) Constructing a user identity registration system and an identity login system:
The user identity registration and identity login system is shown in fig. 5. The embodiments of the present application construct a user identity registration system and an identity login system, divided into two parts: voiceprint and face feature training, and identity login verification.
Voiceprint and face feature training: the voiceprint and face information of the user's family members is collected and transmitted, per person, to the family's local server database; voiceprint and face features are extracted, fused, and trained on the local server, and the training results are saved to the local hard disk;
Identity login verification: the user logs in at the mobile terminal, which submits the face and voiceprint login information to the home local server; the face and voiceprint data on the local server's hard disk is retrieved for matching, and if matching succeeds the user is logged in, otherwise login fails;
The goal of this part is user information registration: when a user wants to control household devices remotely, the user must first log in to the mobile App, and App login verifies voiceprint and face together, ensuring the reliability of the user's login identity.
(5) Voice signal remote transmission and voice command analysis system:
In the embodiments of the present application, the framework for remote voice signal transmission and voice command analysis, shown in fig. 6, is divided into three parts: remote voice signal transmission, voice data analysis and identity verification, and voice command transmission.
Remote voice signal transmission: the voice signal is input at the mobile terminal and transmitted to the local server;
Voice data analysis and identity verification: after the voice signal is obtained, the voice data is split into voiceprint data and voice command data; the voiceprint data is first checked against the local server's voiceprint database, and verification passes if the voiceprint exists in the local database, otherwise it fails;
Voice command transmission: once the voiceprint information passes verification, the voice command is transmitted through the local server to the relevant smart home device, realizing remote voice control.
In the embodiment of the application, after the user successfully logs in, each instruction of the user is further verified, ensuring that the sender of each instruction is a registered family member and therefore that every remote smart home control instruction issued by the user is safe and reliable.
(6) Remote voice control smart home device framework design:
The remote voice control smart home device framework designed by the embodiment of the application is shown in fig. 7. Each dashed box represents a group of smart home appliances together with the sound box serving as their speaker: sound box 1 serves as the speaker of the indoor environment acquisition module, which comprises a humidifier, an air purifier and the like; sound box 2 serves as the speaker of the home equipment module, which comprises an air conditioner, a refrigerator, a television and the like; and sound box 3 serves as the speaker of the basic module, which comprises lamps, curtains and the like. The security module in fig. 7 only represents the completeness of the home devices and is not necessarily within the scope of remote control.
In the remote voice control smart home device framework designed by the embodiment of the application, the user inputs voice information at the terminal to the local server; the local server processes offline whatever can be handled locally, and calls the relevant cloud services for information that cannot be processed locally.
The services that need to be obtained from the cloud include, for example:
Internet resources, such as songs, audio, etc.;
speech processing for which the local server has insufficient computing power and which therefore relies on the computing power of the cloud server, for example voice cloning, speech synthesis and the like.
The voice module in the local server realizes bidirectional communication. The processing result of a voice signal input by the user (for example, when the user issues a curtain-opening command, the local server performs the operation and then returns the result to the user) is transmitted via the control module in the local server to the coordinator and the sound box module respectively. The sound box module can be used independently as a smart speaker for basic smart-speaker operations, and can also relay information to the household appliances the user controls. The coordinator is responsible for switching on the power of the relevant smart home devices so that they are powered on (if a device is already powered on, this step is unnecessary). The sound box module adopts distributed sound boxes, which can be deployed in multiple rooms to realize communication; on receiving an instruction, the module classifies it according to the household-appliance keywords it contains and transmits it to the sound box closest to the corresponding device, that is, different voice instructions are sent to different sound boxes for playback. Sound boxes 1, 2 and 3, acting as speakers, play the instructions to their corresponding devices, and the devices are thereby controlled.
When the voice module in the local server detects that the voice instruction sent by the user terminal is not a smart home device instruction, it feeds back to the user terminal and prompts the user that the instruction has failed.
In the embodiment of the application, the remotely voice-controlled smart home is divided into three modules according to the type of instruction received:
The basic module, comprising lamps and curtains, only accepts on/off instructions; any other instruction (i.e. anything other than on/off) is returned to the local server, which feeds back to the user a prompt that the instruction has failed;
The indoor environment acquisition module, comprising a humidifier and an air purifier, accepts on/off and mode instructions; when any other instruction arrives, i.e. an instruction the module cannot execute, the module returns a cannot-execute signal to the local server, which feeds it back to the user;
The home equipment module comprises an air conditioner, a television, a refrigerator and the like, and in addition to on/off and mode instructions also involves more complex instructions; if a received instruction does not exist for the module, i.e. the module cannot execute it, the module returns a cannot-execute signal to the local server, which feeds it back to the user.
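The three-module filtering above can be sketched as follows. The allowed-instruction sets for the basic and environment modules follow the text; the `temperature` instruction for the home equipment module and the dictionary layout are illustrative assumptions.

```python
# Which instruction types each module accepts (per the classification above).
ALLOWED = {
    "basic": {"on", "off"},                        # lamps, curtains
    "environment": {"on", "off", "mode"},          # humidifier, air purifier
    "home": {"on", "off", "mode", "temperature"},  # air conditioner, TV, fridge
}

def dispatch(module: str, instruction: str) -> str:
    """Execute a supported instruction; otherwise signal failure upstream."""
    if instruction in ALLOWED.get(module, set()):
        return "executed"
    # A cannot-execute signal returns to the local server, which feeds an
    # error prompt back to the user.
    return "error: instruction failed"
```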
The above classification is merely an example, and the technical solution provided by the embodiment of the present application is not limited to this classification mode.
In summary, to solve the problem that the local server cannot process certain voice instructions while still protecting the user's privacy to a certain extent, the local server transmits part of the instructions to the cloud, which guarantees that those instructions can be carried out. For example, when the user needs voice cloning, the voice-clone model training process, which places high computing-power demands on the server, is carried out by the cloud, and the voice model trained in the cloud is then placed on the local server;
The embodiment of the application also solves the problem of powering on the relevant household appliances: the coordinator switches on the relevant touch switch so that the related device is powered on;
The embodiment of the application uses distributed sound boxes as speakers, solving the problem of delivering instructions to household devices distributed across different rooms, and classifies the smart home devices, making it convenient to feed erroneous user instructions back to the user in a timely manner.
Referring to fig. 8, at a local server side, a remote voice control method provided by an embodiment of the present application includes:
S101, when a user successfully logs in to a local server through a terminal, receiving a user voice instruction sent by the terminal;
S102, voiceprint verification is carried out on the voice command, and when verification is passed, remote control on the intelligent household equipment is achieved through the voice command.
In the embodiment of the application, after the user successfully logs in to the local server through the terminal, voiceprint verification is further carried out on each voice instruction, and subsequent operations are executed only after the verification passes, thereby improving the safety of remote voice control.
Optionally, the method further comprises:
Receiving face images and voice information acquired by a user through a terminal;
Extracting face features of the face image and voiceprint features of the voice information;
and fusing the voiceprint features with the face features, and judging whether the user can log in a local server or not based on a fusion result.
Aiming at the problems that face verification alone can be defeated by a photo or a mask, and that voiceprints alone are vulnerable to recorded and cloned voices, the embodiment of the application fuses face features with voiceprint features, increasing the reliability of the user's login verification information.
Optionally, fusing the voiceprint feature with the face feature, and judging whether the user can log in a local server based on a fusion result, which specifically includes:
Constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature, for example, using F = (f1, f2, f3, …, fm) and V = (v1, v2, v3, …, vn) to obtain S = (s1, s2, s3, …, s(m+n));
Determining a weight matrix of the fusion feature vector, e.g., W = (w1, w2, w3, …, w(m+n));
And summing the weighted features using the weight matrix to obtain a final value; if the value falls within a preset range, it is determined that the user has successfully logged in to the local server through the terminal. For example, given the fused feature vector S = (0.8, 0.7, 0.5) and the per-feature weight matrix W = (0.5, 0.4, 0.1), the weighted features are summed to obtain the matching degree P = 0.8×0.5 + 0.7×0.4 + 0.5×0.1 = 0.73. If the criterion for a successful match is P ≥ 0.8, then since 0.73 < 0.8 the recognition is unsuccessful, i.e. recognition fails.
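The worked example above can be reproduced directly; the vectors, weights and the 0.8 threshold are the ones given in the text.

```python
def matching_degree(features, weights):
    """Weighted sum of the fused face/voiceprint features."""
    return sum(f * w for f, w in zip(features, weights))

S = (0.8, 0.7, 0.5)  # fused feature vector from the example
W = (0.5, 0.4, 0.1)  # per-feature weights from the example

P = matching_degree(S, W)  # 0.8*0.5 + 0.7*0.4 + 0.5*0.1 = 0.73
login_ok = P >= 0.8        # False: 0.73 < 0.8, so recognition fails
```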
Optionally, the remote control of the smart home device is realized through the voice command, which specifically includes:
And transmitting the voice instruction to the coordinator and the sound box module respectively; the coordinator controls the power switch of the smart home device, and the sound box module classifies and forwards the instruction according to the smart-home-device keywords it contains, transmitting the voice instruction to the sound box deployed next to the device corresponding to the keyword, which plays the voice instruction to that device.
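Keyword-based routing as described above might look like the following sketch. The keyword-to-sound-box table is an illustrative assumption, since the patent does not specify a concrete routing data structure.

```python
# Map appliance keywords to the sound box deployed nearest that appliance.
KEYWORD_TO_SOUNDBOX = {
    "humidifier": "sound box 1",       # indoor environment acquisition module
    "air conditioner": "sound box 2",  # home equipment module
    "curtain": "sound box 3",          # basic module
}

def route_instruction(instruction: str) -> str:
    """Choose the sound box that should play this voice instruction."""
    for keyword, box in KEYWORD_TO_SOUNDBOX.items():
        if keyword in instruction:
            return box  # this box plays the instruction to the device
    return "unknown"    # fed back to the user as a failed instruction
```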
Aiming at the problem that remote voice control instructions to household devices could not be fed back in time, the embodiment of the application classifies user instructions and realizes bidirectional communication between the user and the local server and between the local server and the smart home devices, achieving timely feedback and optimization of remote household-device control information.
Optionally, the method further comprises:
And when the voice instruction cannot be executed by the corresponding intelligent home equipment, feeding back error instruction prompt information to the user terminal.
For example, if a signal is received indicating that the basic module cannot execute the instruction, error-instruction prompt information is fed back to the user terminal.
Optionally, when the verification passes, the method further comprises:
and if the voice command cannot be processed offline at the local server, invoking a cloud server to process the voice command.
For example, the local server processes locally whatever can be processed offline, and for information that cannot be processed locally it calls the relevant cloud server services, including:
obtaining Internet resources, such as songs, audio, etc., from the cloud server;
speech processing performed by the cloud server when the local server has insufficient computing power, such as voice cloning and speech synthesis, which require the cloud server's computing power.
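The offline-first split above can be sketched as follows; the concrete capability sets are illustrative assumptions.

```python
# Instructions the local server can process offline vs. those that need
# the cloud server's resources or computing power.
LOCAL_CAPABILITIES = {"open curtain", "light on", "light off"}
CLOUD_CAPABILITIES = {"play song", "voice clone", "speech synthesis"}

def process(command: str) -> str:
    if command in LOCAL_CAPABILITIES:
        return "processed locally"          # privacy-preserving offline path
    if command in CLOUD_CAPABILITIES:
        return "forwarded to cloud server"  # needs cloud compute/resources
    return "error: unknown command"         # fed back to the user
```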
Referring to fig. 9, a remote voice control apparatus provided in an embodiment of the present application includes:
a memory 520 for storing program instructions;
a processor 500 for calling program instructions stored in the memory, executing according to the obtained program:
When a user successfully logs in a local server through a terminal, receiving a user voice instruction sent by the terminal;
And carrying out voiceprint verification on the voice command, and realizing remote control on the intelligent household equipment through the voice command when verification is passed.
Optionally, the processor 500 is further configured to call the program instructions stored in the memory, and execute according to the obtained program:
Receiving face images and voice information acquired by a user through a terminal;
Extracting face features of the face image and voiceprint features of the voice information;
and fusing the voiceprint features with the face features, and judging whether the user can log in a local server or not based on a fusion result.
Optionally, fusing the voiceprint feature with the face feature, and judging whether the user can log in a local server based on a fusion result, which specifically includes:
Constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
Determining a weight matrix of the fusion feature vector;
And summing the weighted features using the weight matrix to obtain a final value; if the value falls within a preset range, it is determined that the user has successfully logged in to the local server through the terminal.
Optionally, the remote control of the smart home device is realized through the voice command, which specifically includes:
And respectively transmitting the voice command to a coordinator and a sound box module, controlling a power switch of the intelligent household equipment through the coordinator, performing command classification transmission through the sound box module according to keywords of the intelligent household equipment contained in the voice command, transmitting the voice command to a sound box which is correspondingly arranged for the intelligent household equipment corresponding to the keywords, and playing the voice command to the intelligent household equipment corresponding to the keywords through the sound box.
Optionally, the processor 500 is further configured to call the program instructions stored in the memory, and execute according to the obtained program:
And when the voice instruction cannot be executed by the corresponding intelligent home equipment, feeding back error instruction prompt information to the user terminal.
Optionally, when the verification passes, the processor 500 is further configured to call the program instructions stored in the memory, and execute according to the obtained program:
and if the voice command cannot be processed offline at the local server, invoking a cloud server to process the voice command.
A transceiver 510 for receiving and transmitting data under the control of the processor 500.
Wherein in fig. 9, a bus architecture may comprise any number of interconnected buses and bridges, and in particular one or more processors represented by processor 500 and various circuits of memory represented by memory 520, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides an interface. The transceiver 510 may be a number of elements, i.e., including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 500 is responsible for managing the bus architecture and general processing, and the memory 520 may store data used by the processor 500 in performing operations.
The processor 500 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or a Complex Programmable Logic Device (CPLD).
The embodiment of the application provides a remote voice control system, which may refer to fig. 7 (but is not limited to the structure shown in fig. 7), comprising the above remote voice control device (i.e. a local server), a coordinator and a sound box module respectively connected with the remote voice control device, and at least one sound box connected with the sound box module, wherein:
The coordinator is used for receiving the voice command sent by the remote voice control device and controlling the power switch of the intelligent household equipment based on the voice command;
The sound box module is used for determining keywords of the intelligent household equipment contained in the voice command sent by the remote voice control device, sending the voice command to a sound box corresponding to the intelligent household equipment corresponding to the keywords, and playing the voice command through the sound box.
Optionally, the system further includes an intelligent home device corresponding to each of the speakers, for example, an indoor environment collection module corresponding to the speaker 1, a home device module corresponding to the speaker 2, and a base module corresponding to the speaker 3 in fig. 7.
It should be noted that the above-mentioned sound boxes need not have a connection relationship with the corresponding smart home devices; each may act purely as a speaker, playing the voice instruction to the corresponding smart home device, which executes the corresponding operation after receiving the voice instruction.
Referring to fig. 10, another remote voice control apparatus provided in an embodiment of the present application includes:
A first unit 11, configured to receive a user voice command sent by a terminal when a user successfully logs in to a local server through the terminal;
and the second unit 12 is used for carrying out voiceprint verification on the voice command, and when the verification is passed, the remote control on the intelligent household equipment is realized through the voice command.
Optionally, the first unit 11 is further configured to:
Receiving face images and voice information acquired by a user through a terminal;
Extracting face features of the face image and voiceprint features of the voice information;
and fusing the voiceprint features with the face features, and judging whether the user can log in a local server or not based on a fusion result.
Optionally, fusing the voiceprint feature with the face feature, and judging whether the user can log in a local server based on a fusion result, which specifically includes:
Constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
Determining a weight matrix of the fusion feature vector;
And summing the weighted features using the weight matrix to obtain a final value; if the value falls within a preset range, it is determined that the user has successfully logged in to the local server through the terminal.
Optionally, the remote control of the smart home device is realized through the voice command, which specifically includes:
And respectively transmitting the voice command to a coordinator and a sound box module, controlling a power switch of the intelligent household equipment through the coordinator, performing command classification transmission through the sound box module according to keywords of the intelligent household equipment contained in the voice command, transmitting the voice command to a sound box which is correspondingly arranged for the intelligent household equipment corresponding to the keywords, and playing the voice command to the intelligent household equipment corresponding to the keywords through the sound box.
Optionally, the second unit 12 is further configured to:
And when the voice instruction cannot be executed by the corresponding intelligent home equipment, feeding back error instruction prompt information to the user terminal.
Optionally, when the verification passes, the first unit 11 is further configured to:
and if the voice command cannot be processed offline at the local server, invoking a cloud server to process the voice command.
It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice. In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The embodiment of the application provides a computing device, which can be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA) and the like. The computing device may include a Central Processing Unit (CPU), memory, input/output devices, etc.; the input devices may include a keyboard, mouse, touch screen, etc., and the output devices may include a display device such as a Liquid Crystal Display (LCD) or Cathode Ray Tube (CRT).
The memory may include Read Only Memory (ROM) and Random Access Memory (RAM) and provides the processor with program instructions and data stored in the memory. In the embodiment of the present application, the memory may be used to store a program of any of the methods provided in the embodiment of the present application.
The processor is configured to execute any of the methods provided by the embodiments of the present application according to the obtained program instructions by calling the program instructions stored in the memory.
An embodiment of the present application provides a computer storage medium storing computer program instructions for use in an apparatus provided in the embodiment of the present application, where the computer storage medium includes a program for executing any one of the methods provided in the embodiment of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile storage (NAND Flash), Solid State Disk (SSD)), etc.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (9)
1. A method of remote voice control, the method comprising:
When a user successfully logs in a local server through a terminal, receiving a user voice instruction sent by the terminal, wherein the user successfully logging in to the local server through the terminal means that, based on the fused feature vector and weight matrix obtained from the face image and voice information acquired by the user through the terminal, the value determined after summing the feature weights belongs to a preset value range; each weight in the weight matrix is the average value of the corresponding weights over the weight matrices of a plurality of pairs of fused feature vectors of the user, and the weight matrix of each pair of fused feature vectors is the ratio of the average distance between the first fused feature vector S1 and the second fused feature vector S2 in the pair to (1 + e^(-|S1i - S2i|)), where i is the element index;
Performing voiceprint verification on the voice command, and when verification is passed, transmitting the voice command to a coordinator and a sound box module respectively, wherein the sound box module is deployed in a plurality of rooms by adopting distributed sound boxes;
the power switch of the intelligent household equipment is controlled through the coordinator, and command classification transmission is carried out through the sound box module according to keywords of the intelligent household equipment contained in the voice commands, so that the voice commands are transmitted to sound boxes which are correspondingly arranged for the intelligent household equipment corresponding to the keywords;
And playing the voice command through the sound box to the intelligent household equipment corresponding to the keyword, so that the intelligent household equipment executes the action corresponding to the voice command.
2. The method according to claim 1, characterized in that the method further comprises:
Receiving face images and voice information acquired by a user through a terminal;
Extracting face features of the face image and voiceprint features of the voice information;
and fusing the voiceprint features with the face features, and judging whether the user can log in a local server or not based on a fusion result.
3. The method according to claim 2, wherein fusing the voiceprint features with the face features, and determining whether the user can log on to a local server based on the fusion result, specifically comprises:
Constructing a fusion feature vector based on the vector of the face feature and the vector of the voiceprint feature;
Determining a weight matrix of the fusion feature vector;
And summing the weighted features using the weight matrix to obtain a final value; if the value falls within a preset range, it is determined that the user has successfully logged in to the local server through the terminal.
4. The method according to claim 1, characterized in that the method further comprises:
And when the voice instruction cannot be executed by the corresponding intelligent home equipment, feeding back error instruction prompt information to the user terminal.
5. The method of claim 1, wherein when the verification passes, the method further comprises:
and if the voice command cannot be processed offline at the local server, invoking a cloud server to process the voice command.
6. A remote voice control apparatus, comprising:
A memory for storing program instructions;
a processor for invoking program instructions stored in said memory to perform the method of any of claims 1-5 in accordance with the obtained program.
7. A remote voice control system is characterized by comprising the device as claimed in claim 6, a coordinator and a sound box module respectively connected with the device, and at least one sound box connected with the sound box module, wherein the sound box module is deployed in a plurality of rooms by adopting distributed sound boxes,
The coordinator is used for receiving the voice command sent by the device and controlling the power switch of the intelligent household equipment based on the voice command;
the sound box module is used for determining keywords of intelligent household equipment contained in the voice instruction sent by the device, sending the voice instruction to a sound box corresponding to the intelligent household equipment corresponding to the keywords, and playing the voice instruction through the sound box to the intelligent household equipment corresponding to the keywords so that the intelligent household equipment executes actions corresponding to the voice instruction.
8. The system of claim 7, further comprising a smart home device disposed corresponding to each of the speakers.
9. A computer storage medium having stored thereon computer executable instructions for causing the computer to perform the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549978.3A CN115373280B (en) | 2021-05-20 | 2021-05-20 | Remote voice control method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115373280A (en) | 2022-11-22 |
CN115373280B (en) | 2024-12-13 |
Family
ID=84059882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110549978.3A (granted as CN115373280B, active) | Remote voice control method, device and system | 2021-05-20 | 2021-05-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115373280B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116346804A * | 2022-12-29 | 2023-06-27 | 苏州浪潮智能科技有限公司 | System and method for remote voice management of server devices based on a mobile terminal |
CN116153309A * | 2023-02-20 | 2023-05-23 | 上海法青景观科技有限公司 | Voice-controlled intelligent misting system with a learning function |
CN116913258B (en) * | 2023-09-08 | 2023-11-24 | 鹿客科技(北京)股份有限公司 | Speech signal recognition method, device, electronic equipment and computer readable medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108172227A (en) * | 2018-02-09 | 2018-06-15 | 深圳市沃特沃德股份有限公司 | Voice remote control method and device |
CN112332990A (en) * | 2020-09-09 | 2021-02-05 | 深圳市奥拓电子股份有限公司 | Security control method, device and storage medium for commanding and scheduling seats |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935615B * | 2014-03-19 | 2019-12-03 | 重庆深蜀科技有限公司 | System and method for realizing voice control of household appliances |
CN104834849B * | 2015-04-14 | 2018-09-18 | 北京远鉴科技有限公司 | Two-factor identity authentication method and system based on voiceprint recognition and face recognition |
CN106354034A * | 2016-07-30 | 2017-01-25 | 杨超坤 | Emotion-controlled switch |
CN110246483A * | 2019-07-30 | 2019-09-17 | 安徽立果智能科技有限公司 | Appliance control method and system based on voice interaction |
CN110942773A * | 2019-12-10 | 2020-03-31 | 上海雷盎云智能技术有限公司 | Method and device for controlling smart home devices by voice |
CN112328999B * | 2021-01-05 | 2021-04-06 | 北京远鉴信息技术有限公司 | Double-recording quality inspection method and device, server and storage medium |
2021-05-20: Application CN202110549978.3A filed in China; granted as patent CN115373280B (status: Active).
Also Published As
Publication number | Publication date |
---|---|
CN115373280A (en) | 2022-11-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP7660175B2 (en) | Neural Networks for Speaker Verification | |
CN111699528B (en) | Electronic device and method of performing functions of electronic device | |
CN115373280B (en) | Remote voice control method, device and system | |
JP2021527840A (en) | Voiceprint identification methods, model training methods, servers, and computer programs | |
KR20190022432A (en) | Electronic device, identification method, system, and computer-readable storage medium | |
CN110459204A (en) | Speech recognition method, device, storage medium and electronic device | |
WO2014114049A1 (en) | Voice recognition method and device | |
KR20170105034A (en) | Identification system and method with self-learning function based on dynamic password voice | |
US12002451B1 (en) | Automatic speech recognition | |
CN105096955A (en) | Speaker rapid identification method and system based on growing and clustering algorithm of models | |
CN109920435A (en) | Voiceprint recognition method and voiceprint recognition device | |
CN112735381B (en) | Model updating method and device | |
US20230289420A1 (en) | Method for multifactor authentication using bone conduction and audio signals | |
US20230153815A1 (en) | Methods and systems for training a machine learning model and authenticating a user with the model | |
US11514920B2 (en) | Method and system for determining speaker-user of voice-controllable device | |
US11610591B2 (en) | Machine learning for improving quality of voice biometrics | |
CN110232927A (en) | Speaker verification's anti-spoofing method and apparatus | |
CN117321678A (en) | Attention scoring function for speaker identification | |
US12131740B2 (en) | Machine learning for improving quality of voice biometrics | |
TW201944320A (en) | Payment authentication method, device, equipment and storage medium | |
US11531736B1 (en) | User authentication as a service | |
CN112750448A (en) | Sound scene recognition method, device, equipment and storage medium | |
KR20220154655A (en) | Device, method and computer program for generating voice data based on family relationship | |
CN114333844A (en) | Voiceprint recognition method, device, medium and equipment | |
CN113051902A (en) | Voice data desensitization method, electronic device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |