CN114255748A - Floor sweeping robot - Google Patents
- Publication number
- CN114255748A (application number CN202110367825.7A)
- Authority
- CN
- China
- Prior art keywords
- microphone
- signal
- information
- determining
- sweeping robot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- A—HUMAN NECESSITIES
- A47—FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
- A47L—DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
- A47L11/00—Machines for cleaning floors, carpets, furniture, walls, or wall coverings
- A47L11/24—Floor-sweeping machines, motor-driven
-
- A—HUMAN NECESSITIES
- A47—FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
- A47L—DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
- A47L11/00—Machines for cleaning floors, carpets, furniture, walls, or wall coverings
- A47L11/40—Parts or details of machines not provided for in groups A47L11/02 - A47L11/38, or not restricted to one of these groups, e.g. handles, arrangements of switches, skirts, buffers, levers
-
- A—HUMAN NECESSITIES
- A47—FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
- A47L—DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
- A47L11/00—Machines for cleaning floors, carpets, furniture, walls, or wall coverings
- A47L11/40—Parts or details of machines not provided for in groups A47L11/02 - A47L11/38, or not restricted to one of these groups, e.g. handles, arrangements of switches, skirts, buffers, levers
- A47L11/4011—Regulation of the cleaning machine by electric means; Control systems and remote control systems therefor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The application provides a sweeping robot comprising a microphone array, a signal processing module, and a wireless communication module. The microphone array includes a first microphone and a second microphone, which collect acoustic signals containing a voice control signal emitted by a target sound source. The signal processing module is configured to: determine cross-correlation information corresponding to a first sound signal acquired by the first microphone and a second sound signal acquired by the second microphone; determine, based on the cross-correlation information, time difference information of the voice control signal emitted by the target sound source arriving at the first and second microphones; and acquire, based on the time difference information, pickup signals corresponding to the first and second microphones. The wireless communication module transmits the pickup signals to a server. The recognition accuracy of the voice recognition function and/or the control accuracy of the voice control function of the sweeping robot can thereby be effectively improved, improving the user experience while reducing product cost.
Description
Technical Field
The application relates to the technical field of electric appliances, in particular to a sweeping robot.
Background
In recent years, with the rapid development of intelligent technology, sweeping robots with voice control functions have emerged. However, a sweeping robot generates very noticeable working noise when running. Both environmental noise and working noise may interfere with the realization of the sweeping robot's voice control function, degrading the user experience.
Disclosure of Invention
The present application is proposed to solve the above technical problems. An embodiment of the application provides a sweeping robot.
An embodiment of the application provides a sweeping robot, which comprises a microphone array, a signal processing module connected with the microphone array and a wireless communication module connected with the signal processing module, wherein the microphone array comprises a first microphone and a second microphone, and the first microphone and the second microphone are used for collecting acoustic signals containing voice control signals sent by a target sound source; the signal processing module is used for: determining cross-correlation information corresponding to a first sound signal acquired by a first microphone and a second sound signal acquired by a second microphone; determining time difference information of arrival of a voice control signal sent by a target sound source at a first microphone and a second microphone based on the cross-correlation information; performing time delay compensation operation on the first sound signal and/or the second sound signal based on the time difference information to obtain pickup signals corresponding to the first microphone and the second microphone, wherein the time delay compensation operation is used for improving the signal-to-noise ratio of the signals collected by the first microphone and the second microphone; the wireless communication module is used for transmitting the pickup signal to the server so that the server determines a first type of control instruction matched with the voice control signal sent by the target sound source based on the pickup signal.
In an embodiment of the present application, before determining the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone, the signal processing module is further configured to: determining a working noise signal corresponding to the sweeping robot in the first sound signal and/or the second sound signal; and performing noise reduction processing operation on the first sound signal and/or the second sound signal based on the working noise signal. Wherein, determining the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone comprises: and determining the cross-correlation information corresponding to the first acoustic signal and the second acoustic signal after the noise reduction processing operation.
In an embodiment of the present application, determining an operating noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal includes: and determining a working noise signal based on the motor rotating speed information of the sweeping robot.
In an embodiment of the present application, the microphone array further includes a third microphone located in a preset noise pickup area, and determining a working noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal includes: and determining the working noise signal based on the third sound signal collected by the third microphone and the sound field transfer function information between the third microphone and the first microphone and/or the second microphone.
In an embodiment of the application, the preset noise picking area is located in a flow channel structure of the sweeping robot, the flow channel structure is provided with an active noise reduction system, and the third microphone is a reference microphone in the active noise reduction system.
In an embodiment of the application, the sweeping robot further comprises a voice recognition module connected with the signal processing module, and the voice recognition module is used for determining a second type of control instruction matched with a voice control signal sent by a target sound source based on the pickup signal.
In an embodiment of the application, the speech recognition module includes a language unit and an acoustic unit connected to the language unit, the language unit is configured to determine prior character sequence information based on a pickup signal, and the acoustic unit is configured to determine a second type of control instruction based on a prior speech signal and the pickup signal corresponding to the prior character sequence information.
In an embodiment of the present application, determining a second type of control instruction based on a priori voice signal and a pickup signal corresponding to the priori text sequence information includes: comparing the prior voice signal and the pickup signal corresponding to the prior character sequence information to obtain a comparison result; and determining a second type of control instruction based on the comparison result and the prior character sequence information.
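As a loose illustration of the comparison step above (not the patent's actual algorithm), one could score each candidate command by correlating the normalized magnitude spectrum of its prior speech signal against that of the pickup signal and select the best match. All names and signals below are hypothetical:

```python
import numpy as np

def match_command(pickup: np.ndarray, priors: dict) -> str:
    """Pick the candidate command whose prior speech signal best matches
    the pickup signal, using normalized magnitude-spectrum correlation
    as the comparison result."""
    def spec(x):
        m = np.abs(np.fft.rfft(x, n=1024))      # magnitude spectrum
        return m / (np.linalg.norm(m) + 1e-12)  # unit-normalize
    p = spec(pickup)
    # Comparison result: dot product of normalized spectra, per command.
    return max(priors, key=lambda cmd: float(spec(priors[cmd]) @ p))
```

In a real system the prior speech signals would come from the language/acoustic units described above; tones stand in for them here only to keep the sketch self-contained.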
In an embodiment of the present application, determining cross-correlation information corresponding to a first acoustic signal collected by a first microphone and a second acoustic signal collected by a second microphone includes: determining cross-power spectrum information corresponding to the first sound signal and the second sound signal; determining weighted spectrum function information corresponding to the first sound signal and the second sound signal based on the cross-power spectrum information; cross-correlation information is determined based on the cross-power spectral information and the weighted spectral function information.
In an embodiment of the present application, the cross-correlation information includes cross-correlation function information, and the determining, based on the cross-correlation information, time difference information of arrival at the first microphone and the second microphone of the voice control signal emitted by the target sound source includes: determining peak information of the cross-correlation function based on the cross-correlation function information; time difference information is determined based on the peak information.
The sweeping robot provided by the embodiment of the application uses the signal processing module to determine the cross-correlation information corresponding to the first sound signal collected by the first microphone and the second sound signal collected by the second microphone, determines, based on the cross-correlation information, the time difference information of the voice control signal emitted by the target sound source arriving at the first and second microphones, and performs a delay compensation operation on the first and/or second sound signal based on the time difference information to obtain the pickup signals corresponding to the two microphones, thereby effectively improving the signal-to-noise ratio of the collected signals. In addition, the sweeping robot connects to a server via the wireless communication module, so that the server determines the first type of control instruction matching the voice control signal emitted by the target sound source; this avoids deploying a complex speech recognition module on the sweeping robot itself and reduces product cost. In other words, the recognition accuracy of the voice recognition function and/or the control accuracy of the voice control function of the sweeping robot can be effectively improved, improving the user experience while reducing product cost.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic structural diagram of a sweeping robot according to an embodiment of the present application.
Fig. 2 is a schematic alignment diagram of a first acoustic signal and a second acoustic signal according to an embodiment of the present application.
Fig. 3a to 3c are schematic structural views of a sweeping robot according to another embodiment of the present application.
Fig. 4 is a schematic structural diagram of a flow channel structure according to an embodiment of the present application.
Fig. 5 is a schematic structural view of a sweeping robot according to another embodiment of the present application.
Fig. 6 is a schematic structural diagram of a speech recognition module according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein.
Fig. 1 is a schematic structural diagram of a sweeping robot according to an embodiment of the present application. As shown in fig. 1, a sweeping robot 100 provided in the embodiment of the present application includes a microphone array 110, a signal processing module 120 connected to the microphone array 110, and a wireless communication module 140 connected to the signal processing module 120. Wherein the microphone array 110 includes a first microphone and a second microphone. The first microphone and the second microphone are used for collecting acoustic signals containing voice control signals emitted by a target sound source. For example, the target sound source is a user using the sweeping robot, and the voice control signal sent by the target sound source is a voice control signal sent by the user for the sweeping robot.
Specifically, the acoustic signal collected by the first microphone is a first acoustic signal, and the acoustic signal collected by the second microphone is a second acoustic signal. The signal processing module is used for determining cross-correlation information corresponding to a first sound signal acquired by a first microphone and a second sound signal acquired by a second microphone, then determining time difference information of a voice control signal sent by a target sound source reaching the first microphone and the second microphone based on the cross-correlation information, and performing time delay compensation operation on the first sound signal and/or the second sound signal based on the time difference information to obtain pickup signals corresponding to the first microphone and the second microphone. The wireless communication module 140 is configured to transmit the sound pickup signal to the server, so that the server determines a first type of control instruction matching the voice control signal sent by the target sound source based on the sound pickup signal.
Illustratively, the first type of control instruction is a control instruction that is independent of the cleaning (i.e., sweeping) state of the sweeping robot, such as "play a song" or "chat with me".
It should be appreciated that the cross-correlation information corresponding to the first acoustic signal and the second acoustic signal can characterize a correlation between the first acoustic signal and the second acoustic signal.
For example, denote the first acoustic signal as x1(t) and the second acoustic signal as x2(t). Then the following formulas (1) and (2) hold:

x1(t) = s(t - τ1) + n1(t) (1)

x2(t) = s(t - τ2) + n2(t) (2)

In formulas (1) and (2), τ1 is the time at which the original voice control signal emitted by the target sound source arrives at the first microphone, and τ2 is the time at which it arrives at the second microphone; s(t - τ1) and s(t - τ2) are the correspondingly delayed voice control signals as collected by the first and second microphones, respectively; and n1(t) and n2(t) are the noise signals collected by the first and second microphones, respectively (e.g., environmental noise signals and/or working noise signals of the sweeping robot).

The cross-correlation function R12(τ) between the first acoustic signal and the second acoustic signal can then be determined based on the following formula (3):

R12(τ) = E[x1(t) · x2(t - τ)] (3)

In formula (3), E[·] denotes the expectation over t, and the lag τ at which R12(τ) reaches its peak corresponds to the time difference (τ1 - τ2) between the arrival of the original voice control signal at the first and second microphones.
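The estimation described above (cross-power spectrum, weighted spectral function, cross-correlation, peak, time difference) can be sketched as follows. The patent does not name a specific weighting function; the PHAT weighting used here, and all function names and parameters, are illustrative assumptions:

```python
import numpy as np

def gcc_phat(x1: np.ndarray, x2: np.ndarray, fs: float) -> float:
    """Estimate the time difference of arrival (tau1 - tau2, in seconds)
    between two microphone channels via PHAT-weighted cross-correlation.
    Negative values mean the sound reached the first microphone first."""
    n = len(x1) + len(x2)                         # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross_power = X1 * np.conj(X2)                # cross-power spectrum
    weight = 1.0 / np.maximum(np.abs(cross_power), 1e-12)  # PHAT weighting function
    cc = np.fft.irfft(cross_power * weight, n=n)  # weighted cross-correlation
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    shift = int(np.argmax(np.abs(cc))) - max_shift              # peak -> lag in samples
    return shift / fs
```

The zero-padding to the combined signal length keeps the FFT-based correlation from wrapping around, so the recovered lag matches the linear cross-correlation of formula (3).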
Illustratively, the delay compensation operates to improve the signal-to-noise ratio of the signals collected by the first and second microphones. Specifically, the delay compensation operation is to time align the voice control signal sent by the target sound source included in the first sound signal and the voice control signal sent by the target sound source included in the second sound signal on two channels, and then superimpose the signals after the time alignment, so as to achieve the purpose of improving the signal-to-noise ratio of the signals collected by the first microphone and the second microphone (i.e., improving the signal-to-noise ratio of the voice control signals corresponding to the signals collected by the first microphone and the second microphone). The specific manner of the superposition includes, but is not limited to, direct superposition, arithmetic mean superposition, weighted superposition, and the like.
Since the times at which the first and second microphones receive the voice control signal depend on the direction-of-arrival angle of the target sound source, estimating the time difference between the sound signals received by the two microphones also yields the direction information of the target sound source relative to the microphones, so that the delay compensation operation can be performed in a targeted manner based on the direction information and the time difference information.
It should be noted that, because interference signals such as environmental noise in the spatial sound field arrive at the first and second microphones with a time difference different from that of the voice control signal, the interference signal is not "aligned" by a delay compensation operation targeted at the voice control signal. Taking direct superposition as an example, the interference signal is therefore not enhanced. With the voice control signal enhanced and the interference signal not enhanced, the signal-to-noise ratio of the signals collected by the first and second microphones is significantly improved.
The case where the speech control signal is enhanced and the interfering signal is not enhanced is shown below in connection with fig. 2.
Specifically, fig. 2 is a schematic alignment diagram of the first and second acoustic signals provided in an embodiment of the present application. As shown in fig. 2, both the first acoustic signal x1(t) and the second acoustic signal x2(t) can be represented by sinusoidal curves, and each contains an interference signal (the bump shown in fig. 2).
After the first acoustic signal x1(t) and the second acoustic signal x2(t) are directly aligned and superimposed along the time axis t, the sum x1(t) + x2(t) is obtained, in which the speech control signal is enhanced while the interference signal is not.
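The alignment-and-superposition illustrated by fig. 2 might be sketched as follows, taking arithmetic-mean superposition as an example and assuming the sample delay between the channels is already known (e.g., from the estimated time difference information); the function name is illustrative:

```python
import numpy as np

def delay_and_sum(x1: np.ndarray, x2: np.ndarray, delay_samples: int) -> np.ndarray:
    """Arithmetic-mean superposition after delay compensation: shift the
    lagging channel so the two speech components are time-aligned, then
    average. Coherent speech adds in phase; uncorrelated noise does not.
    Note: np.roll wraps the first `delay_samples` samples to the end, so
    the tail of the output is not meaningful."""
    x2_aligned = np.roll(x2, -delay_samples)  # compensate the known sample delay
    return 0.5 * (x1 + x2_aligned)
```

Direct superposition (a plain sum) or weighted superposition would differ only in the final combining line.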
In an embodiment of the application, the server acquires a pickup signal, determines a first type of control instruction matched with a voice control signal sent by a target sound source based on the pickup signal, and then returns the first type of control instruction to the sweeping robot, so that the sweeping robot executes an operation corresponding to the first type of control instruction based on the first type of control instruction.
It should be noted that the first microphone and the second microphone mentioned in the above embodiments may be an independent microphone, that is, a plurality of independent microphones form a microphone array. In addition, the first microphone and the second microphone mentioned in the above embodiments may also be an independent microphone sub-array, that is, a plurality of independent microphone sub-arrays form a microphone array.
However, the operation of the sweeping robot is based on the high-speed rotation of the internal mechanism, and the operation noise thereof is relatively large. Since the operating noise signals contained in the first acoustic signal and the second acoustic signal are also correlated with each other, the present application further extends the following embodiments on the basis of the embodiment shown in fig. 1 in order to further improve the signal-to-noise ratio of the signals collected by the first microphone and the second microphone.
In an embodiment of the present application, before determining the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone, the signal processing module is further configured to: determining a working noise signal corresponding to the sweeping robot in the first sound signal and/or the second sound signal; and performing noise reduction processing operation on the first sound signal and/or the second sound signal based on the working noise signal. Wherein, determining the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone comprises: and determining the cross-correlation information corresponding to the first acoustic signal and the second acoustic signal after the noise reduction processing operation.
It should be emphasized that the above noise reduction processing operation may be performed on the first acoustic signal alone, on the second acoustic signal alone, or on both. When it is performed on only one of the two signals, only the working noise signal of the sweeping robot contained in that signal needs to be determined, and the noise reduction processing operation is performed on that signal accordingly. When it is performed on both, the working noise signals contained in the first and second acoustic signals are determined separately, and the noise reduction processing operation is then performed on each signal based on its corresponding working noise signal.
The sweeping robot provided by the embodiment of the application uses the signal processing module to determine the working noise signal of the sweeping robot contained in the first and/or second sound signal, and then performs a noise reduction processing operation on the corresponding signal(s). After this operation, the working noise component in the first and/or second sound signal is reduced or even removed, which prevents mutually correlated working noise signals in the two channels from degrading the accuracy of the determined time difference information, and further improves the signal-to-noise ratio of the signals collected by the first and second microphones.
In an embodiment of the present application, determining an operating noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal includes: and determining a working noise signal based on the motor rotating speed information of the sweeping robot.
For example, the motor rotation speed information of the sweeping robot is acquired in real time, and the working noise signal is determined based on the motor rotation speed information acquired in real time, so that the accuracy of the determined working noise signal is improved. For another example, a mapping relationship between the motor rotation speed of the sweeping robot and the working condition and a mapping relationship between the motor rotation speed of the sweeping robot and the working noise signal are predetermined, so that the purpose of determining the working noise signal of the sweeping robot directly according to the working condition of the sweeping robot is achieved, the calculation complexity is reduced, and the efficiency of determining the working noise signal is improved. For example, the suction gear of the sweeping robot can be used to represent the working condition of the sweeping robot.
Because the working noise signal of the sweeping robot is strongly correlated with the motor rotation speed, and the motor rotation speed can be determined quickly in a simple and convenient manner, the embodiment of the application can quickly and accurately determine the working noise signal of the sweeping robot from the motor rotation speed information, further improving the signal-to-noise ratio of the signals collected by the first and second microphones.
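A minimal sketch of such a rotation-speed-based lookup is shown below; the gear names, RPM values, and noise levels are hypothetical placeholders, not values from the patent:

```python
import numpy as np

# Hypothetical calibration data: broadband working-noise level per motor RPM.
RPM_POINTS = [8000.0, 12000.0, 16000.0]
NOISE_DB = [52.0, 60.0, 68.0]

# Hypothetical mapping from suction gear (working condition) to motor RPM.
GEAR_TO_RPM = {"quiet": 8000.0, "standard": 12000.0, "max": 16000.0}

def noise_level_from_rpm(rpm: float) -> float:
    """Interpolate the expected working-noise level from the motor speed."""
    return float(np.interp(rpm, RPM_POINTS, NOISE_DB))

def noise_level_from_gear(gear: str) -> float:
    """Shortcut via the working condition: gear -> RPM -> noise level."""
    return noise_level_from_rpm(GEAR_TO_RPM[gear])
```

In practice the table could store a noise spectrum per RPM rather than a single level; the two-step mapping mirrors the gear-to-condition shortcut described above.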
In another embodiment of the present application, the microphone array further includes a third microphone located in a preset noise pickup area. Correspondingly, the determining of the working noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal includes: determining the working noise signal based on a third acoustic signal collected by the third microphone and sound field transfer function information between the third microphone and the first microphone and/or the second microphone.
It should be understood that, according to the third acoustic signal collected by the third microphone and the sound field transfer function information between the third microphone and the first microphone and/or the second microphone, the working noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal can be calculated and determined.
It should be noted that the specific position of the preset noise pickup area may be determined according to the actual situation, which is not specifically limited in the embodiments of the present application. For example, the preset noise pickup area may be located in the central area of the chassis of the housing of the sweeping robot, or in a preset area close to a rotating mechanism (a working noise source).
The sweeping robot provided by this embodiment of the present application can further improve the accuracy of the determined working noise signal, and thereby further improve the signal-to-noise ratio of the signals collected by the first microphone and the second microphone.
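A minimal sketch of the reference-microphone approach, assuming a per-frame STFT representation and a known transfer function from the third (reference) microphone's position to the primary microphone. The noise estimate is removed here by simple magnitude spectral subtraction, which is one possible noise reduction operation, not necessarily the one used in this application.

```python
import numpy as np

def estimate_and_subtract_noise(x1, x3, h):
    """One STFT frame of noise estimation and removal.

    x1 : complex spectrum frame from a primary microphone (first/second mic)
    x3 : complex spectrum frame from the reference (third) microphone
    h  : sound field transfer function from the reference microphone to the
         primary microphone, one complex value per frequency bin

    Returns the primary frame with the working-noise estimate removed by
    magnitude spectral subtraction, keeping the noisy phase.
    """
    noise_est = h * x3                      # working-noise component at the primary mic
    mag = np.abs(x1) - np.abs(noise_est)    # subtract noise magnitude
    mag = np.maximum(mag, 0.0)              # half-wave rectify (no negative magnitudes)
    return mag * np.exp(1j * np.angle(x1))  # reattach the original phase
```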
In an embodiment of the present application, the preset noise pickup area is located in a flow channel structure of the sweeping robot. The flow channel structure is provided with an active noise reduction system, and the third microphone is a reference microphone in the active noise reduction system.
By reusing the reference microphone of the active noise reduction system to pick up the working noise signal, the embodiment of the present application improves the utilization of existing components of the sweeping robot and saves product cost.
Examples of the structure of the flow channel of the sweeping robot and the sweeping robot are given below with reference to fig. 3a to 3c and fig. 4, respectively.
Fig. 3a to 3c are schematic structural views of a sweeping robot according to another embodiment of the present application. As shown in fig. 3a to 3c, the sweeping robot 200 provided in the embodiment of the present application includes a housing 210 and a flow channel structure 220 located inside the housing 210. The housing 210 is an oblate-like cylindrical structure. The flow channel structure 220 is a tubular cavity structure, and the flow channel structure 220 includes a dust suction port 221 and an air outlet 222, where the dust suction port 221 and the air outlet 222 are respectively located at two opposite ports of the tubular cavity structure.
It should be noted that the specific shape and type of the flow channel structure 220 in the sweeping robot shown in fig. 3a to 3c can be determined according to actual situations, and refer to fig. 4 specifically. Fig. 4 is a schematic structural diagram of a flow channel structure according to an embodiment of the present application. As shown in fig. 4, the outlet duct section of the flow channel structure may be a curved duct structure or a straight duct structure.
In an embodiment of the present application, determining cross-correlation information corresponding to a first acoustic signal collected by a first microphone and a second acoustic signal collected by a second microphone includes: determining cross-power spectrum information corresponding to the first sound signal and the second sound signal; determining weighted spectrum function information corresponding to the first sound signal and the second sound signal based on the cross-power spectrum information; cross-correlation information is determined based on the cross-power spectral information and the weighted spectral function information.
For example, in an embodiment of the present application, the cross-correlation information $R_{12}(\tau)$ can be determined based on the following formula (4):

$$R_{12}(\tau) = \int_{-\infty}^{+\infty} \Phi_{12}(f)\, G_{12}(f)\, e^{j 2 \pi f \tau}\, df \qquad (4)$$

In formula (4), $G_{12}(f)$ represents the cross-power spectrum information of the first acoustic signal and the second acoustic signal, and $\Phi_{12}(f)$ represents the weighted spectrum function information.

Illustratively, the weighted spectrum function information may be determined based on the following formula (5):

$$\Phi_{12}(f) = \frac{1}{\lvert G_{12}(f) \rvert} \qquad (5)$$

With the weighted spectrum function information described in formula (5), the amplitude of the cross-power spectrum is normalized to the constant 1, and substituting formula (5) into formula (4) yields the following formula (6):

$$R_{12}(\tau) = \int_{-\infty}^{+\infty} \frac{G_{12}(f)}{\lvert G_{12}(f) \rvert}\, e^{j 2 \pi f \tau}\, df \approx \delta(\tau - \tau_{12}) \qquad (6)$$
In this way, the cross-correlation function approximates a delayed impulse, its peak is sharply highlighted, and the estimation accuracy of the time difference can be ensured even when the signal-to-noise ratio of the collected acoustic signals is extremely low.
According to the method and the device, the cross-power spectrum information corresponding to the first sound signal and the second sound signal is determined, the weighted spectrum function information corresponding to the first sound signal and the second sound signal is determined based on the cross-power spectrum information, and then the cross-correlation information is determined based on the cross-power spectrum information and the weighted spectrum function information, so that the purpose of determining the cross-correlation information corresponding to the first sound signal collected by the first microphone and the second sound signal collected by the second microphone is achieved. According to the embodiment of the application, the peak value of the cross-correlation information is sharpened, the precision of the determined time difference information is improved, and the precision of the delay compensation operation is further improved.
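The weighted cross-correlation described by formulas (4) to (6) is commonly implemented in the frequency domain. The sketch below is a generic GCC-PHAT estimator, not code from this application; it returns the estimated time difference $\tau_{12} = \tau_1 - \tau_2$ in seconds.

```python
import numpy as np

def gcc_phat(sig1, sig2, fs, max_tau=None):
    """Estimate tau12 = tau1 - tau2 (seconds) between two microphone
    signals using the PHAT-weighted cross-correlation.

    A positive result means the sound reached the second microphone first.
    """
    n = len(sig1) + len(sig2)
    X1 = np.fft.rfft(sig1, n=n)
    X2 = np.fft.rfft(sig2, n=n)
    G12 = X1 * np.conj(X2)           # cross-power spectrum, G12(f) in formula (4)
    G12 /= np.abs(G12) + 1e-12       # PHAT weighting, formula (5)
    cc = np.fft.irfft(G12, n=n)      # sharpened cross-correlation, formula (6)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift   # peak search
    return shift / fs
```

Setting `max_tau` to the microphone spacing divided by the speed of sound restricts the peak search to physically possible delays.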
In an embodiment of the application, the cross-correlation information comprises cross-correlation function information. Determining time difference information of arrival of a voice control signal emitted by a target sound source at a first microphone and a second microphone based on cross-correlation information, comprising: determining peak information of the cross-correlation function based on the cross-correlation function information; determining the time difference information based on peak information.
On the basis of formula (3) in the embodiment shown in fig. 1, it is assumed that the voice control signal $s(t-\tau_1)$ collected by the first microphone is uncorrelated with the collected noise signal $n_1(t)$, the voice control signal $s(t-\tau_2)$ collected by the second microphone is uncorrelated with the collected noise signal $n_2(t)$, and the noise signals $n_1(t)$ and $n_2(t)$ are also uncorrelated with each other. Then the cross-correlation function between the first acoustic signal and the second acoustic signal described in formula (3) can be simplified to the following formula (7), where $R_{ss}(\cdot)$ denotes the autocorrelation function of the voice control signal:

$$R_{12}(\tau) = R_{ss}(\tau - \tau_{12}) \qquad (7)$$

As can be seen from formula (7), the cross-correlation function takes its peak value if and only if $\tau = \tau_{12} = \tau_1 - \tau_2$. Therefore, the time difference information of the original voice control signal emitted by the target sound source arriving at the first microphone and the second microphone can be estimated by searching for the peak information of the cross-correlation function.
According to the embodiment of the application, the purpose of determining the time difference information of the original voice control signal sent by the target sound source reaching the first microphone and the second microphone based on the cross-correlation information is achieved by determining the peak value information of the cross-correlation function based on the cross-correlation function information and determining the time difference information based on the peak value information.
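Once the time difference information has been determined from the correlation peak, the delay compensation operation can be sketched as an integer-sample alignment. This simplified version is for illustration only; a production system might use fractional-delay filtering instead.

```python
import numpy as np

def delay_compensate(sig1, sig2, tau, fs):
    """Align the two microphone signals given the estimated time
    difference tau = tau1 - tau2 (seconds), rounded to whole samples."""
    d = int(round(tau * fs))
    if d > 0:
        # sound reached mic 2 first: delay sig2 by d samples
        sig2 = np.concatenate((np.zeros(d), sig2[:-d]))
    elif d < 0:
        # sound reached mic 1 first: delay sig1 by |d| samples
        sig1 = np.concatenate((np.zeros(-d), sig1[:d]))
    return sig1, sig2
```

After alignment the voice components of the two channels add coherently, which is what raises the signal-to-noise ratio of the pickup signal.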
Fig. 5 is a schematic structural view of a sweeping robot according to another embodiment of the present application. The embodiment shown in fig. 5 is extended from the embodiment shown in fig. 1 of the present application, and the differences between the embodiment shown in fig. 5 and the embodiment shown in fig. 1 are emphasized below, and the descriptions of the same parts are omitted.
As shown in fig. 5, the sweeping robot 100 provided in the embodiment of the present application further includes a voice recognition module 130 connected to the signal processing module 120. The voice recognition module 130 is configured to determine a second type of control instruction matching the voice control signal sent by the target sound source based on the pickup signal.
Illustratively, the second type of control instruction is a control instruction related to a cleaning state of the sweeping robot. Such as at least one of "left turn", "45 °", "forward", "power up", and "pause".
The sweeping robot provided by this embodiment of the present application can generate the matching second type of control instruction based on the pickup signal corresponding to the microphone array, and can thus be controlled by the voice control signal emitted by the target sound source, which improves the sensitivity of the robot and the user experience.
In an embodiment of the present application, the first type of control instruction mentioned in the above embodiment is a control instruction that satisfies a preset complexity condition, and correspondingly, the second type of control instruction is a control instruction that does not satisfy the preset complexity condition. The preset complexity condition is that the complexity of the control instruction is greater than the preset complexity.
The embodiment of the application provides favorable conditions for enriching and expanding the processing mode of the pickup signals by sending the pickup signals to the server for processing. In other words, the purpose of enriching the backend interaction can be achieved by means of the server.
For example, in an embodiment of the present application, a voice recognition model capable of recognizing a picked-up sound signal is deployed in a server, and a purpose of determining a first type of control instruction matching a voice control signal issued by a target sound source based on the picked-up sound signal is achieved by using the voice recognition model in the server.
Illustratively, the speech recognition model is a deep learning based Neural network model, such as a Convolutional Neural Network (CNN) model.
Optionally, in an embodiment of the present application, the speech recognition model in the server is incrementally updated by using the pickup signals previously recognized by the model and the first type of control instructions corresponding to those pickup signals. That is, the speech recognition model is updated adaptively according to its historical recognition data, so that it learns personalized information such as the user's voiceprint information, usage preference information, and customized information, which further improves the recognition accuracy of the model and the user experience.
Optionally, in an embodiment of the present application, a preset lexicon is stored in the server, and correspondingly, after receiving the sound pickup signal, the server determines, based on the sound pickup signal and the preset lexicon, a first type of control instruction that matches a voice control signal sent by the target sound source. For example, the preset lexicon includes words such as "turn left", "45 °", "forward", and "power up".
Optionally, in an embodiment of the present application, the preset lexicon in the server is extended according to the historical recognition data, so as to improve an adaptive capability of the preset lexicon, and further improve the speech recognition accuracy.
In an embodiment of the present application, the first type of control instruction and the second type of control instruction are not necessarily opposed control instructions; that is, there need not be a substantive distinction between them, such as the distinction mentioned in the above embodiment regarding whether an instruction is related to the cleaning state of the sweeping robot. In other words, distinguishing the first class from the second class is merely for convenience of description.
Fig. 6 is a schematic structural diagram of a speech recognition module according to an embodiment of the present application. The embodiment shown in fig. 6 is extended from the embodiment shown in fig. 5, and the differences between the embodiment shown in fig. 6 and the embodiment shown in fig. 5 will be described in detail below, and the same parts will not be described again.
As shown in fig. 6, in the embodiment of the present application, the speech recognition module 130 includes a language unit 131 and an acoustic unit 132 connected to the language unit 131. The language unit 131 is configured to determine prior character sequence information based on the sound pickup signal, and the acoustic unit 132 is configured to determine a second type of control instruction based on a prior speech signal and the sound pickup signal corresponding to the prior character sequence information.
Illustratively, the prior character sequence information refers to the text combination information corresponding to the pickup signal. For example, the text combination information is "I want to pause".

Illustratively, the prior speech signal corresponding to the prior character sequence information refers to the speech signal corresponding to that text combination information, for example, the speech signal (also called the standard speech signal) corresponding to the text combination "I want to pause".
In an embodiment of the present application, determining a second type of control instruction based on a priori voice signal and a pickup signal corresponding to the priori text sequence information includes: comparing the prior voice signal and the pickup signal corresponding to the prior character sequence information to obtain a comparison result; and determining a second type of control instruction based on the comparison result and the prior character sequence information.
Illustratively, when the feature similarity of the prior speech signal and the picked-up sound signal reaches a preset similarity threshold, the prior speech signal is considered to be matched with the picked-up sound signal, and then the prior character sequence information corresponding to the prior speech signal can be determined as the second type of control instruction.
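A toy version of this comparison step, assuming each prior speech signal and the pickup signal have been reduced to fixed-length feature vectors (the feature extraction itself is out of scope here). Cosine similarity stands in for the feature similarity measure, and the threshold value is illustrative.

```python
import numpy as np

def feature_similarity(prior_feat, pickup_feat):
    """Cosine similarity between a prior speech signal's feature vector
    and the pickup signal's feature vector."""
    return float(np.dot(prior_feat, pickup_feat) /
                 (np.linalg.norm(prior_feat) * np.linalg.norm(pickup_feat)))

def match_instruction(pickup_feat, prior_table, threshold=0.8):
    """prior_table maps candidate text (prior character sequence
    information) to the feature vector of its prior speech signal.
    Returns the best-matching text at or above the threshold, or None."""
    best_text, best_sim = None, threshold
    for text, feat in prior_table.items():
        sim = feature_similarity(feat, pickup_feat)
        if sim >= best_sim:
            best_text, best_sim = text, sim
    return best_text
```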
In the embodiment of the application, the voice recognition operation is executed locally on the sweeping robot, so that the real-time performance of the voice recognition can be greatly ensured.
The number of pieces of prior character sequence information determined based on the pickup signal is not limited to one; there may be a plurality. When there are a plurality of pieces of prior character sequence information, the prior speech signal corresponding to each piece can be compared with the pickup signal separately, so as to obtain a comparison result corresponding to each of the plurality of pieces of prior character sequence information.
For example, for each piece of prior character sequence information $W$ among the plurality of candidates, let $P(W)$ be the probability that the prior character sequence information constitutes a sentence, and let $P(X \mid W)$ be the comparison result between the prior speech signal corresponding to $W$ and the pickup signal $X$. The prior character sequence information with the maximum product of $P(W)$ and $P(X \mid W)$ is taken as the speech recognition result $W^*$, and a control instruction is then determined based on $W^*$. The speech recognition result $W^*$ can be expressed by the following formula (8):

$$W^* = \arg\max_{W} P(W)\, P(X \mid W) \qquad (8)$$
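The selection rule of formula (8) reduces to taking the candidate with the largest product of the sentence probability and the acoustic comparison score; a minimal sketch with hypothetical scores:

```python
def best_hypothesis(candidates):
    """candidates: list of (W, P_W, P_X_given_W) tuples, where P_W is the
    sentence probability of the prior character sequence W and P_X_given_W
    is its acoustic comparison score. Returns W* maximizing the product,
    as in formula (8)."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]
```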
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the apparatus and devices of the present application, the components may be disassembled and/or reassembled. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
Claims (10)
1. A sweeping robot is characterized by comprising a microphone array, a signal processing module connected with the microphone array and a wireless communication module connected with the signal processing module, wherein,
the microphone array comprises a first microphone and a second microphone, and the first microphone and the second microphone are used for acquiring acoustic signals containing voice control signals sent by a target sound source;
the signal processing module is used for:
determining cross-correlation information corresponding to a first acoustic signal acquired by the first microphone and a second acoustic signal acquired by the second microphone;
determining time difference information of arrival of a voice control signal emitted by the target sound source at the first microphone and the second microphone based on the cross-correlation information;
performing delay compensation operation on the first sound signal and/or the second sound signal based on the time difference information to obtain pickup signals corresponding to the first microphone and the second microphone, wherein the delay compensation operation is used for improving signal-to-noise ratios of signals collected by the first microphone and the second microphone;
the wireless communication module is used for transmitting the pickup signal to a server so that the server can determine a first type of control instruction matched with a voice control signal sent by the target sound source based on the pickup signal.
2. The sweeping robot of claim 1, wherein prior to the determining of the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone, the signal processing module is further configured to:
determining a working noise signal corresponding to the sweeping robot in the first sound signal and/or the second sound signal;
performing a noise reduction processing operation on the first acoustic signal and/or the second acoustic signal based on the operating noise signal;
wherein the determining of the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone comprises:
and determining the cross-correlation information corresponding to the first acoustic signal and the second acoustic signal after the noise reduction processing operation.
3. The sweeping robot of claim 2, wherein the determining of the working noise signal corresponding to the sweeping robot from the first acoustic signal and/or the second acoustic signal comprises:
and determining the working noise signal based on the motor rotating speed information of the sweeping robot.
4. The sweeping robot of claim 2, wherein the microphone array further comprises a third microphone located in a preset noise picking area, and the determining of the working noise signal corresponding to the sweeping robot in the first acoustic signal and/or the second acoustic signal comprises:
and determining the working noise signal based on a third sound signal collected by the third microphone and sound field transfer function information between the third microphone and the first microphone and/or the second microphone.
5. The sweeping robot of claim 4, wherein the preset noise picking area is located in a flow channel structure of the sweeping robot, the flow channel structure is provided with an active noise reduction system, and the third microphone is a reference microphone in the active noise reduction system.
6. The sweeping robot of any one of claims 1 to 5, further comprising a voice recognition module connected to the signal processing module, wherein the voice recognition module is configured to determine a second type of control command matching the voice control signal sent by the target sound source based on the pickup signal.
7. The sweeping robot of claim 6, wherein the voice recognition module comprises a language unit and an acoustic unit connected to the language unit, the language unit is configured to determine prior character sequence information based on the pickup signal, and the acoustic unit is configured to determine the second type of control instruction based on a prior voice signal corresponding to the prior character sequence information and the pickup signal.
8. The sweeping robot of claim 7, wherein the determining the second type of control instruction based on the prior voice signal and the pickup signal corresponding to the prior character sequence information comprises:
comparing the prior voice signal corresponding to the prior character sequence information with the pickup signal to obtain a comparison result;
and determining the second type of control instruction based on the comparison result and the prior character sequence information.
9. The sweeping robot according to any one of claims 1 to 5, wherein the determining of the cross-correlation information corresponding to the first acoustic signal collected by the first microphone and the second acoustic signal collected by the second microphone comprises:
determining cross-power spectrum information corresponding to the first acoustic signal and the second acoustic signal;
determining weighted spectrum function information corresponding to the first acoustic signal and the second acoustic signal based on the cross-power spectrum information;
determining the cross-correlation information based on the cross-power spectral information and the weighted spectral function information.
10. The sweeping robot according to any one of claims 1 to 5, wherein the cross-correlation information includes cross-correlation function information, and the determining the time difference information of the voice control signal emitted by the target sound source reaching the first microphone and the second microphone based on the cross-correlation information includes:
determining peak information of a cross-correlation function based on the cross-correlation function information;
determining the time difference information based on the peak information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110367825.7A CN114255748A (en) | 2021-04-06 | 2021-04-06 | Floor sweeping robot |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114255748A true CN114255748A (en) | 2022-03-29 |
Family
ID=80791011
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110367825.7A Pending CN114255748A (en) | 2021-04-06 | 2021-04-06 | Floor sweeping robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114255748A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR19980037008A (en) * | 1996-11-20 | 1998-08-05 | 양승택 | A remote audio input device using a microphone array and its remote audio input processing method |
WO2000014731A1 (en) * | 1998-09-09 | 2000-03-16 | Ericsson Inc. | Apparatus and method for transmitting an improved voice signal over a communications device located in a vehicle with adaptive vibration noise cancellation |
CN106388700A (en) * | 2016-06-06 | 2017-02-15 | 北京小米移动软件有限公司 | Active noise reduction device for automatic cleaning equipment and automatic cleaning equipment |
WO2017202292A1 (en) * | 2016-05-25 | 2017-11-30 | 腾讯科技(深圳)有限公司 | Method and device for tracking echo delay |
CN108231075A (en) * | 2017-12-29 | 2018-06-29 | 北京视觉世界科技有限公司 | Control method, device, equipment and the storage medium of cleaning equipment |
CN109509465A (en) * | 2017-09-15 | 2019-03-22 | 阿里巴巴集团控股有限公司 | Processing method, component, equipment and the medium of voice signal |
KR20200046262A (en) * | 2018-10-24 | 2020-05-07 | 주식회사 케이티 | Speech recognition processing method for noise generating working device and system thereof |
CN112435648A (en) * | 2020-11-12 | 2021-03-02 | 北京安声浩朗科技有限公司 | Floor sweeping robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||