CN217933162U - Apparatus for processing speech

Info

Publication number: CN217933162U
Application number: CN202220643959.7U
Authority: CN
Inventors: 刘正义
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-03-22
Filing date: 2022-03-22
Publication date: 2022-11-29
Anticipated expiration: 2032-03-22

Abstract

The present disclosure provides an apparatus for processing speech, which relates to the technical field of artificial intelligence, in particular to the technical fields of intelligent hardware and speech technology. The apparatus comprises a microphone, a network communication module and a processor; the microphone is used for collecting voice to be processed; the network communication module is used for sending the voice to be processed and receiving a voice processing result of the voice to be processed; the processor is configured to perform an association operation of the voice processing result, where the association operation may be any operation among a plurality of operations, and the plurality of operations include a distribution network operation. The apparatus can interact with a user through voice, which makes the apparatus more intelligent and easier to use.

Description

Apparatus for processing speech

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the field of intelligent hardware and speech technology, and more particularly to an apparatus for processing speech.

Background

With the development of artificial intelligence, various artificial intelligence technologies have been widely applied and show broad application prospects in fields such as security, finance, human-computer interaction, information and education.

Speech recognition in artificial intelligence is a very popular field, and speech recognition techniques involve fields including: signal processing, pattern recognition, probability and information theory, phonation and hearing mechanisms, and the like.

Summary of the Utility Model

An apparatus for processing speech is provided.

According to a first aspect, there is provided an intelligent hardware device comprising a microphone, a network communication module and a processor; the microphone is used for collecting voice to be processed; the network communication module is used for sending the voice to be processed and receiving a voice processing result of the voice to be processed; the processor is configured to perform an association operation of the voice processing result, where the association operation may be any operation among a plurality of operations, and the plurality of operations include a distribution network operation.

According to the solution of the present disclosure, the device can interact with a user through voice, which makes the device more intelligent and easier to use.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1a is a schematic front view of an apparatus for processing speech of the present disclosure;

FIG. 1b is a schematic rear view of the apparatus for processing speech of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the technical solution of the present disclosure, the acquisition, storage, use and other handling of users' personal information comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1a and FIG. 1b respectively show the front and the back of the apparatus for processing speech, i.e. the intelligent hardware device 100 according to the present disclosure. As shown, the device 100 includes: a microphone 101, a network communication module 102, and a processor 103.

The microphone 101 is used for collecting voice to be processed. Specifically, the microphone 101 collects sound, and in the present disclosure the speech it collects is the voice that is subsequently processed.

The network communication module 102 is configured to send the voice to be processed and receive a voice processing result of the voice to be processed. The network communication module 102 may interact with a communication device in order to exchange and retrieve data. The communication device may be, for example, a server in the cloud. In practice, the network communication module 102 may send the voice to be processed to the server. The voice processing result may be any of a variety of information received from the server, in particular an answer to the voice to be processed. The network communication module 102 may be a network card, a modem, or a wireless communication transceiver.

The processor 103 is configured to perform an association operation of the voice processing result, where the association operation may be any operation among a plurality of operations, and the plurality of operations include a distribution network operation. The association operation may take various forms, such as playing the voice processing result through a sound-producing device. In that case, the processor 103 sends the sound-producing device an instruction to play the voice processing result, and the sound-producing device plays it. The processor 103 may be located anywhere within the device, and the positional relationship between the network communication module 102 and the processor 103 may be arbitrary.

As can be seen from fig. 1a and 1b, the device 100 is shaped like a magnifying glass, and the display device of the device 100 is a circular display screen, or alternatively a square or any other display screen. In practice, the display screen may support touch operations.

According to embodiments of the disclosure, the device can interact with the user through voice and execute the user's voice instructions, which makes the device more intelligent and easier to use. Specifically, embodiments of the disclosure can implement a sound distribution network, that is, network configuration performed through voice, which makes it easier to configure the network of the device.

In some optional implementations of any embodiment of the present disclosure, the processor 103 is further configured to, in response to determining that the voice processing result includes distribution network information, parse the distribution network information from the voice processing result, and perform the distribution network operation on the device 100 by using the distribution network information.

In these alternative implementations, the device 100 may need to undergo network distribution, i.e., networking configuration, before it can access the internet, for example by configuring a Wi-Fi network for the device 100. The distribution network operation is the operation performed by the processor 103 to carry out this networking configuration. In practice, the distribution network operation usually requires distribution network information, which may include a network name and a password.

In these optional implementations, the device of the present disclosure can obtain the distribution network information through voice, thereby implementing a sound distribution network.
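The parsing step described above can be sketched as follows. This is a minimal illustration only, not the implementation of the disclosure: the dictionary layout of the voice processing result, the field names `distribution_network`, `ssid` and `password`, and the `connect` callback are all hypothetical, since the disclosure states only that the distribution network information may include a network name and a password.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NetworkConfig:
    """Distribution network information: a network name and a password."""
    ssid: str
    password: str


def parse_network_config(result: dict) -> Optional[NetworkConfig]:
    """Return the network name and password if the result carries them."""
    info = result.get("distribution_network")  # hypothetical field name
    if not isinstance(info, dict):
        return None
    ssid = info.get("ssid")
    password = info.get("password")
    if not ssid or password is None:
        return None
    return NetworkConfig(ssid=ssid, password=password)


def handle_result(result: dict, connect: Callable[[NetworkConfig], None]) -> str:
    """Perform the association operation for one voice processing result."""
    config = parse_network_config(result)
    if config is not None:
        connect(config)  # hypothetical callback that joins the Wi-Fi network
        return "distribution_network"
    return "other"  # e.g. play the answer through the sound-producing device
```

A result without distribution network information falls through to the other association operations, which matches the description of the association operation as any one of a plurality of operations.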

In some optional implementations of any embodiment of the present disclosure, the apparatus 100 further comprises a display device 104 and an output functional area 105; the output functional area includes an interactive light 1051; the interactive light 1051 is used to indicate any of the following items, wherein the items include at least one of: networking state, voice broadcast speed, image recognition state.

In these alternative implementations, the apparatus 100 may further include a display device 104 and an output functional area 105. In particular, the display device 104 may include a display screen. The display device 104 of the apparatus 100 may display the image to be recognized and the recognition result. In addition, the display device 104 may also display the recognition result corresponding to the target object in the image to be recognized, which is fed back by the server. The display device may be adjacent to the output functional region 105.

The output functional area 105 includes devices having an output function. For example, the output functional area 105 can include an interactive light 1051 that outputs visual effects to the user. The interactive light 1051 may indicate various states of the device. For example, the interactive light 1051 may indicate the networking state, such as by flashing a light of a designated color (e.g., green) after networking succeeds. Alternatively, the interactive light 1051 may indicate the voice broadcast speed; for example, the interactive light 1051 may act as a breathing lamp that flashes in step with the voice broadcast, so that the faster the voice broadcast, the faster the breathing lamp flashes. Still alternatively, the interactive light 1051 may indicate the image recognition state, e.g., by blinking a light of a specified color if the image recognition succeeds.
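The relationship between voice broadcast speed and breathing lamp speed can be sketched as below. The disclosure states only that a faster broadcast gives a faster flash; the inverse-proportional mapping, the unit (words per minute), and every numeric default here are assumptions chosen for illustration.

```python
def breathing_period_seconds(words_per_minute: float,
                             base_wpm: float = 150.0,
                             base_period: float = 2.0,
                             min_period: float = 0.5,
                             max_period: float = 4.0) -> float:
    """Map a broadcast speed to the duration of one breathing cycle.

    A faster broadcast yields a shorter period, i.e. a faster flash.
    All default values are hypothetical.
    """
    if words_per_minute <= 0:
        return max_period  # nothing is being broadcast: slowest breathing
    period = base_period * base_wpm / words_per_minute
    # Clamp so the lamp never flashes uncomfortably fast or stalls.
    return max(min_period, min(max_period, period))
```

Clamping the period is a design choice outside the disclosure; it keeps the lamp within a visually comfortable range at extreme broadcast speeds.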

These alternative implementations may provide rich visual effects through the display device and interactive lights.

Optionally, the device 100 further comprises a visual function area 106, the visual function area 106 comprising an illumination lamp 1061.

In these alternative implementations, the visual functional area 106 refers to a visually relevant functional area. The visual function area 106 includes an illumination lamp 1061 therein. The illumination lamp 1061 can achieve illumination, especially at night.

The visual function area 106 may also be adjacent to the display device 104, and the output functional area 105 and the visual function area 106 may be located on the front and back sides of the apparatus 100, respectively. The display device 104 may be visible on the front side only, or on both the front and back sides.

These alternative implementations add lighting to the device through the illumination lamp, which makes it easier for users, such as children, to explore and practice at night.

In practice, the visual function area 106 also includes a camera 1062; the camera 1062 is used for collecting an image to be recognized; the network communication module 102 is configured to send the image to be recognized, and receive image result information corresponding to a target object in the image to be recognized; the display device 104 is configured to display the image result information, where the image result information includes the recognition result and knowledge information related to the recognition result.

In these alternative implementations, the camera 1062 is used to capture the image to be recognized. The camera 1062 is located in the device 100, and a user can hold the device 100 and point it at the object to be recognized in order to obtain the image to be recognized. It should be noted that the camera 1062 may be used to collect a face image, an animal image, a plant image, a building image, or even a dynamic image. The network communication module 102 may send the image to be recognized collected by the camera 1062 to the server, so that the server recognizes the image. The network communication module 102 may further receive, from the server, image result information corresponding to a target object in the image to be recognized. The image result information may include a recognition result of the image to be recognized, and may also include knowledge information related to the recognition result, such as early childhood education knowledge. The display device in these implementations may be flat.

These alternative implementations can present the image recognition results to the user intuitively and efficiently. Moreover, because the display device can display both the recognition result and the knowledge information, the device is well suited to the early education of children.

In some alternative implementations of any of the embodiments of the present disclosure, the output functional area 105 includes a sound-producing device 1052.

In these alternative implementations, the output functional area 105 may include a sound-producing device 1052. The sound-producing device 1052 may be a speaker. It may output the answer to the voice to be processed, that is, the voice processing result, or broadcast the image recognition result by voice, thereby realizing voice output.

The output functional area in these alternative implementations can output sound, thereby adding an auditory effect to the device.

In some optional implementations of any embodiment of the disclosure, the apparatus further comprises: a protective structure 107 and a handle 108; the protective structure 107 is disposed at the edge of the display device 104, and one end of the handle 108 is connected to the protective structure 107.

In these alternative implementations, the protective structure 107 may be a protective sleeve made of various materials, such as rubber or plastic materials, etc. The processor 103 may be disposed within either the handle 108 or the protective structure 107.

The protective structure in these implementations can better protect the device from being broken and can also protect the user from being injured. The connection of the handle and the protective structure can make the structure of the device more concise.

Optionally, a key is arranged on the handle 108; the key includes at least one of: an identification key 109, a volume key 110 and an illumination key 111, wherein the identification key 109 is used for receiving a trigger operation, and the trigger operation is used for triggering the acquisition of an image to be recognized. To make the keys on the handle easy for the user to find, the keys are preferably physical keys. Alternatively, a touch screen can be arranged on the handle, in which case the keys can be touch keys.

In these alternative implementations, operating the illumination key 111 turns the illumination lamp 1061 on and off. When the user presses the identification key 109, the capture of an image is triggered so that the image can be recognized. By operating the volume key 110, the user may adjust the volume of the sound emitted by the sound-producing device 1052 of the device.

Because the handle in these optional implementations is provided with keys, operations can be carried out accurately and quickly. In particular, image recognition can be started quickly and directly through the identification key, which improves image recognition efficiency.

In practice, the identification key 109 is also used for receiving a start operation or a close operation of the device 100, the start operation and the close operation are long-press operations, and the trigger operation is a short-press operation.

The identification key 109 may not only trigger image recognition but also serve as a power key that turns the device 100 on and off. Specifically, the user may long-press the identification key 109 to turn on the device 100 when the device 100 is off. Likewise, the user may long-press the identification key 109 to turn off the device 100 when the device 100 is on. In the present disclosure, the press duration of a long-press operation is greater than or equal to a first preset duration, and the press duration of a short-press operation is less than or equal to a second preset duration. The press duration of a long-press operation is longer than that of a short-press operation.
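The distinction between the two operations on the identification key can be sketched as follows. The threshold values are hypothetical, as the disclosure does not give the preset durations. The handling of presses that fall between the two thresholds, and of short presses while the device is off, is likewise an assumption: here both are ignored.

```python
LONG_PRESS_MIN_S = 1.5   # hypothetical first preset duration
SHORT_PRESS_MAX_S = 0.5  # hypothetical second preset duration


def classify_press(duration_s: float) -> str:
    """Classify a key press by how long the key was held."""
    if duration_s >= LONG_PRESS_MIN_S:
        return "long"
    if duration_s <= SHORT_PRESS_MAX_S:
        return "short"
    return "ignored"  # assumption: in-between presses do nothing


def on_identification_key(duration_s: float, powered_on: bool) -> str:
    """Return the action triggered by one press of the identification key."""
    press = classify_press(duration_s)
    if press == "long":
        # A long press toggles power: start when off, close when on.
        return "power_off" if powered_on else "power_on"
    if press == "short" and powered_on:
        # A short press triggers acquisition of the image to be recognized.
        return "capture_image"
    return "none"
```

Keeping a gap between the two thresholds is one way to satisfy the requirement that a long press always lasts longer than a short press.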

These implementations give the identification key more functions, thereby reducing the number of physical keys, while the functions triggered by the key remain distinguishable through the different operations.

Optionally, a battery compartment for accommodating a rechargeable battery that supplies power to the device 100 is provided in the handle 108; the bottom of the handle is provided with a charging port for charging the rechargeable battery, a charging indicator lamp 112 is arranged on the handle, and the charging indicator lamp 112 is used for indicating at least one of the following: whether the device 100 is fully charged, and the state of charge.

In these implementations, the charging indicator lamp 112 may indicate the charging state. For example, while the device 100 is charging, the charging indicator lamp 112 lights a first color (e.g., red); once the device 100 is fully charged, the charging indicator lamp 112 lights a second color (e.g., green). In addition, the charging indicator lamp 112 may also indicate the battery level; for example, if the battery level is below a threshold, the charging indicator lamp 112 lights a third color (e.g., yellow).
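The indicator behaviour can be sketched as a single selection function. The colors follow the examples given in the text; the 20% low-battery threshold, the representation of the battery level as a fraction between 0 and 1, and the choice to keep the lamp off in all other cases are assumptions.

```python
from typing import Optional


def charge_indicator_color(charging: bool,
                           level: float,
                           low_threshold: float = 0.2) -> Optional[str]:
    """Choose the color of the charging indicator lamp.

    level is the battery level as a fraction from 0.0 to 1.0.
    Returns None when the lamp stays off (an assumption).
    """
    if charging:
        # Second color once full, first color while still charging.
        return "green" if level >= 1.0 else "red"
    if level < low_threshold:
        return "yellow"  # third color: battery level below the threshold
    return None
```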

These implementations not only supply power to the device but also show the charging state and the battery level of the device intuitively through the charging indicator lamp.

The foregoing describes only preferred embodiments of the disclosure and the technical principles employed. It will be understood by those skilled in the art that the scope of the present disclosure is not limited to the specific combination of the above-mentioned features, but also covers other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the spirit of the present disclosure. For example, it covers technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

1. An apparatus for processing speech, characterized by comprising a microphone, a network communication module and a processor;

the microphone is used for collecting voice to be processed;

the network communication module is used for sending the voice to be processed and receiving a voice processing result of the voice to be processed;

the processor is configured to perform an association operation of the voice processing result, where the association operation may be any operation among a plurality of operations, and the plurality of operations include a distribution network operation.

2. The apparatus of claim 1, wherein

the processor is further configured to, in response to determining that the voice processing result includes distribution network information, parse the distribution network information from the voice processing result, and perform distribution network operation on the device using the distribution network information.

3. The apparatus of claim 1, further comprising a display device and an output functional area; the output functional area comprises an interactive lamp;

the interactive lamp is used to indicate any of the following items, wherein the items include at least one of: networking state, voice broadcast speed, image recognition state.

4. The apparatus of claim 3, wherein the output functional area includes a sound-producing device.

5. The apparatus of claim 3, further comprising a visual function area, the visual function area comprising an illumination lamp.

6. The apparatus of claim 5, wherein the visual function area further comprises a camera;

the camera is used for collecting an image to be identified;

the network communication module is used for sending the image to be identified and receiving image result information corresponding to a target object in the image to be identified;

the display device is used for displaying the image result information, wherein the image result information comprises an identification result and knowledge information related to the identification result.

7. The apparatus of claim 1, further comprising a display device; the apparatus further comprises: a protective structure and a handle;

the edge of the display device is provided with the protective structure, and one end of the handle is connected with the protective structure.

8. The apparatus of claim 7, wherein the handle has a key disposed thereon;

the key includes at least one of: an identification key, a volume key and an illumination key, wherein the identification key is used for receiving a trigger operation, and the trigger operation is used for triggering the collection of an image to be identified.

9. The device of claim 8, wherein the identification key is further configured to receive a start operation or a close operation of the device, the start operation and the close operation are long-press operations, and the trigger operation is a short-press operation.

10. The device of claim 7, wherein a battery compartment is provided within the handle for receiving a rechargeable battery for powering the device;

the bottom of the handle is provided with a charging port for charging the rechargeable battery, a charging indicator lamp is provided on the handle, and the charging indicator lamp is used for indicating at least one of the following: whether the device is fully charged, and the state of charge.