CN108986811B

CN108986811B - Voice recognition detection method, device and equipment

Info

Publication number: CN108986811B
Application number: CN201811009582.4A
Authority: CN
Inventors: 史金龙
Original assignee: Beijing Electric Vehicle Co Ltd
Current assignee: Beijing Electric Vehicle Co Ltd
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2021-05-28
Anticipated expiration: 2038-08-31
Also published as: CN108986811A

Abstract

The invention provides a voice recognition detection method, device and equipment, relating to the technical field of vehicle voice recognition. The method comprises the following steps: after connecting to a vehicle to be detected, acquiring vehicle type information of the vehicle; judging the voice configuration state of the vehicle according to the vehicle type information; sending audio information to a voice recognition server according to the determined voice configuration state; and receiving a detection result sent by the voice recognition server after it performs voice recognition on the audio information. According to the invention, preset vehicle type information is obtained according to the vehicle type information, which improves detection efficiency; the determined voice configuration state identifies the voice recognition server to be used for detection, the audio information is sent to that server, and the detection result returned by the server is received, so that the efficiency and accuracy of voice recognition can be judged from the detection result.

Description

Voice recognition detection method, device and equipment

Technical Field

The invention relates to the technical field of vehicle voice recognition, in particular to a voice recognition detection method, device and equipment.

Background

With the progress of science and technology and the continuous iteration of intelligent computing algorithms, speech recognition technology has spread from consumer electronics into the automotive industry. In-vehicle infotainment systems now integrate a speech recognition system to spare the driver inconvenient manual operations while driving, such as making and receiving calls, controlling navigation, and adjusting the sunroof, which provides a faster and safer way of operating the vehicle.

Generally, voice recognition works as follows: the user's audio is input into the system, which converts the audio into text, compares the text with a local or cloud database to determine the user's intention, controls execution accordingly, and then broadcasts the execution result by voice. For voice control of vehicle functions, the in-vehicle infotainment system, after receiving the voice request from the application layer, passes it down to the middle layer and the bottom layer, and sends it to the whole-vehicle system as a vehicle bus signal so as to actuate components such as the air conditioner and the sunroof. However, a voice recognition system with Internet-of-Vehicles connectivity usually comprises two engines, a local engine and a cloud engine, whereas a non-networked voice system comprises only the local engine, and accepting the different engine types requires a large amount of long-cycle manual acceptance testing. When recordings are made manually, tone, sound intensity and speech rate differ from person to person; reading errors by the people organized to speak the commands on site are hard to avoid; the pronunciation is directly affected by the tester's emotion, volume and clarity; and the factory test environment cannot cover common noise scenarios. Testing the audio link and the basic voice wake-up rate and recognition rate at the end of the production line therefore becomes difficult. In addition, detecting voice-controlled vehicle functions requires manual recording and judging whether control succeeded by observing actions such as the air conditioner operating or the sunroof opening; the more vehicle functions are controlled, the more directly the efficiency of the end-of-line detection stage is affected.

Therefore, a voice recognition detection method, device and equipment are needed that can quickly detect the accuracy of voice recognition at the end of the production line, resist interference well, reduce cost and improve detection efficiency.

Disclosure of Invention

The embodiment of the invention provides a detection method, a detection device and detection equipment for voice recognition, which are used for solving the problems of long time consumption, high interference and low detection efficiency of manual detection.

In order to solve the above technical problem, an embodiment of the present invention provides a method for detecting speech recognition, including:

after connecting to a vehicle to be detected, acquiring vehicle type information of the vehicle;

judging the voice configuration state of the vehicle according to the vehicle type information;

sending audio information to a voice recognition server according to the determined voice configuration state;

and receiving a detection result which is sent by the voice recognition server and used for carrying out voice recognition on the audio information.

Preferably, the determining the voice configuration state of the vehicle according to the vehicle type information includes:

carrying out consistency check on the vehicle type information and preset vehicle type information;

when the vehicle type information passes the verification, acquiring first parameter configuration which is consistent with the vehicle type information in the preset vehicle type information;

and judging the voice configuration state of the vehicle according to the first parameter configuration.

Preferably, after the consistency check is performed on the vehicle type information and the preset vehicle type information, the method further includes:

when the verification fails, acquiring second parameter configuration.

Preferably, when the speech recognition server is a vehicle, the sending the audio information to the speech recognition server according to the determined speech configuration state includes:

when the voice configuration state is determined to be the local voice configuration state, acquiring first text information;

acquiring first audio information in the audio information according to the first text information;

transmitting the first audio information to the vehicle.

Preferably, when the voice recognition server is a cloud server, the sending the audio information to the voice recognition server according to the determined voice configuration state includes:

when the voice configuration state is determined to be the cloud voice configuration state, acquiring second text information;

acquiring second audio information in the audio information according to the second text information;

and sending second audio information to the cloud server.

The embodiment of the invention also provides a detection device for voice recognition, which comprises:

the acquisition module is used for acquiring the vehicle type information of the vehicle after the vehicle to be detected is connected;

the judging module is used for judging the voice configuration state of the vehicle according to the vehicle type information;

the voice processing module is used for sending audio information to the voice recognition server according to the determined voice configuration state;

and the first communication module is used for receiving a detection result which is sent by the voice recognition server and used for carrying out voice recognition on the audio information.

Preferably, the judging module includes:

the checking unit is used for checking the consistency of the vehicle type information and preset vehicle type information;

the first parameter configuration unit is used for acquiring first parameter configuration which is consistent with the vehicle type information in the preset vehicle type information when the verification is passed;

and the judging unit is used for judging the voice configuration state of the vehicle according to the first parameter configuration.

Preferably, the judging module further includes:

and the second parameter configuration unit is used for acquiring second parameter configuration when the verification fails.

Preferably, when the speech recognition server is a vehicle, the speech processing module includes:

the first voice operation database unit is used for acquiring first text information when the voice configuration state is determined to be the local voice configuration state;

the first audio synthesis unit is used for acquiring first audio information in the audio information according to the first text information;

a first audio output unit for transmitting the first audio information to the vehicle.

Preferably, when the voice recognition server is a cloud server, the voice processing module includes:

the second voice operation database unit is used for acquiring second text information when the voice configuration state is determined to be the cloud voice configuration state;

the second audio synthesis unit is used for acquiring second audio information in the audio information according to the second text information;

and the second audio output unit is used for sending second audio information to the cloud server.

The embodiment of the invention also provides a voice recognition detection device, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the detection method of speech recognition as described above when executing the computer program.

Compared with the prior art, the detection method, the device and the equipment for voice recognition provided by the embodiment of the invention at least have the following beneficial effects:

the preset vehicle type information can be obtained by judging the voice configuration state of the vehicle according to the vehicle type information, so that the detection efficiency is improved; the voice recognition server for voice recognition detection can be judged and the audio information can be sent for detection according to the determined voice configuration state, then the detection result sent by the voice recognition server for voice recognition of the audio information is received, and the efficiency and the accuracy of voice recognition can be judged according to the detection result.

Drawings

FIG. 1 is a flow chart of a detection method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a detection method according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a detection method according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a detection method according to an embodiment of the present invention;

FIG. 5 is a block diagram of a detection apparatus according to an embodiment of the present invention;

FIG. 6 is a block diagram of a detection apparatus according to an embodiment of the present invention;

FIG. 7 is a block diagram of a detecting device according to an embodiment of the present invention;

Description of reference numerals:

1-an obtaining module, 2-a judging module, 21-a checking unit, 22-a first parameter configuration unit, 23-a judging unit, 24-a second parameter configuration unit, 3-a voice processing module, 31-a first voice operation database unit, 32-a first audio synthesis unit, 33-a first audio output unit, 34-a second voice operation database unit, 35-a second audio synthesis unit, 36-a second audio output unit and 4-a first communication module.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

An embodiment of the present invention provides a method for detecting speech recognition, as shown in fig. 1, including:

step S1, after connecting to a vehicle to be detected, acquiring vehicle type information of the vehicle;

step S2, judging the voice configuration state of the vehicle according to the vehicle type information;

step S3, sending audio information to the voice recognition server according to the determined voice configuration state;

step S4, receiving a detection result after performing voice recognition on the audio information sent by the voice recognition server.

According to the embodiment of the invention, the preset vehicle type information can be obtained by judging the voice configuration state of the vehicle according to the vehicle type information, so that the detection efficiency is improved; the voice recognition server for voice recognition detection can be judged and the audio information can be sent for detection according to the determined voice configuration state, then the detection result sent by the voice recognition server for voice recognition of the audio information is received, and the efficiency and the accuracy of voice recognition can be judged according to the detection result. The voice recognition server comprises a vehicle and/or a cloud server; the vehicle includes an in-vehicle infotainment system, the in-vehicle infotainment system including: a voice recognition system and a second communication module.
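The flow of steps S1 to S4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function name `run_detection`, the callables `judge_state`, `send_audio` and `receive_result`, and the vehicle type string `"MODEL_A"` are all hypothetical placeholders.

```python
from typing import Callable, Dict, List


def run_detection(
    vehicle_type: str,                          # S1: read from the connected vehicle
    judge_state: Callable[[str], List[str]],    # S2: vehicle type -> configuration states
    send_audio: Callable[[str], None],          # S3: send audio for one configuration state
    receive_result: Callable[[str], str],       # S4: detection result for one state
) -> Dict[str, str]:
    """Run steps S2 to S4 for a vehicle type obtained in step S1 (hypothetical helper)."""
    results: Dict[str, str] = {}
    for state in judge_state(vehicle_type):     # "local" and/or "cloud"
        send_audio(state)
        results[state] = receive_result(state)
    return results


# Usage with stub callables; every value here is made up for illustration.
sent: List[str] = []
demo = run_detection(
    "MODEL_A",
    judge_state=lambda vehicle_type: ["local", "cloud"],
    send_audio=sent.append,
    receive_result=lambda state: f"recognized via {state}",
)
```

Passing the three steps in as callables keeps the sketch independent of how a real device would talk to the vehicle or the cloud server.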

The following describes a specific implementation process of the above scheme with reference to a specific flow:

as shown in fig. 2, the step S2 includes:

step S21, carrying out a consistency check between the vehicle type information and preset vehicle type information, wherein the preset vehicle type information comprises information on a plurality of vehicle types and is the preset parameter information of the different vehicle types; when the vehicle type information exists in the preset vehicle type information, the check passes and step S22 is executed; when the vehicle type information does not exist in the preset vehicle type information, the check fails and step S23 is executed.

Step S22, when the check is passed, acquiring a first parameter configuration which is consistent with the vehicle type information in the preset vehicle type information; wherein the first parameter configuration comprises: the type of microphone (including electret microphone, silicon microphone, etc.), speaker information (including male, female, elderly, child, etc.), speech rate (including fast, medium, slow, etc.), sound intensity, distance, noise (including wind noise, road noise, tire noise, etc.), and the like. The speech rate may be set as follows: a speech rate faster than a first speech rate is fast; a speech rate faster than a second speech rate and less than or equal to the first speech rate is medium; and a speech rate slower than or equal to the second speech rate is slow.
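The two-threshold speech rate rule described in step S22 can be written as a short Python function. This is a sketch only: the function name `classify_speech_rate` is hypothetical, and the description does not specify the unit or the threshold values, so the numbers in the usage lines are assumptions (for example, syllables per second).

```python
def classify_speech_rate(rate: float, first_rate: float, second_rate: float) -> str:
    """Classify a speech rate against two thresholds, where first_rate > second_rate."""
    if first_rate <= second_rate:
        raise ValueError("first_rate must be greater than second_rate")
    if rate > first_rate:
        return "fast"      # faster than the first speech rate
    if rate > second_rate:
        return "medium"    # faster than the second, not faster than the first
    return "slow"          # slower than or equal to the second speech rate


# Usage with assumed thresholds of 5.0 and 3.0.
labels = [classify_speech_rate(r, 5.0, 3.0) for r in (6.0, 5.0, 4.0, 3.0)]
```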

Step S23, when the verification fails, acquiring a second parameter configuration. The second parameter configuration is set manually by an operator when the vehicle type information does not exist in the preset vehicle type information; the operator can select and configure parameters that meet the requirements of the vehicle according to the basic information of the vehicle.

Step S24, judging the voice configuration state of the vehicle according to the first parameter configuration and/or the second parameter configuration, wherein the voice configuration state comprises: a local voice configuration state and/or a cloud voice configuration state. The voice configuration state of the vehicle is judged from the first parameter configuration, the second parameter configuration, and the attributes of the voice development kit integrated into the in-vehicle infotainment system.
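Steps S21 to S24 can be sketched in Python as a lookup with a manual fallback. This is a minimal sketch under stated assumptions: the `ParameterConfig` fields, the `PRESET` table, the vehicle type names and both function names are hypothetical, and recording engine availability as two boolean fields is an assumption made for illustration, since the description only says the state is judged from the parameter configuration and the attributes of the voice development kit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class ParameterConfig:
    microphone_type: str
    speaker: str
    speech_rate: str
    noise: str
    has_local_engine: bool   # assumption: stands in for the development kit attributes
    has_cloud_engine: bool   # assumption: stands in for the development kit attributes


# Hypothetical preset vehicle type information.
PRESET: Dict[str, ParameterConfig] = {
    "MODEL_A": ParameterConfig("electret", "female", "medium", "wind", True, True),
    "MODEL_B": ParameterConfig("silicon", "male", "fast", "road", True, False),
}


def get_parameter_config(
    vehicle_type: str, manual_config: Optional[ParameterConfig] = None
) -> ParameterConfig:
    preset = PRESET.get(vehicle_type)   # S21: consistency check against the presets
    if preset is not None:
        return preset                   # S22: first parameter configuration
    if manual_config is None:
        raise LookupError(f"no preset for {vehicle_type!r}; operator input required")
    return manual_config                # S23: second parameter configuration


def judge_voice_state(config: ParameterConfig) -> Set[str]:
    states: Set[str] = set()            # S24: local and/or cloud
    if config.has_local_engine:
        states.add("local")
    if config.has_cloud_engine:
        states.add("cloud")
    return states
```

For example, `judge_voice_state(get_parameter_config("MODEL_B"))` yields only the local state, while an unknown vehicle type requires the operator-supplied configuration.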

As shown in fig. 3, when the speech recognition server is a vehicle, the step S3 includes:

step S311, when the voice configuration state is determined to be the local voice configuration state, acquiring first text information; wherein the first text information includes text sets of spoken phrases for local navigation, local music, radio, telephone and the like, and different text sets may also be configured according to actual needs.

Step S312, acquiring first audio information in the audio information according to the first text information; wherein the first audio information is audio processed and synthesized according to the first text information.

Step S313, transmitting the first audio information to the vehicle. The first audio information may be sent by playing it through a loudspeaker, and the voice recognition system of the vehicle performs voice recognition on the played audio that it receives.

As shown in fig. 3, when the voice recognition server is a cloud server, the step S3 includes:

step S321, when the voice configuration state is determined to be the cloud voice configuration state, acquiring second text information; the second text information may include text sets of spoken phrases for weather information, flight information, stock information and the like, and different text sets may also be configured according to actual needs.

Step S322, acquiring second audio information in the audio information according to the second text information; wherein the second audio information is audio processed and synthesized according to the second text information.

Step S323, sending second audio information to the cloud server. The second audio information is sent to the cloud server through a second communication module of the vehicle.

As shown in fig. 4, when the voice recognition server is a vehicle and a cloud server, the sending audio information to the voice recognition server according to the determined voice configuration state (i.e., step S3) includes:

step S331, when the voice configuration state is determined to be the local voice configuration state and the cloud voice configuration state, acquiring first text information and second text information;

step S332, acquiring first audio information in the audio information according to the first text information, and acquiring second audio information in the audio information according to the second text information;

step S333, sending the first audio information to the vehicle, and sending the second audio information to the cloud server.

The steps performed when the voice recognition server is both a vehicle and a cloud server are equivalent to performing both the steps for the local voice configuration state and the steps for the cloud voice configuration state.
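The three variants of step S3 can be sketched in Python as one dispatch on the determined states. This is an illustrative sketch only: the phrase lists, the `synthesize` placeholder (which merely encodes the text and is not real speech synthesis) and the function name `send_audio_for_states` are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical first and second text information.
LOCAL_TEXTS: List[str] = ["navigate home", "play local music"]
CLOUD_TEXTS: List[str] = ["what is the weather today"]


def synthesize(text: str) -> bytes:
    """Placeholder for audio synthesis; a real device would produce audio here."""
    return text.encode("utf-8")


def send_audio_for_states(
    states: Set[str],
    to_vehicle: Callable[[bytes], None],
    to_cloud: Callable[[bytes], None],
) -> Dict[str, int]:
    """Send synthesized audio to each server selected by the configuration state."""
    sent = {"vehicle": 0, "cloud": 0}
    if "local" in states:               # steps S311 to S313
        for text in LOCAL_TEXTS:
            to_vehicle(synthesize(text))
            sent["vehicle"] += 1
    if "cloud" in states:               # steps S321 to S323
        for text in CLOUD_TEXTS:
            to_cloud(synthesize(text))
            sent["cloud"] += 1
    return sent


# Usage: both states set corresponds to steps S331 to S333.
vehicle_out: List[bytes] = []
cloud_out: List[bytes] = []
counts = send_audio_for_states({"local", "cloud"}, vehicle_out.append, cloud_out.append)
```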

In the step S4, the method includes:

receiving a first detection result which is sent by the vehicle and used for carrying out voice recognition on the first audio information; and/or

And receiving a second detection result which is sent by the cloud server and used for carrying out voice recognition on the second audio information.

The first detection result and/or the second detection result can be printed, and whether the voice recognition is correct, as well as which text information could not be recognized, can be judged from the printed result, which serves as the voice recognition detection result.
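One way to derive "whether the recognition is correct" and "the text that could not be recognized" from a detection result is sketched below in Python. The description does not say how the comparison is made, so the exact-match rule after case and whitespace normalization, the recognition rate figure, and the names `normalize` and `evaluate_results` are all assumptions.

```python
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def evaluate_results(
    expected: List[str], recognized: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Return (recognition rate, texts that were not recognized correctly)."""
    failed = [
        text
        for text in expected
        if normalize(recognized.get(text, "")) != normalize(text)
    ]
    rate = (len(expected) - len(failed)) / len(expected) if expected else 0.0
    return rate, failed


# Usage with made-up data: one correct, one misrecognized, one missing.
rate, failed = evaluate_results(
    ["navigate home", "play local music", "call mom"],
    {"navigate home": "Navigate  home", "play local music": "play vocal music"},
)
```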

An embodiment of the present invention further provides a detection apparatus for speech recognition, as shown in fig. 5, including:

the obtaining module 1 is used for acquiring vehicle type information of the vehicle after connecting to the vehicle to be detected;

the judging module 2 is used for judging the voice configuration state of the vehicle according to the vehicle type information;

the voice processing module 3 is used for sending audio information to the voice recognition server according to the determined voice configuration state;

and the first communication module 4 is used for receiving a detection result which is sent by the voice recognition server and used for carrying out voice recognition on the audio information.

According to the embodiment of the invention, the preset vehicle type information can be obtained by judging the voice configuration state of the vehicle according to the vehicle type information, so that the detection efficiency is improved; the voice recognition server for voice recognition detection can be identified and the audio information can be sent for detection according to the determined voice configuration state, then the detection result sent by the voice recognition server after performing voice recognition on the audio information is received, and the efficiency and the accuracy of voice recognition can be judged according to the detection result. The voice recognition server comprises a vehicle and/or a cloud server, wherein the vehicle includes an in-vehicle infotainment system, and the in-vehicle infotainment system includes a voice recognition system and a second communication module. The first communication module 4 can be connected with the second communication module of the vehicle via a mobile network connection, a wireless network connection, a Bluetooth connection, or a controller area network connection.

In an embodiment of the present invention, as shown in fig. 6, the judging module 2 includes:

the checking unit 21 is used for checking consistency of the vehicle type information and preset vehicle type information; the preset vehicle type information comprises a plurality of vehicle type information, and the preset vehicle type information is parameter information of preset different vehicle types.

The first parameter configuration unit 22 is configured to, when the vehicle type information passes the verification, obtain a first parameter configuration that is consistent with the vehicle type information in the preset vehicle type information; wherein the first parameter configuration comprises: the type of microphone (including electret microphone, silicon microphone, etc.), speaker information (including male, female, elderly, child, etc.), speech rate (including fast, medium, slow, etc.), sound intensity, distance, noise (including wind noise, road noise, tire noise, etc.), and the like. The speech rate may be set as follows: a speech rate faster than a first speech rate is fast; a speech rate faster than a second speech rate and less than or equal to the first speech rate is medium; and a speech rate slower than or equal to the second speech rate is slow.

The judging unit 23 is used for judging the voice configuration state of the vehicle according to the first parameter configuration, wherein the voice configuration state comprises: a local voice configuration state and/or a cloud voice configuration state. The voice configuration state of the vehicle is judged from the first parameter configuration, the second parameter configuration, and the attributes of the voice development kit integrated into the in-vehicle infotainment system.

In an embodiment of the present invention, as shown in fig. 6, the judging module 2 further includes:

the second parameter configuration unit 24 is configured to obtain a second parameter configuration when the verification fails. The second parameter configuration is set manually by an operator when the vehicle type information does not exist in the preset vehicle type information; the operator can select and configure parameters that meet the requirements of the vehicle according to the basic information of the vehicle.

In an embodiment of the present invention, as shown in fig. 7, when the speech recognition server is a vehicle, the speech processing module 3 includes:

a first voice operation database unit 31, configured to obtain first text information when it is determined that the voice configuration state is the local voice configuration state; wherein the first text information includes text sets of spoken phrases for local navigation, local music, radio, telephone and the like, and different text sets may also be configured according to actual needs.

The first audio synthesizing unit 32 is configured to obtain first audio information in the audio information according to the first text information; the first audio information is audio processed and synthesized according to the first text information, and the audio synthesis unit may consist of a digital audio processing chip.

A first audio output unit 33 for transmitting the first audio information to the vehicle. The first audio information may be sent by playing it through a loudspeaker, and the voice recognition system of the vehicle performs voice recognition on the played audio that it receives.

In an embodiment of the present invention, when the voice recognition server is a cloud server, the voice processing module 3 includes:

a second voice operation database unit 34, configured to obtain second text information when it is determined that the voice configuration state is the cloud voice configuration state; the second text information may include text sets of spoken phrases for weather information, flight information, stock information and the like, and different text sets may also be configured according to actual needs.

The second audio synthesizing unit 35 is configured to obtain second audio information in the audio information according to the second text information; wherein the second audio information is audio processed and synthesized according to the second text information.

And a second audio output unit 36, configured to send second audio information to the cloud server. The second audio information is sent to the cloud server through a second communication module of the vehicle.

In an embodiment of the present invention, the first communication module 4 includes:

The voice recognition detection device can be used to upgrade the in-vehicle infotainment system, and the first detection result and/or the second detection result can be printed from the voice recognition detection device via a computer and a printer; whether the voice recognition is correct, as well as which text information could not be recognized, can be judged from the printed result, which serves as the voice recognition detection result. The communication connection can be a mobile network connection, a wireless network connection or a Bluetooth connection. The device is also connected to the on-board diagnostics system through a controller area network; using the database of vehicle type information and preset vehicle type information, it can quickly acquire the signal log of whole-vehicle voice control, which can be aligned and checked together with the voice software log when the detection result is printed.
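The description does not state how the signal log and the voice software log are aligned. One possible approach, sketched in Python below, pairs each voice log entry with the first bus signal that follows it within a time tolerance. The log format (timestamp in seconds plus message), the tolerance value, the signal names and the function name `align_logs` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

LogEntry = Tuple[float, str]  # (timestamp in seconds, message); assumed format


def align_logs(
    voice_log: List[LogEntry],
    signal_log: List[LogEntry],
    tolerance: float = 1.0,
) -> List[Tuple[LogEntry, Optional[LogEntry]]]:
    """Pair each voice entry with the earliest signal within `tolerance` seconds after it."""
    aligned: List[Tuple[LogEntry, Optional[LogEntry]]] = []
    for voice_entry in voice_log:
        candidates = [
            signal_entry
            for signal_entry in signal_log
            if 0.0 <= signal_entry[0] - voice_entry[0] <= tolerance
        ]
        match = min(candidates, key=lambda entry: entry[0]) if candidates else None
        aligned.append((voice_entry, match))
    return aligned


# Usage with made-up logs: the second command has no bus signal within one second.
pairs = align_logs(
    [(10.0, "open sunroof"), (20.0, "turn on air conditioner")],
    [(10.4, "SUNROOF_OPEN"), (25.0, "AC_ON")],
)
```

A voice entry paired with `None` marks a command for which no vehicle bus signal was observed in time, which is the kind of case an operator would want to inspect.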

In a specific embodiment of the present invention, the voice recognition detection device may be placed at the driver's position. The device is provided with a packaging structure, and the obtaining module 1, the judging module 2, the voice processing module 3 and the first communication module 4 are all disposed in the packaging structure. The packaging structure can be designed to imitate the form of a driver; a microphone and a loudspeaker are arranged on the packaging structure at a position close to the headrest of the driver's seat, the loudspeaker being used for playing voice and the microphone for receiving voice. A fixing strap can further be arranged on the packaging structure at a position close to the headrest, to hold the packaging structure fixed against the headrest. Mounted on a headrest in this way, the device can simulate every seat of the vehicle and can be flexibly applied to in-vehicle infotainment systems with a single microphone or with multiple microphones.

It should be noted that the embodiment of the apparatus is an apparatus corresponding to the embodiment of the method, and all implementations in the embodiment of the method are applicable to the embodiment of the apparatus, and the same technical effect can be achieved.

The embodiment of the invention also provides a voice recognition detection device, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any of the steps of the detection method of speech recognition as described above when executing the computer program.

According to the embodiment of the invention, the preset vehicle type information can be obtained by judging the voice configuration state of the vehicle according to the vehicle type information, so that the detection efficiency is improved; the voice recognition server for voice recognition detection can be judged and the audio information can be sent for detection according to the determined voice configuration state, then the detection result sent by the voice recognition server for voice recognition of the audio information is received, and the efficiency and the accuracy of voice recognition can be judged according to the detection result.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for detecting speech recognition, comprising:

judging the voice configuration state of the vehicle according to the vehicle type information, wherein the voice configuration state comprises: a local voice configuration state and/or a cloud voice configuration state;

receiving a detection result which is sent by the voice recognition server and used for carrying out voice recognition on the audio information;

printing the detection result, and determining a voice recognition detection result of the voice recognition server according to the printing result, wherein the voice recognition detection result comprises: whether the speech recognition is correct, and text information which cannot be recognized.

2. The method for detecting speech recognition according to claim 1, wherein said determining the speech configuration state of the vehicle based on the vehicle type information comprises:

3. The method for detecting speech recognition according to claim 2, wherein after the checking the vehicle type information for consistency with the preset vehicle type information, the method further comprises:

and when the verification fails, acquiring a second parameter configuration.

4. The method for detecting speech recognition according to claim 1, wherein the sending audio information to the speech recognition server according to the determined speech configuration state comprises:

when the voice configuration state is determined to be the local voice configuration state, acquiring first text information, and determining that the voice recognition server is a vehicle;

transmitting the first audio information to the vehicle.

5. The method for detecting speech recognition according to claim 1, wherein the sending audio information to the speech recognition server according to the determined speech configuration state further comprises:

when the voice configuration state is determined to be the cloud voice configuration state, acquiring second text information, and determining that the voice recognition server is a cloud server;

and sending second audio information to the cloud server.

6. A detection apparatus for speech recognition, comprising:

the judging module is used for judging the voice configuration state of the vehicle according to the vehicle type information, wherein the voice configuration state comprises: a local voice configuration state and/or a cloud voice configuration state;

the first communication module is used for receiving a detection result which is sent by the voice recognition server and used for carrying out voice recognition on the audio information;

a determining module, configured to print the detection result, and determine a voice recognition detection result of the voice recognition server according to the print result, where the voice recognition detection result includes: whether the speech recognition is correct, and text information which cannot be recognized.

7. The apparatus for detecting speech recognition according to claim 6, wherein said determining module comprises:

8. The apparatus for detecting speech recognition according to claim 7, wherein said determining module further comprises:

9. The detection apparatus of speech recognition according to claim 6, wherein the speech processing module comprises:

the first voice operation database unit is used for acquiring first text information when the voice configuration state is determined to be the local voice configuration state, and determining the voice recognition server to be a vehicle;

10. The detection apparatus of speech recognition according to claim 6, wherein the speech processing module further comprises:

the second voice operation database unit is used for acquiring second text information when the voice configuration state is determined to be the cloud voice configuration state, and determining the voice recognition server to be the cloud server;

11. A detection apparatus for speech recognition, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the steps in the detection method of speech recognition according to any one of claims 1 to 5 when executing the computer program.