CN119763613A

CN119763613A - Simulation test set generation method and device, electronic equipment and storage medium

Info

Publication number: CN119763613A
Application number: CN202411187325.5A
Authority: CN
Inventors: 邱宇豪; 徐小毅; 商文龙
Original assignee: Beijing Co Wheels Technology Co Ltd
Current assignee: Beijing Co Wheels Technology Co Ltd
Priority date: 2024-08-27
Filing date: 2024-08-27
Publication date: 2025-04-04

Abstract

The embodiments of the present invention provide a method and device for generating a simulation test set, an electronic device, and a storage medium. The method includes the following steps. A voice test set automatic conversion device plays a preset first audio data set in a vehicle in a preset vehicle scene. It obtains from the vehicle a second audio data set collected for the first audio data set, together with the log information the vehicle generated while the first audio data set was played. It extracts from the log information the vehicle's target recognition result for the second audio data set. Finally, it labels the second audio data set according to the target recognition result and generates a simulation test set from the labeling result and the second audio data set. In the prior art, staff play, screen, and label the audio by hand. Because the simulation test set is instead generated automatically throughout, the embodiments of the present invention can improve the efficiency of generating simulation test sets and reduce the time and effort staff must invest.

Description

Simulation test set generation method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data processing, and in particular, to a method for generating a simulation test set, a device for generating a simulation test set, an electronic device, and a computer-readable storage medium.

Background

In new energy vehicles, as intelligent voice technology becomes increasingly popular, ensuring the performance of the voice system is particularly important. Voice system performance usually has to be ensured through large-scale, repeated testing. Existing testing typically involves manually waking up the voice assistant, then speaking or playing pre-recorded audio in the vehicle through speakers. Errors in voice wake-up and recognition must be monitored in real time, and large volumes of logs must be screened, extracted, summarized, and counted to produce a test result. Long-term continuous testing of this kind is therefore of limited practicality in a market with many vehicle models.

Therefore, with voice products iterating rapidly, a solution that can quickly complete multi-vehicle, multi-scenario testing is needed. Simulation testing is currently the usual approach. However, producing a simulated voice test set often requires tedious manual recording, playback, and data acquisition, followed by manual labeling of the data, alignment of audio with tables, and similar operations, all of which take a great deal of time and effort.

Disclosure of Invention

In view of the above problems, a method for generating a simulation test set, a device for generating a simulation test set, an electronic device, and a computer-readable storage medium are proposed that overcome, or at least partially solve, the above problems.

The embodiment of the invention provides a method for generating a simulation test set, which is applied to a voice test set automatic conversion device and comprises the following steps:

the voice test set automatic conversion device plays a preset first audio data set in a vehicle in a preset vehicle scene;

obtaining, from the vehicle, a second audio data set collected for the first audio data set, and log information generated by the vehicle during playback of the first audio data set;

obtaining, from the log information, a target recognition result of the vehicle for the second audio data set;

labeling the second audio data set according to the target recognition result, and generating a simulation test set according to the labeling result and the second audio data set.

Optionally, the method further comprises:

acquiring preset first audio data and the scene configuration information set for each piece of first audio data;

storing the scene configuration information and the first audio data in a preset first data table to obtain the first audio data set;

playing the preset first audio data set in the vehicle in the preset vehicle scene includes:

constructing the preset vehicle scene in the vehicle according to the scene configuration information in the first data table, and playing the first audio data in the first audio data set in the vehicle.

Optionally, storing the scene configuration information and the first audio data in the preset first data table includes:

storing the first audio data and the scene configuration information in the preset first data table according to a preset format requirement.

Optionally, the scene configuration information includes a vehicle control parameter and a play control parameter, and constructing the preset vehicle scene in the vehicle according to the scene configuration information in the first data table and playing the first audio data in the first audio data set in the vehicle includes:

controlling the vehicle according to the vehicle control parameter so as to construct the preset vehicle scene in the vehicle;

playing, in the vehicle, the first audio data in the first audio data set according to the play control parameter.

Optionally, generating the simulation test set according to the labeling result and the second audio data set includes:

determining, from the first data table, the scene configuration information of the first audio data corresponding to each piece of second audio data in the second audio data set;

generating the simulation test set according to the labeling result, the scene configuration information corresponding to each piece of second audio data in the second audio data set, and the second audio data set.

Optionally, labeling the second audio data set according to the target recognition result and generating a simulation test set according to the labeling result and the second audio data set includes:

determining, according to the target recognition result, the time point of the target word or sentence in each piece of second audio data in the second audio data set, and using that time point as the labeling result;

generating the simulation test set according to the labeling result and the second audio data set.

Optionally, the method further comprises:

cleaning a target storage space, where the target storage space is used to store the log information and/or the second audio data set.
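The text does not say how or when the target storage space is cleaned. As a minimal sketch, assume the logs and recordings sit in a local directory; the helper name `clean_target_storage` and the directory layout are hypothetical. Under that assumption, cleaning could mean emptying the directory while keeping it in place, presumably so that files from one run are not mixed into the next:

```python
import shutil
from pathlib import Path


def clean_target_storage(directory):
    """Delete everything inside `directory` but keep the directory itself.

    `directory` is a hypothetical local folder holding log files and/or
    recorded (second) audio data. Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in Path(directory).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a sub-folder and its contents
        else:
            entry.unlink()         # remove a file or a symlink
        removed += 1
    return removed
```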

Optionally, the method further comprises:

performing a simulation test using the second audio data set in the simulation test set to obtain a first simulation test result;

evaluating the simulation test according to the first simulation test result and the recognition result in the simulation test set.

The embodiment of the invention also provides a device for generating a simulation test set, which is applied to a voice test set automatic conversion device and comprises:

a playing module, configured to play a preset first audio data set in a vehicle in a preset vehicle scene;

a collection module, configured to acquire from the vehicle a second audio data set collected for the first audio data set, and log information generated by the vehicle during playback of the first audio data set;

an extraction module, configured to acquire, from the log information, a target recognition result of the vehicle for the second audio data set;

a generating module, configured to label the second audio data set according to the target recognition result and generate a simulation test set according to the labeling result and the second audio data set.

Optionally, the apparatus further comprises:

a scene information acquisition module, configured to acquire preset first audio data and the scene configuration information set for each piece of first audio data, and store the scene configuration information and the first audio data in a preset first data table to obtain the first audio data set;

the playing module is configured to construct the preset vehicle scene in the vehicle according to the scene configuration information in the first data table, and play the first audio data in the first audio data set in the vehicle.

Optionally, the scene information acquisition module is configured to store the first audio data and the scene configuration information in the preset first data table according to a preset format requirement.

Optionally, the scene configuration information includes a vehicle control parameter and a playing control parameter, and the playing module is configured to control the vehicle according to the vehicle control parameter so as to construct the preset vehicle scene in the vehicle, and play the first audio data in the first audio data set according to the playing control parameter in the vehicle.

Optionally, the generating module is configured to determine, from the first data table, the scene configuration information of the first audio data corresponding to each piece of second audio data in the second audio data set, and generate the simulation test set according to the labeling result, the scene configuration information corresponding to each piece of second audio data, and the second audio data set.

Optionally, the generating module is configured to determine, according to the target recognition result, the time point of the target word or sentence in each piece of second audio data in the second audio data set, use that time point as the labeling result, and generate the simulation test set according to the labeling result and the second audio data set.

Optionally, the apparatus further comprises:

The cleaning module is used for cleaning a target storage space, and the target storage space is used for storing the log information and/or the second audio data set.

Optionally, the apparatus further comprises:

The evaluation module is used for performing simulation test by using the second audio data set in the simulation test set to obtain a first simulation test result, and evaluating the simulation test according to the first simulation test result and the identification result in the simulation test set.

The embodiment of the invention also provides an electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the above method for generating a simulation test set.

The embodiment of the invention also provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the above method for generating a simulation test set.

The embodiment of the invention has the following advantages:

In the embodiment of the invention, the voice test set automatic conversion device first plays a preset first audio data set in a vehicle in a preset vehicle scene. It then acquires from the vehicle the second audio data set collected for the first audio data set, together with the log information the vehicle generated while playing it. Next, it extracts the vehicle's target recognition result for the second audio data set from the log information. Finally, it labels the second audio data set according to that result and generates a simulation test set from the labeling result and the second audio data set. Because the simulation test set is generated automatically throughout the process, the embodiment of the invention can improve the efficiency of generating the simulation test set and reduce the time and effort staff must invest, compared with the prior-art approach in which staff play, screen, and label the audio.

Drawings

In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flow chart of steps of a method for generating a simulation test set in accordance with an embodiment of the present invention;

FIG. 2 is a flow chart of steps of another method for generating a simulation test set in accordance with an embodiment of the present invention;

FIG. 3a is a schematic view of a scenario of an embodiment of the present invention;

FIG. 3b is a schematic view of another scenario of an embodiment of the present invention;

FIG. 4a is a flowchart illustrating steps for simulation test set generation in accordance with an embodiment of the present invention;

FIG. 4b is a flowchart of the steps for another simulation test set generation in accordance with an embodiment of the present invention;

FIG. 5 is a block diagram of a simulation test set generating apparatus according to an embodiment of the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In practical application, a simulation test set must first be obtained; the simulation test set is used to simulate a real environment or condition. Specifically, in vehicle voice wake-up and voice recognition scenarios, a worker first has to drive the vehicle so that it is in a given vehicle scene (e.g., a high-speed driving scene or a low-speed driving scene), and then play preset audio data in the vehicle.

The worker then retrieves from the vehicle the audio data it recorded while the preset audio data was played, screens the relevant information out of the vehicle's log information, and generates the simulation test set. Throughout this process, playing, screening, labeling, and similar operations consume a great deal of the worker's time and effort. To reduce the time staff spend producing simulation test sets, the embodiment of the invention provides a method that generates the simulation test set automatically through automatic playing, automatic screening, and automatic labeling.

Specifically, FIG. 1 shows a flowchart of the steps of a method for generating a simulation test set according to an embodiment of the present invention; the method may be applied to a voice test set automatic conversion device.

As shown in fig. 1, the method may include the steps of:

Step 101, the automatic conversion device of the voice test set plays a preset first audio data set in a vehicle in a preset vehicle scene.

The first audio data set may include multiple pieces of audio data with different content. For example, it may include wake-up audio from speakers of different ages, genders, and speaking rates, and recognition-sentence audio in different languages; the embodiment of the present invention is not limited in this respect.

The preset vehicle scene refers to the scene the vehicle is in while the audio data in the first audio data set is played, for example, a high-speed driving scene or a low-speed driving scene.

In practical application, the vehicle can first be controlled to enter the preset vehicle scene, and then the vehicle, or a playback device deployed in the vehicle, can be controlled to play the audio data in the first audio data set.

The method for generating a simulation test set of the present invention can be implemented by the voice test set automatic conversion device. Specifically, when playing the preset first audio data set, the device can send the audio data in the first audio data set to the vehicle or to a playback device deployed in the vehicle, which then plays it.

Step 102, acquiring a second audio data set collected for the first audio data set from the vehicle, and log information generated by the vehicle during the playing of the first audio data set.

While the audio data in the first audio data set is being played, an audio recording device in the vehicle may record the sound in the vehicle, thereby producing the second audio data set. The recording device may be, for example, one or more microphones arranged in the vehicle according to actual needs; when there are several microphones, they may be distributed at different positions in the vehicle.

When the simulation test set is generated, the voice test set automatic conversion device can automatically read from the vehicle the second audio data set that the vehicle collected for the played first audio data set. The second audio data set likewise includes multiple pieces of audio data, each obtained by recording a played piece of audio data from the first audio data set.

It should be noted that the audio data in the second audio data set differ from those in the first audio data set in that they contain both the sound of the played first audio data and the sound of the preset vehicle scene, for example, noise from the moving vehicle or other people speaking; the embodiment of the present invention is not limited in this respect.

In some possible embodiments, the vehicle may play the audio data after receiving it from the first audio data set. During playback, the vehicle-mounted system may collect the audio and recognize the collected data (i.e., the second audio data) to generate corresponding log information. The log information may include the results of collecting the played audio and recognizing the collected second audio data, such as the text obtained by speech recognition, or the time corresponding to each recognized character, word, or sentence in the second audio data.

While collecting the played audio data from the first audio data set, the vehicle may also pick up ambient sound; it then recognizes the collected data and generates log information based on the recognition result.

In practical application, when the simulation test set is generated, the voice test set automatic conversion device can also automatically read from the vehicle the log information generated while the first audio data set was played.

In some possible embodiments, the vehicle that outputs the log information and the second audio data set may be one with reasonably strong voice recognition capability, so that the audio data are recognized and analyzed more accurately.

Step 103, acquiring a target recognition result of the vehicle for the second audio data set from the log information.

After the log information is obtained, the vehicle's target recognition result for the audio data in the second audio data set may be extracted from it. The target recognition result characterizes each piece of audio data in the second audio data set, for example, whether the audio is a sentence containing a wake-up word or a sentence without one. If the audio contains a wake-up word, the target recognition result may include the time period in which the wake-up word occurs in the audio. If the audio is a sentence without a wake-up word, the target recognition result may further include the specific text content of the sentence and/or the time period in which each character, word, or sentence of that text occurs in the audio (for example, the time period of a word may be represented by its start time and end time in the audio).
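The patent does not disclose the log format. As an illustration only, assume a JSON-lines log in which recognition records carry `"event": "recognition"`, a wake-up flag, the recognized text, and per-word start/end times in milliseconds. All field names, class names, and the function `extract_target_results` are hypothetical. Under that assumption, extracting the target recognition result amounts to filtering and parsing those records:

```python
import json
from dataclasses import dataclass, field


@dataclass
class WordSpan:
    """A recognized character/word and where it sits in the audio, in ms."""
    word: str
    start_ms: int
    end_ms: int


@dataclass
class TargetRecognitionResult:
    audio_name: str          # which second audio data this result belongs to
    is_wakeup: bool          # True if the utterance contains the wake-up word
    text: str                # recognized text content
    words: list[WordSpan] = field(default_factory=list)


def extract_target_results(log_lines):
    """Keep only recognition records from a (hypothetical) JSON-lines vehicle log."""
    results = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text debug output, not a structured record
        record = json.loads(line)
        if record.get("event") != "recognition":
            continue  # heartbeats and other events are screened out
        words = [WordSpan(w["word"], w["start_ms"], w["end_ms"])
                 for w in record.get("words", [])]
        results.append(TargetRecognitionResult(
            audio_name=record["audio"],
            is_wakeup=record.get("type") == "wakeup",
            text=record.get("text", ""),
            words=words,
        ))
    return results
```

Real head-unit logs will need their own parser; the point of the sketch is that the "screening" step a worker previously did by hand reduces to a filter over structured records.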

Step 104, labeling the second audio data set according to the target recognition result, and generating a simulation test set according to the labeling result and the second audio data set.

After the target recognition result is obtained, the voice test set automatic conversion device can automatically label the audio data in the second audio data set. In a later simulation test, the content of each piece of audio data, and the time at which that content occurs in the second audio data, can then be determined from these labels.

After the audio data in the second audio data set are labeled, the voice test set automatic conversion device can generate the simulation test set from the labeling result and the second audio data set. The simulation test set may include the labeling result and the second audio data set, where the second audio data set may comprise the audio files themselves, their storage addresses, their names, and the like; the embodiment of the present invention is not limited in this respect.

In some possible embodiments, once obtained, the simulation test set can be applied to vehicles of different models or to different vehicle simulators in order to run simulation tests on them. Because the simulation test set is generated automatically by the voice test set automatic conversion device throughout, the embodiment of the invention can improve the efficiency of generating the simulation test set and reduce the time and effort staff must invest, compared with the prior-art approach in which staff play, screen, and label the audio.

In some possible embodiments, when a simulation test is performed, the second audio data set in the simulation test set may be input to the vehicle or vehicle simulator under test, which recognizes the audio and outputs a first recognition result. The performance of the vehicle or vehicle simulator is then determined from the first recognition result and the labeling result in the simulation test set. For example, if the recognition result matches the labeling result in the simulation test set, the vehicle or vehicle simulator passes; otherwise, it fails and needs to be readjusted.
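The pass/fail comparison can be sketched as below, assuming recognition outputs and labels are plain strings keyed by audio name. The function names and the normalization rule (ignore case and whitespace) are assumptions for illustration. A sample passes only if its normalized output equals its label, and a missing output counts as a failure:

```python
def normalize(text):
    """Ignore case and whitespace differences when comparing text."""
    return "".join(text.lower().split())


def evaluate(test_results, labels):
    """Compare simulator outputs with the labels of the simulation test set.

    test_results / labels: {audio_name: recognized or labeled text}
    Returns (pass_rate, failed_names).
    """
    failed = [name for name, expected in labels.items()
              if normalize(test_results.get(name, "")) != normalize(expected)]
    pass_rate = 1 - len(failed) / len(labels) if labels else 0.0
    return pass_rate, sorted(failed)
```

Usage: `evaluate({"a1.wav": "hi car"}, {"a1.wav": "Hi Car"})` returns `(1.0, [])`. A production evaluator would more likely report word or character error rates than exact matches.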

In the embodiment of the invention, a preset first audio data set is played in a vehicle in a preset vehicle scene. The second audio data set collected for the first audio data set, and the log information the vehicle generated while playing it, are then acquired from the vehicle. The vehicle's target recognition result for the second audio data set is extracted from the log information, the second audio data set is labeled according to that result, and a simulation test set is generated from the labeling result and the second audio data set. Because the simulation test set is generated automatically throughout the process, the embodiment of the invention can improve the efficiency of generating the simulation test set and reduce the time and effort staff must invest, compared with the prior-art approach in which staff play, screen, and label the audio.

Referring to FIG. 2, which shows a flowchart of the steps of another method for generating a simulation test set according to an embodiment of the present invention, the method may include the following steps:

Step 201, acquiring preset first audio data and the scene configuration information set for each piece of first audio data.

In practical applications, multiple pieces of first audio data may be pre-recorded, for example, wake-up audio from speakers of different ages, genders, and speaking rates, and recognition-sentence audio in different languages; the embodiment of the present invention is not limited in this respect.

In addition, corresponding scene configuration information can be set for each piece of first audio data; it is used to configure the vehicle scene in which that first audio data is played.

In some possible embodiments, the voice test set automatic conversion device may first acquire the preset pieces of first audio data and the scene configuration information set for each of them.

Step 202, storing scene configuration information and first audio data into a preset first data table to obtain a first audio data set.

After obtaining the preset pieces of first audio data and the scene configuration information set for each of them, the voice test set automatic conversion device can store each piece of first audio data together with its scene configuration information in the preset first data table, thereby obtaining the first audio data set.

In some possible embodiments, the scene configuration information and the first audio data may be stored in the preset first data table in playing order, so that the voice test set automatic conversion device can later play the first audio data in the first audio data set in that order.

In an embodiment of the present invention, the above step 202 may be implemented as follows:

In some possible embodiments, to ensure that the audio data can be played and processed correctly, the first audio data and the scene configuration information may be stored in the preset first data table according to a preset format requirement. For example, first audio data in 16000 Hz, single-channel WAV format may be stored in the table together with its corresponding scene configuration information. The embodiment of the present invention does not limit the format requirement, which can be set according to actual conditions.
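The example format requirement (16000 Hz, single channel, WAV) can be checked before a file is admitted to the first data table. The sketch below uses Python's standard `wave` module; the function name and the choice to treat unreadable files as non-conforming are assumptions:

```python
import wave


def meets_format_requirement(path, rate=16000, channels=1):
    """Return True if `path` is a WAV file with the required sample rate and channel count.

    The defaults mirror the example requirement in the text. wave raises
    wave.Error for non-WAV input and EOFError for empty files; both count as failures.
    """
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate() == rate and wav.getnchannels() == channels
    except (wave.Error, EOFError):
        return False
```

Files that fail the check could be resampled or down-mixed before storage. This sketch only reports the mismatch.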

Step 203, according to the scene configuration information in the first data table, a preset vehicle scene is built in the vehicle, and the first audio data in the first audio data set is played in the vehicle.

In some possible embodiments, the voice test set automatic conversion device may first construct a preset vehicle scene in the vehicle according to the scene configuration information after obtaining the scene configuration information in the first data table.

The voice test set automatic conversion device may then play the first audio data in the first audio data set in a vehicle in a preset vehicle scene.

In an embodiment of the present invention, the scene configuration information may include a vehicle control parameter and a play control parameter. The vehicle control parameter controls the vehicle so as to construct the preset vehicle scene, for example, the vehicle speed. The play control parameter controls playback within that scene, for example, the play position and play volume. The embodiment of the present invention is not limited in this respect. In practical applications, step 203 may be implemented through the following sub-steps:
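The split between vehicle control and play control parameters can be modeled as two small records. The sketch below assumes hypothetical column names for a first-data-table row, position labels such as `"driver"`, and a 0 to 100 volume scale; none of these come from the source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleControl:
    speed_kmh: float            # target vehicle speed for the scene


@dataclass(frozen=True)
class PlayControl:
    position: str               # e.g. "driver" (hypothetical position label)
    volume: int                 # playback volume, assumed to be 0-100


@dataclass(frozen=True)
class SceneConfig:
    vehicle: VehicleControl
    play: PlayControl


def scene_from_row(row):
    """Split one first-data-table row into vehicle and play control parameters."""
    volume = int(row["play_volume"])
    if not 0 <= volume <= 100:
        raise ValueError(f"volume out of range: {volume}")
    return SceneConfig(
        VehicleControl(float(row["vehicle_speed_kmh"])),
        PlayControl(row["play_position"], volume),
    )
```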

Sub-step 11, controlling the vehicle according to the vehicle control parameter so as to construct the preset vehicle scene in the vehicle.

After obtaining the vehicle control parameter, the voice test set automatic conversion device may first control the vehicle so that it is in the preset vehicle scene.

For example, when the preset vehicle scene is a high-speed driving scene, the vehicle may be controlled to travel at a speed above a preset speed value (e.g., 80 km/h). Once the vehicle speed exceeds the preset speed value, the vehicle is determined to be in the high-speed driving scene, and the first audio data in the first audio data set may then be played according to the play control parameter.
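The "play only once the speed exceeds the preset value" condition can be sketched as a polling loop. Here `read_speed` is a hypothetical callable standing in for however the device reads vehicle speed. The 80 km/h default follows the example in the text, while the timeout and poll interval are assumptions. The clock and sleep functions are injectable so the loop can be tested without a vehicle:

```python
import time

HIGH_SPEED_THRESHOLD_KMH = 80.0  # example preset speed value from the text


def wait_for_high_speed(read_speed, threshold=HIGH_SPEED_THRESHOLD_KMH,
                        timeout_s=60.0, poll_s=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll the vehicle speed until it exceeds `threshold` or time runs out.

    Returns True once the vehicle is in the high-speed scene, False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if read_speed() > threshold:
            return True
        sleep(poll_s)
    return False
```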

In some possible embodiments, additional scenes such as in-car conversation can be added. For example, while the first audio data set is played in the vehicle, audio of people talking to each other can be played through another audio playback device to enrich the simulation test set.

Sub-step 12, playing, in the vehicle, the first audio data in the first audio data set according to the play control parameter.

After the vehicle is in the preset vehicle scene, the first audio data in the first audio data set can be played in the vehicle according to the play control parameter; for example, the first audio data may be played at the corresponding position or at the set volume. The embodiment of the present invention is not limited in this respect.
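Sub-steps 11 and 12 can be combined into a playback loop over the table rows. In this sketch, `build_scene(speed_kmh)` and `play(audio_file, position, volume)` are hypothetical hooks into the vehicle and the in-car playback device, and the row keys are the same hypothetical column names used for the first data table. Rows whose scene cannot be built are skipped and returned, so they can be re-run later:

```python
from typing import Callable, Iterable, Mapping


def run_playback(rows: Iterable[Mapping[str, str]],
                 build_scene: Callable[[float], bool],
                 play: Callable[[str, str, int], None]) -> list:
    """Play each first audio data item only after its vehicle scene is in place."""
    skipped = []
    for row in rows:
        # Sub-step 11: construct the scene from the vehicle control parameter
        if not build_scene(float(row["vehicle_speed_kmh"])):
            skipped.append(row["audio_file"])
            continue
        # Sub-step 12: play according to the play control parameters
        play(row["audio_file"], row["play_position"], int(row["play_volume"]))
    return skipped
```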

Step 204, obtaining a second audio data set collected for the first audio data set from the vehicle, and log information generated by the vehicle during playing of the first audio data set.

While the audio data in the first audio data set is being played, the vehicle may record it, thereby obtaining the audio data in the second audio data set.

When the simulation test set is generated, the voice test set automatic conversion device can automatically read from the vehicle the second audio data set that the vehicle collected for the played first audio data set.

During playback, the vehicle's head-unit system can collect the audio and recognize the collected data (i.e., the audio data in the second audio data set) to generate corresponding log information.

Step 205, obtaining a target identification result of the vehicle for the second audio data set from the log information.

After the log information is obtained, the voice test set automatic conversion device can obtain a target recognition result of the vehicle on the audio data in the second audio data set.

Step 206, determining, according to the target recognition result, the time point of the target word or sentence in each piece of second audio data in the second audio data set, and using that time point as the labeling result.

In some possible embodiments, after obtaining the target recognition result, the voice test set automatic conversion device may use the timing information it contains to determine the time point of the target word or sentence in each piece of second audio data, and thus locate the target word or sentence within that audio.

In practical application, the time point of the target word or sentence in each piece of second audio data can serve as the labeling result for that word or sentence, and from this labeling result the time at which the target word or sentence occurs in the second audio data can be determined. During a later simulation test, the test can then be evaluated by comparing this time with the time at which the same target word or sentence occurs in the simulation test output.
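The time-based evaluation described above can be sketched as a tolerance check. Labeled and predicted spans are assumed to be `(word, start_ms, end_ms)` tuples, and the 200 ms tolerance is an illustrative assumption rather than a value from the text. Each labeled word is greedily paired with the first unused predicted word that has the same text and close start and end times:

```python
def spans_match(labeled, predicted, tolerance_ms=200):
    """Check that every labeled target word appears in the simulation output at about the same time."""
    remaining = list(predicted)
    for word, start, end in labeled:
        for i, (p_word, p_start, p_end) in enumerate(remaining):
            if (p_word == word and abs(p_start - start) <= tolerance_ms
                    and abs(p_end - end) <= tolerance_ms):
                del remaining[i]  # each predicted word may be used only once
                break
        else:
            return False  # no predicted word matched this labeled word
    return True
```

Greedy pairing is the simplest choice here. If the same word can occur several times in one utterance, an order-aware alignment would be more robust.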

The target word or sentence may be a wake-up word or a sentence other than a wake-up word; the embodiment of the present invention does not limit this.

Step 207: generate the simulation test set according to the labeling result and the second audio data set.

After obtaining the time point of the target word or sentence in each item of second audio data, the voice test set automatic conversion device may generate the simulation test set from these time points and the second audio data set.

In one embodiment of the present invention, step 207 may be implemented as follows:

Scene configuration information of the first audio data corresponding to each item of second audio data in the second audio data set is determined from the first data table, and the simulation test set is generated according to the labeling result, the scene configuration information corresponding to each item of second audio data, and the second audio data set.

In some possible embodiments, when generating the simulation test set, the voice test set automatic conversion device may also read from the first data table the scene configuration information of the first audio data from which each item of second audio data was transcribed, and use it as the scene configuration information of that second audio data.

After the second audio data have been labeled and the scene configuration information of each item has been determined, the voice test set automatic conversion device can generate the simulation test set according to the labeling result, the scene configuration information corresponding to each item of second audio data, and the second audio data set.

Specifically, the labeling result, the scene configuration information of each item of second audio data, and the name or address of that second audio data may be written into the second data table according to their correspondence, and the second data table then serves as the simulation test set.

Using the name or address of each item of second audio data in the simulation test set, the corresponding audio can be retrieved; together with the labeling result and scene configuration information in the simulation test set, it can then be used for simulation tests of different vehicle scenes.
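The join described above can be sketched as follows, assuming the labels and scene configuration are already keyed by the name or address of each item of second audio data. The function names, the dictionary layout, and the column names are hypothetical, and CSV is used only as a stand-in for the spreadsheet-style second data table.

```python
import csv


def build_second_table(labels, scene_by_audio, audio_names):
    """Join labels and scene configuration by audio name into table rows.

    labels:         {audio_name: (target_text, start_s, end_s)}
    scene_by_audio: {audio_name: {"type": ..., "scene": ..., "seat": ...}}
    audio_names:    ordered names or addresses of the second audio data
    """
    rows = []
    for name in audio_names:
        if name not in labels:
            continue  # no recognition result was captured for this audio
        text, start_s, end_s = labels[name]
        scene = scene_by_audio[name]
        rows.append({
            "audio": name,
            "text": text,
            "start_s": start_s,
            "end_s": end_s,
            "type": scene["type"],
            "scene": scene["scene"],
            "seat": scene["seat"],
        })
    return rows


def write_csv(rows, stream):
    """Write the rows to a text stream as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Keying everything by the audio name mirrors the correspondence the text relies on: the same name later lets a simulation run fetch the audio and its label together.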

In some possible embodiments, when a simulation test is performed, the scene to be simulated is determined from the scene configuration information and the second audio data in the second audio data set are played; the simulation output is then evaluated using the labeled recognition results in the simulation test set as the reference, thereby determining the performance of the real vehicle or vehicle simulator under test.

In an embodiment of the present invention, the method may further include the following steps:

Cleaning a target storage space, where the target storage space is used for storing the log information and/or the second audio data set.

In some possible embodiments, the log information and/or the second audio data set may be stored in the target storage space once obtained. To avoid interference from other information, the target storage space may be cleaned before storage, after which the log information and/or the second audio data set currently obtained for the first audio data set are stored there.

In practical application, the target storage space may also be cleaned after the simulation test set is output, in preparation for transcribing the next audio data set to obtain another simulation test set; the embodiment of the present invention does not limit this.
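A minimal sketch of the cleaning step, shown against a local directory. On the real vehicle the target storage space lives on the head unit and is cleared with root permission (see step 2.3), which this sketch does not model; the function name and the directory passed in are hypothetical.

```python
import shutil
from pathlib import Path


def clean_storage(target: Path) -> None:
    """Empty `target` (logs, recorded audio) but keep the directory itself."""
    target.mkdir(parents=True, exist_ok=True)
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove nested folders of old recordings
        else:
            entry.unlink()        # remove files and symlinks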

Performing a simulation test using the simulation test set to obtain a first simulation test result, and evaluating the simulation test according to the first simulation test result and the labeled recognition results in the simulation test set.

In some possible embodiments, the simulation test may be performed using the second audio data set in the simulation test set, which may, for example, be played in a real vehicle. The simulation test yields a first simulation test result, which represents how the audio in the played second audio data set was recognized, for example whether the system was woken by the wake-up word or whether specific content was recognized.

Once the first simulation test result is obtained, the simulation test can be evaluated against the labeled recognition results in the simulation test set. Specifically, if the labeled result matches the first simulation test result, or the degree of matching reaches a preset value, the simulation test is passed, i.e., the real vehicle or vehicle simulator under test passes. For example, it can be checked whether the same wake-up word, or the same non-wake-up sentence text, is recognized at the same time point.

In some possible embodiments, the simulation test can also be used to evaluate the quality of the simulation test set itself: a large deviation between the first simulation test result and the labeled results may indicate a problem with the simulation test set. To verify the simulation test set further, its strengths and weaknesses can be assessed from the results of multiple simulation tests.
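The pass criterion above, "the matching degree reaches a preset value", can be sketched as a simple match rate. This assumes matching is an exact text comparison per audio item and uses a hypothetical default threshold of 0.95; the source does not specify either.

```python
def evaluate(labeled, simulated, threshold=0.95):
    """Compare simulation outputs with the labeled recognition results.

    labeled:   {audio_id: expected_text} taken from the simulation test set
    simulated: {audio_id: recognized_text or None} from the simulation run
    Returns (match_rate, passed).
    """
    if not labeled:
        raise ValueError("no labeled items to evaluate")
    hits = sum(1 for audio_id, text in labeled.items()
               if simulated.get(audio_id) == text)
    rate = hits / len(labeled)
    return rate, rate >= threshold
```

A stricter variant could also require the detection time to fall inside the labeled window, combining this rate with a per-item time check.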

In the embodiment of the present invention, preset first audio data and the scene configuration information set for each item of first audio data are acquired and stored in a preset first data table to obtain a first audio data set. A preset vehicle scene is then constructed in the vehicle according to the scene configuration information in the first data table, and the first audio data in the first audio data set are played in the vehicle. The second audio data set collected for the first audio data set, together with the log information generated by the vehicle while playing the first audio data set, is acquired from the vehicle, and the target recognition result of the vehicle for the second audio data set is extracted from the log information. According to the target recognition result, the time point of the target word or sentence in each item of second audio data is determined and used as the labeling result, and the simulation test set is generated from the labeling result and the second audio data set. Because the whole process is automated, the embodiment of the present invention improves the efficiency of generating simulation test sets and reduces the staff time and effort required, compared with the prior-art approach in which staff manually play, screen and label the audio.

The above method is further described by a specific example:

1. Automated conversion preparation:

1.1: Prepare in advance, according to the requirements of the simulation test set, multiple items of first audio data, including wake-up utterances from speakers of different ages, sexes and speech rates, recognition sentences in different languages, and other audio.

1.2: According to the program's format requirements, store the first audio data to be converted by the voice test set automatic conversion device as 16000 Hz, single-channel WAV audio, and write the first data table that the program reads, as required.

In addition, scene configuration information can be written into the first data table, thereby obtaining the first audio data set.

For example, Tables 1 and 2 below show the headers of first audio data sets for two different audio types; the specific data may be set according to actual requirements:
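A minimal pre-flight check for the format requirement above, using the standard-library `wave` module. The function name is hypothetical, and the sketch checks only sample rate and channel count because the source does not state a required bit depth.

```python
import wave


def check_wav_format(path, rate=16000, channels=1):
    """Return a list of format problems; an empty list means the file conforms."""
    problems = []
    with wave.open(str(path), "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate {wf.getframerate()} != {rate}")
        if wf.getnchannels() != channels:
            problems.append(f"channels {wf.getnchannels()} != {channels}")
    return problems
```

Running this over every path listed in the first data table before playback catches non-conforming files early; files that are not WAV at all make `wave.open` raise `wave.Error`.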

Table 1:

Audio data storage address	Speaker ID	Speaker sex	Speaker age group	Environment	Position	Speech rate

Table 2:

Audio data storage address	Speaker sex	Text content
*--**	Female	Third one
--	Male	Close the vehicle windows

1.3: Deploy the voice test set automatic conversion device for the real vehicle and connect playback equipment such as a sound card and speakers to play the first audio data set. The voice test set automatic conversion device is also connected to the vehicle so as to obtain the log information and the second audio data set from it.

High-fidelity speakers can be installed at different seat positions as required. When the first audio data set is played, the volume is kept between 75 dB and 80 dB to approximate the loudness of a human voice speaking in the vehicle, so that the high and low frequencies of the voice are reproduced as faithfully as possible and distortion or unrealistic sound is avoided.

2. Automated conversion on the real vehicle:

2.1: On the PC on which the voice test set automatic conversion device is deployed, the high-fidelity speakers used for playback are connected either through a sound card or directly, as required. Speakers behind a sound card are driven by calling the sound card by its card number, while directly connected speakers are called directly, so that the device can distinguish which output plays each audio item.

As shown in Figs. 3a and 3b, speakers may be placed in the vehicle 30, and a connection may be established between the PC 301, on which the voice test set automatic conversion device is deployed, and the head unit 302; for example, the PC 301 may be connected to the head unit 302 via USB (Universal Serial Bus).

The PC 301 may be connected to the first sound device 304 through the sound card 303, or directly to the second sound device 305, and the head unit 302 may be provided with a microphone 306 that collects the sound output by the first sound device 304 and/or the second sound device 305, thereby obtaining the second audio data.

2.2: Connect the PC on which the voice test set automatic conversion device is deployed to the vehicle's head unit, start the program, confirm the card number of the sound card, and call the playback device with that card number to play the first audio data in the first audio data set.

2.3: The voice test set automatic conversion device performs preparatory operations on the head unit: after obtaining root permission, it clears the logs and audio in the target storage space and automatically restarts the head unit's voice system. This ensures that testing and audio transcription proceed normally, that the audio written to disk contains no extraneous interference, and that all collected data come from the current test.

2.4: The voice test set automatic conversion device reads the first data table to obtain the content and paths of the first audio data to be played, and reads the second data table to record its current write position. After data have been written into the second data table, the position of the existing data is remembered so that later writes do not overwrite existing data.

2.5: The voice test set automatic conversion device plays the prepared audio data in sequence according to the first data table and records the time immediately before each item is played; for example, for the sentence "Today the weather is really nice", the time just before the word "Today" is played is recorded.
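A minimal sketch of the "remember the write position" behaviour in step 2.4, assuming the second data table is kept as a CSV file (the original uses a spreadsheet). The function name and file layout are hypothetical; the point is that new rows are always appended after existing ones and the header is written only once.

```python
import csv
from pathlib import Path


def append_rows(table: Path, fieldnames, rows) -> int:
    """Append rows after existing data; return the row index where writing began."""
    write_header = not table.exists() or table.stat().st_size == 0
    existing = 0
    if not write_header:
        # Count data rows already present (the header is consumed by DictReader).
        with table.open(newline="", encoding="utf-8") as f:
            existing = sum(1 for _ in csv.DictReader(f))
    # Append mode never truncates, so earlier results cannot be overwritten.
    with table.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    return existing
```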

2.6: After each sentence of audio has been played, real-time log information is acquired and the required target recognition results are captured, including but not limited to the wake-up word, the wake-up sound zone, and sentence recognition results.

For example, as shown in Table 3 below, multiple sentence recognition results may be obtained, together with the audio data storage address and speaker sex of the first audio data corresponding to each result.
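The capture step can be sketched as a log-line filter. The log format here is entirely hypothetical (`<HH:MM:SS.mmm> <EVENT> <JSON payload>`, with invented event names `WAKEUP` and `ASR_RESULT`); the actual head-unit log format is not given in the source, so the pattern would need to be adapted.

```python
import json
import re

# Hypothetical log layout: "<HH:MM:SS.mmm> <EVENT> <JSON payload>".
# Real head-unit logs will differ; adapt this pattern to the actual format.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) (WAKEUP|ASR_RESULT) (\{.*\})$")


def grab_results(lines):
    """Extract wake-up and sentence recognition events from log lines."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore unrelated log output
        ts, event, payload = m.groups()
        results.append({"time": ts, "event": event, **json.loads(payload)})
    return results
```

Each extracted event keeps its log timestamp, which is what the later labeling and time-calculation steps rely on.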

Table 3:

Audio data storage address	Speaker sex	Text content
*--**	Female	Reduce the air conditioner airflow
--	Male	Turn on the reading lamp

2.7: The captured key information is written into the second data table in real time.

For example, Table 4 shows the header of the target recognition results for wake-up first audio data, and Table 5 shows the header of the target recognition results for non-wake-up sentences; the specific data may be set according to actual requirements:

Table 4:

Table 5:

2.8: After the test and transcription of one audio data set are completed, the current log information is exported automatically, and the multi-channel long raw audio is converted to PCM (Pulse Code Modulation) format and named "Sse-input" with a timestamp (SSE: Speech Signal Enhancement).

For example, the second audio data may be named "03-20_14-04-03_977_ -Sse-input.pcm" and the corresponding log information "SS3_android-00000700001890000-2024_03_20_14_03_22+0800.zip"; the embodiment of the present invention does not limit the naming.

2.9: The historical logs and audio in the target storage space are deleted again and the voice system is restarted automatically, in preparation for reading a new first data table and testing and converting new audio.

2.10: The above flow is repeated until all audio data for the current speaker placement in the real vehicle have been converted into multi-channel raw audio Sse-input.pcm.
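A sketch of the output format described in step 2.8, assuming 16-bit little-endian samples (the source does not state the bit depth). Raw PCM has no header, so multi-channel audio is stored frame by frame with channels interleaved. The segment between the millisecond field and "-Sse-input" is unclear in the example name, so it is modeled here by a hypothetical `tag` argument.

```python
import struct
from datetime import datetime


def interleave_pcm(channels):
    """Pack equal-length int16 channel lists into interleaved little-endian PCM bytes."""
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    # Frame-major order: ch0[0], ch1[0], ..., ch0[1], ch1[1], ...
    frames = [s for frame in zip(*channels) for s in frame]
    return struct.pack(f"<{len(frames)}h", *frames)


def sse_input_name(ts: datetime, tag: str) -> str:
    """Build a name like 'MM-DD_HH-MM-SS_mmm_<tag>-Sse-input.pcm' (tag is hypothetical)."""
    return f"{ts:%m-%d_%H-%M-%S}_{ts.microsecond // 1000:03d}_{tag}-Sse-input.pcm"
```

Because the file carries no header, the channel count, sample width and sample rate must be recorded elsewhere (for example, in the data table) for the audio to be read back correctly.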

3. Automated conversion on the PC:

3.1: After the real-vehicle part is finished, the audio is processed on the PC on which the voice test set automatic conversion device is deployed: the second data table from the real-vehicle test and the stored multi-channel raw audio Sse-input.pcm are read.

3.2: Configure the scene configuration information of each audio item so that, after processing, the output multi-channel raw audio Sse-input.pcm is matched with the recorded scene and type, for example:

wavepath = 'Cartest/W01/Mandarin_kws/Quiet_1/'
Type_str = 'Mandarin_kws'
Background_srt = 'Quiet'
Folder_path = r"D:\test set"
Xlsx_name = r"D:\test set\OUT_KWS_Quiet.xlsx"

Here, wavepath denotes the audio data storage path, Type_str the test type, Background_srt the test scene, Folder_path the folder containing the test tables, and Xlsx_name the file name of the test table (i.e., the second data table).

3.3: Time calculation is then performed on the multi-channel raw audio Sse-input.pcm: using the times recorded in the real vehicle, the time span of each sentence within the long raw audio is calculated. For example, the first occurrence of the wake-up word "classmate" appears from 10 s to 12 s of the long audio, and the second occurrence from 23 s to 25 s.

3.4: Once the information matching of step 3.2 and the time calculation of step 3.3 are complete, the simulation test set is generated; its header is shown in Table 6, and the specific content is set according to the actual situation.
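A minimal sketch of the time calculation in step 3.3, assuming the recording start time of the long raw audio and the pre-playback time of each clip (step 2.5) were taken from the same clock, and that each clip's duration is known (for example, from its WAV header). The function name and tuple layout are hypothetical.

```python
from datetime import datetime


def sentence_windows(recording_start: datetime, plays):
    """Convert wall-clock play times into offsets within the long recording.

    plays: list of (audio_name, play_start, duration_s), where play_start is
           the time recorded just before the clip was played.
    Returns a list of (audio_name, start_s, end_s, duration_s).
    """
    windows = []
    for name, play_start, duration_s in plays:
        start_s = (play_start - recording_start).total_seconds()
        if start_s < 0:
            raise ValueError(f"{name} was played before the recording started")
        windows.append((name, start_s, start_s + duration_s, duration_s))
    return windows
```

With a clip played 10 s after recording began and lasting 2 s, this yields the 10 s to 12 s window from the example; any clock skew between the PC and the head unit shifts every window by the same amount.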

Table 6:

The finally generated simulation test set may include the cloud path of the converted long raw audio used for simulation, the name of the transcribed short audio, the start time, end time and duration of the target word or sentence, the test type and test scene to which it belongs, the seat position in the vehicle, and so on; the embodiment of the present invention does not limit this.

3.5: Finally, the voice test set automatic conversion device can directly read the converted raw audio Sse-input.pcm stored in the cloud and use the generated simulation test set to perform simulation tests of different scenes for different vehicle models.

3.6: After the first version of the simulation test is completed, its results are compared with the real-vehicle test results obtained in the real recording environment. This verifies the quality of the simulation test set, validates the conversion from real-vehicle testing to simulation testing, and ensures that the test error stays within a reliable range.
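One way to express "the test error is within a reliable range" is to compare wake-up rates, a metric the real-vehicle test is said to produce. This sketch assumes per-trial wake-up outcomes are available as booleans for both runs and uses a hypothetical tolerance of 2 percentage points.

```python
def wake_rate(results):
    """Fraction of wake-up trials that woke the system (results: list of bool)."""
    if not results:
        raise ValueError("no trials")
    return sum(results) / len(results)


def within_tolerance(real_results, sim_results, max_diff=0.02):
    """Check that simulated and real-vehicle wake-up rates differ by at most max_diff."""
    diff = abs(wake_rate(real_results) - wake_rate(sim_results))
    return diff <= max_diff, diff
```

For instance, a 98% real-vehicle rate against a 97% simulated rate gives a difference of about 0.01, inside the assumed tolerance; a larger gap would point either to a problem with the simulation or to a flaw in the simulation test set.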

As shown in Figs. 4a and 4b:

First, data are prepared and scenes are set up; then the PC performs the real-vehicle test and collects data by playing the audio data in the vehicle.

Next, the voice test set automatic conversion device deployed on the PC performs the real-vehicle test based on the first audio data set and outputs real-vehicle test results, which may include log information and metrics such as the wake-up rate.

In addition, the voice test set automatic conversion device labels the audio and generates the simulation test set from the labeling result and the real-vehicle test results.

After the simulation test set is obtained, it can be uploaded to the cloud server and used for simulation tests; the first simulation test result is then compared with the real-vehicle test result (i.e., the recognition results labeled in the simulation test set) to determine the quality of the simulation test set.

It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.

Referring to Fig. 5, a schematic structural diagram of a simulation test set generation device according to an embodiment of the present invention is shown. The device is applied to the voice test set automatic conversion device and may include the following modules:

A playing module 501, configured to play, by means of the voice test set automatic conversion device, a preset first audio data set in a vehicle in a preset vehicle scene;

A collection module 502, configured to acquire from the vehicle the second audio data set collected for the first audio data set and the log information generated by the vehicle while playing the first audio data set;

An extracting module 503, configured to obtain from the log information the target recognition result of the vehicle for the second audio data set;

A generating module 504, configured to label the second audio data set according to the target recognition result and to generate a simulation test set according to the labeling result and the second audio data set.
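The module structure above can be sketched as a small pipeline object whose four steps are injected as callables. The class name, the callable signatures and the row layout are hypothetical; the sketch only shows how the outputs of modules 501 to 504 feed one another, not how any module is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SimulationTestSetGenerator:
    """Wires the four modules of Fig. 5 together (hypothetical interfaces)."""
    play: Callable[[List[str]], None]                    # playing module 501
    collect: Callable[[], Tuple[List[str], List[str]]]   # collection module 502
    extract: Callable[[List[str]], List[str]]            # extracting module 503
    label: Callable[[List[str], List[str]], List[dict]]  # generating module 504 (labeling)

    def run(self, first_audio_set: List[str]) -> List[dict]:
        self.play(first_audio_set)
        second_audio_set, log_lines = self.collect()
        recognition = self.extract(log_lines)
        labels = self.label(second_audio_set, recognition)
        # Each test-set row pairs one item of second audio data with its label.
        return [{"audio": a, **lab} for a, lab in zip(second_audio_set, labels)]
```

Injecting the steps keeps the orchestration testable with stubs, while real implementations (sound-card playback, head-unit log export) can be plugged in separately.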

In an alternative embodiment of the present invention, the apparatus further comprises:

the scene information acquisition module is used for acquiring preset first audio data and scene configuration information set for each first audio data; storing scene configuration information and first audio data into a preset first data table to obtain a first audio data set;

The playing module 501 is configured to construct a preset vehicle scene in the vehicle according to the scene configuration information in the first data table, and play the first audio data in the first audio data set in the vehicle.

In an optional embodiment of the invention, the scene information obtaining module is configured to store the first audio data and the scene configuration information into a preset first data table according to a preset format requirement.

In an alternative embodiment of the present invention, the scene configuration information includes a vehicle control parameter and a playing control parameter, and the playing module 501 is configured to control the vehicle according to the vehicle control parameter to construct a preset vehicle scene in the vehicle, and play the first audio data in the first audio data set according to the playing control parameter in the vehicle.

In an alternative embodiment of the present invention, the generating module 504 is configured to determine, from the first data table, scene configuration information corresponding to the first audio data corresponding to each second audio data in the second audio data set, and generate the simulation test set according to the labeling result, the scene configuration information corresponding to each second audio data in the second audio data set, and the second audio data set.

In an alternative embodiment of the present invention, the generating module 504 is configured to determine, according to the target recognition result, a time point of the target word and sentence in each second audio data in the second audio data set, and take the target word and sentence in each second audio data and the time point of the target word and sentence as labeling results, and generate the simulation test set according to the labeling results and the second audio data set.

In an alternative embodiment of the present invention, the apparatus further comprises: a cleaning module, configured to clean a target storage space, where the target storage space is used for storing the log information and/or the second audio data set.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.

It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that comprises the element.

The method for generating a simulation test set, the device for generating a simulation test set, the electronic device and the computer-readable storage medium provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the present invention, and the above description of the embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific embodiments and application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for generating a simulation test set, which is applied to a voice test set automatic conversion device, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

3. The method of claim 2, wherein storing the scene configuration information and the first audio data in a predetermined first data table comprises:

4. The method of claim 2, wherein the scene configuration information includes vehicle control parameters and play control parameters, wherein the constructing the preset vehicle scene in the vehicle and playing the first audio data in the first audio data set in the vehicle according to the scene configuration information in the first data table includes:

5. The method of claim 2, wherein generating the simulated test set from the annotated result and the second audio data set comprises:

6. The method of claim 1, wherein labeling the second audio data set according to the target recognition result, and generating a simulation test set according to the labeled result and the second audio data set, comprises:

7. The method according to claim 1, wherein the method further comprises:

8. The method according to claim 1, wherein the method further comprises:

9. A device for generating a simulation test set, the device comprising:

10. An electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor implements a method of generating a simulation test set according to any one of claims 1 to 8.

11. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements a method of generating a simulation test set according to any of claims 1 to 8.