CN115662470B - Audio particle extraction method, sound wave synthesis device, equipment and medium - Google Patents
- Publication number
- CN115662470B (application CN202211590380.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses an audio particle extraction method, a sound wave synthesis method, a device, equipment and a medium. The audio particle extraction method comprises the following steps: acquiring recording data of a target engine in a plurality of running states; respectively selecting each running state as a target state, selecting the recording data in the target state as target data, and determining the primary order frequency of each audio frame in the target data; for each audio frame in the target data, extracting an audio fragment from the target data based on a central moment determined by the audio frame, and obtaining candidate audio particles of the target frequency in the target state based on two zero-crossing sampling points in the audio fragment that are separated by a preset number of sampling periods; and for each primary order frequency in the target state, cross-correlating each candidate audio particle of that primary order frequency with a reference audio particle, and determining the target audio particle of the primary order frequency in the target state. According to this scheme, the quality of the audio particles can be improved, and the effect of synthesizing sound waves can be improved in turn.
Description
Technical Field
The present disclosure relates to the field of signal analysis and synthesis technologies, and in particular, to an audio particle extraction method, a sound wave synthesis method, a device, equipment, and a medium.
Background
With the vigorous development of the automobile industry and continued domestic policy support for new energy vehicles, pure electric power is steadily replacing the traditional fuel engine in the automobile market. However, an electric automobile lacks the engine cylinders and exhaust system of a combustion vehicle, so the driver receives no perceivable acoustic feedback that follows the pedal, which greatly diminishes the immersive driving experience. Engine sound wave simulation in electric automobiles has therefore become a hot research direction for researchers in the related field.
At present, sound synthesis is generally carried out using statistical parameters, waveform splicing, and similar approaches. The former approximates the signal through modeling, so the naturalness of the synthesized audio is difficult to guarantee. The latter comprises two technical routes, wave surface synthesis and particle synthesis: wave surface synthesis sounds unnatural because the phases of adjacent audio fragments cannot be aligned accurately, while the synthesis quality of particle synthesis is largely limited by the quality of the audio particles. In view of this, how to improve the quality of audio particles, and thereby the effect of synthesizing sound waves, is a problem to be solved.
Disclosure of Invention
The technical problem mainly solved by this application is to provide an audio particle extraction method, a sound wave synthesis method, and a device, equipment and medium therefor, which can improve the quality of audio particles and in turn improve the effect of sound wave synthesis.
To solve the above technical problem, a first aspect of the present application provides an audio particle extraction method, including: acquiring recording data of a target engine in a plurality of running states, wherein the plurality of running states include at least one of an acceleration state, a deceleration state, and an idle state; respectively selecting each running state as a target state, selecting the recording data in the target state as target data, and determining the primary order frequency of each audio frame in the target data; for each audio frame in the target data, extracting an audio fragment from the target data based on a central moment determined by the audio frame, and obtaining a candidate audio particle of the target frequency in the target state based on two zero-crossing sampling points in the audio fragment that are separated by a preset number of sampling periods, wherein the target frequency is the primary order frequency of the audio frame and the preset number is a first multiple of the number of cylinders of the target engine; and for each primary order frequency in the target state, cross-correlating each candidate audio particle of that primary order frequency with a reference audio particle, and determining the target audio particle of the primary order frequency in the target state, wherein the reference audio particle is a target audio particle that was determined before the primary order frequency whose target audio particle is currently to be determined.
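As a rough illustration of the candidate-particle step above, one can search an audio fragment for two zero-crossing samples whose spacing is close to "first multiple × cylinder count" periods of the target frequency. The function name, the pair-search strategy and all parameters here are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def candidate_particle(fragment, f0, sample_rate, num_cylinders, first_multiple=1):
    """Hypothetical sketch: cut a candidate audio particle between two
    zero-crossing samples separated by roughly first_multiple * num_cylinders
    periods of the primary order frequency f0."""
    period = sample_rate / f0                             # samples per f0 period
    target_gap = first_multiple * num_cylinders * period  # desired particle length
    # sample indices where the waveform changes sign (zero crossings)
    crossings = np.where(np.diff(np.sign(fragment)) != 0)[0]
    if len(crossings) < 2:
        return None
    # choose the crossing pair whose spacing best matches the target gap
    best = min(
        ((i, j) for i in crossings for j in crossings if j > i),
        key=lambda p: abs((p[1] - p[0]) - target_gap),
    )
    return fragment[best[0]:best[1]]
```

Cutting exactly at zero crossings keeps the extracted particle free of amplitude discontinuities at its boundaries, which matters when particles are later concatenated.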
In order to solve the above technical problem, a second aspect of the present application provides a sound wave synthesis method, including: acquiring target audio particles of each primary order frequency in a plurality of running states, and continuously acquiring driving information, wherein the driving information at least comprises driving speed, accelerator pedal depth and driving state, and the target audio particles are determined based on the audio particle extraction method of the first aspect; predicting, based on a rotational speed model, from the driving speed and the accelerator pedal depth in the driving information, a predicted rotational speed value of a virtual engine; predicting, based on the predicted rotational speed value, the maximum and minimum rotational speed values set for the virtual engine, and the lowest primary order frequency of the target engine in the idle state, the primary order frequency reached by the virtual engine at the predicted rotational speed value; selecting the target audio particle corresponding to the predicted primary order frequency under the running state consistent with the driving state, and transferring it into a cache space; and synthesizing sound wave data based on the target audio particles in the cache space.
To solve the above technical problem, a third aspect of the present application provides an audio particle extraction apparatus, including: a recording module, a selecting module, a determining module, a candidate module and an extracting module. The recording module is used for acquiring recording data of the target engine in a plurality of running states, the plurality of running states including at least one of an acceleration state, a deceleration state, and an idle state; the selecting module is used for respectively selecting each running state as a target state and selecting the recording data in the target state as target data; the determining module is used for determining the primary order frequency of each audio frame in the target data; the candidate module is used for, for each audio frame in the target data, extracting an audio fragment from the target data based on a central moment determined by the audio frame, and obtaining a candidate audio particle of the target frequency in the target state based on two zero-crossing sampling points in the audio fragment that are separated by a preset number of sampling periods, wherein the target frequency is the primary order frequency of the audio frame and the preset number is a first multiple of the number of cylinders of the target engine; and the extracting module is used for, for each primary order frequency in the target state, cross-correlating each candidate audio particle of that primary order frequency with a reference audio particle, and determining the target audio particle of the primary order frequency in the target state, wherein the reference audio particle is a target audio particle that was determined before the primary order frequency whose target audio particle is currently to be determined.
In order to solve the above technical problem, a fourth aspect of the present application provides a sound wave synthesis apparatus, including: an acquisition module, a rotational speed prediction module, a frequency prediction module, a cache module and a synthesis module. The acquisition module is used for acquiring target audio particles of each primary order frequency in a plurality of running states and continuously acquiring driving information, wherein the driving information at least comprises driving speed, accelerator pedal depth and driving state, and the target audio particles are determined based on the audio particle extraction method of the first aspect; the rotational speed prediction module is used for predicting, based on a rotational speed model, from the driving speed and the accelerator pedal depth in the driving information, a predicted rotational speed value of the virtual engine; the frequency prediction module is used for predicting the primary order frequency reached by the virtual engine at the predicted rotational speed value, based on the predicted rotational speed value, the maximum and minimum rotational speed values set for the virtual engine by the rotational speed model, and the lowest primary order frequency of the target engine in the idle state; the cache module is used for selecting the target audio particle corresponding to the predicted primary order frequency under the running state consistent with the driving state and transferring it into the cache space; and the synthesis module is used for synthesizing sound wave data based on the target audio particles in the cache space.
In order to solve the above technical problem, a fifth aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions, and the processor is configured to execute the program instructions to implement the audio particle extraction method of the first aspect or the acoustic wave synthesis method of the second aspect.
In order to solve the above technical problem, a sixth aspect of the present application provides a computer readable storage medium storing program instructions executable by a processor, where the program instructions are configured to implement the audio particle extraction method of the first aspect or the acoustic wave synthesis method of the second aspect.
According to the above scheme, recording data of the target engine in a plurality of running states is obtained, the plurality of running states including at least one of an acceleration state, a deceleration state, and an idle state; each running state is respectively selected as a target state, the recording data in the target state is selected as target data, and the primary order frequency of each audio frame in the target data is determined; for each audio frame in the target data, an audio fragment is extracted from the target data based on a central moment determined by the audio frame, and a candidate audio particle of the target frequency in the target state is obtained based on two zero-crossing sampling points in the audio fragment that are separated by a preset number of sampling periods, where the target frequency is the primary order frequency of the audio frame and the preset number is a first multiple of the number of cylinders of the target engine; and for each primary order frequency in the target state, each candidate audio particle of that primary order frequency is cross-correlated with a reference audio particle, and the target audio particle of the primary order frequency in the target state is determined, where the reference audio particle is a target audio particle determined before the primary order frequency whose target audio particle is currently to be determined. On the one hand, obtaining recording data of the target engine in a plurality of running states, selecting the recording data in the target state as target data, and determining the primary order frequency of each audio frame in the target data improves the accuracy of the primary order frequency of each audio frame under each running state. On the other hand, extracting an audio fragment based on the central moment determined by the audio frame, and obtaining candidate audio particles from two zero-crossing sampling points separated by a preset number of sampling periods, improves the efficiency of obtaining candidate audio particles of the target frequency in the target state. On this basis, cross-correlating the candidate audio particles of each primary order frequency with the reference audio particle to determine the target audio particle ensures that audio particles spanning complete periods are extracted, which guarantees the quality of the extracted audio particles. Therefore, the quality of the audio particles can be improved, and the effect of synthesizing sound waves can be further improved.
Drawings
FIG. 1 is a flow chart of an embodiment of an audio particle extraction method of the present application;
FIG. 2 is a flow chart of another embodiment of an audio particle extraction method of the present application;
FIG. 3 is a schematic flow chart of an embodiment of a method of synthesizing acoustic waves according to the present application;
FIG. 4 is a schematic diagram of a framework of one embodiment in a bench test or off-line simulation scenario;
FIG. 5 is a schematic diagram of a framework of another embodiment in a bench test or offline simulation scenario;
FIG. 6 is a schematic diagram of a frame of an embodiment in a real-vehicle driving scenario;
FIG. 7 is a schematic diagram of an embodiment of a method of synthesizing acoustic waves according to the present application;
FIG. 8 is a schematic diagram of a frame of an embodiment of an audio particle extraction apparatus of the present application;
FIG. 9 is a schematic diagram of a frame of an embodiment of a wave synthesizing apparatus of the present application;
FIG. 10 is a schematic diagram of a framework of an embodiment of the electronic device of the present application;
FIG. 11 is a schematic diagram of a framework of one embodiment of the computer-readable storage medium of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. Further, "a plurality" herein means two or more. The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C. "Several" means at least one. The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Referring to fig. 1, fig. 1 is a flow chart illustrating an embodiment of an audio particle extraction method of the present application. Specifically, the method may include the steps of:
step S11: recording data of the target engine in a plurality of running states is obtained.
In an embodiment of the present disclosure, the several operating states include: at least one of an acceleration state, a deceleration state, and an idle state.
In one implementation scenario, the target engine may be the engine of a fuel vehicle. For the recording data to be useful, it should cover the noise characteristics of the target engine in the several running states as completely as possible over a sufficiently long period. Accordingly, recording data of uniform acceleration or deceleration in a fixed gear can be obtained, and the acceleration process should be as gentle as possible so that the recording data satisfies a short-time steady-state approximation; for example, the acceleration or deceleration can be made gentler by controlling its duration, which may be longer than a preset duration such as 25 s, 30 s or 35 s, without particular limitation. In addition, the recording position may be at the engine itself, or the recording may be made at a position with a fixed transfer function, such as a fixed mount or a certain position inside the vehicle; the recording process should ensure that the signal-to-noise ratio of the engine noise relative to wind noise, tire noise and the like reaches at least a preset threshold, which may be 25 dB, 30 dB, 35 dB, etc., without particular limitation. Further, the recording mode may be mono pickup or multichannel pickup, selected according to the actual situation, which is not limited here.
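The signal-to-noise requirement above can be verified with a simple power-ratio estimate. This helper is an assumed sketch (the text does not specify how the engine and background segments are selected):

```python
import numpy as np

def snr_db(engine_segment, background_segment):
    """Rough SNR in dB between an engine-noise recording segment and a
    background-only (wind/tire) segment, for a check like 'SNR above 30 dB'."""
    p_signal = np.mean(np.square(engine_segment))  # mean power of engine noise
    p_noise = np.mean(np.square(background_segment))  # mean power of background
    return 10.0 * np.log10(p_signal / p_noise)
```

A background-only segment could come from a recording made at the same position with the engine off, so that microphone placement and room response cancel out of the comparison.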
Step S12: and respectively selecting various running states as target states, selecting recording data in the target states as target data, and determining the main order frequency of each audio frame in the target data.
In one implementation, the target state is selected based on a number of operating states and the target data is determined based on the target state. For example, the acceleration state may be selected as the target state, and the recording data in the acceleration state may be selected as the target data. The other operating states are not described in detail herein.
In one implementation scenario, the manner of determining the primary order frequency of each audio frame in the target data may be selected in actual situations, and when the target engine does not acquire the power system state data time-synchronized with the recording data, the primary order frequency of each audio frame in the target data may be determined based on processing the recording data; when the target engine collects power system state data time-synchronized with the recording data, the primary order frequency of each audio frame in the target data can be determined based on the rotational speed data in the power system state data. It should be noted that, the power system state data time-synchronized with the recording data may be obtained through a CAN signal broadcasting mode or an active transceiving mode during the driving process.
In one implementation scenario, when the target engine does not collect power system state data that is time-synchronized with the recording data, the firing frequency of the target engine may be calculated from the recording data itself. Specifically, the recording data comprises a plurality of audio frames, and each audio frame can be processed by algorithms including, but not limited to, the time-domain autocorrelation PYIN algorithm, the frequency-domain SWIPE algorithm, and the time-frequency YAAPT algorithm, to finally determine the primary order frequency of each audio frame in the target data. It should be noted that, when no time-synchronized power system state data is available, the determination of the primary order frequency may refer to the technical details of the PYIN, SWIPE and YAAPT algorithms, which are not repeated here.
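In practice one of the cited algorithms (PYIN, SWIPE, YAAPT) would be used. As a minimal stand-in, a plain time-domain autocorrelation estimate of the per-frame primary frequency looks like the sketch below; the function name, lag bounds and frequency range are assumptions for illustration:

```python
import numpy as np

def autocorr_f0(frame, sample_rate, f_min=20.0, f_max=500.0):
    """Toy time-domain autocorrelation pitch estimator, a stand-in for the
    PYIN/SWIPE/YAAPT algorithms cited in the text."""
    frame = frame - np.mean(frame)
    # autocorrelation for non-negative lags only
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_lo = int(sample_rate / f_max)  # smallest lag to consider
    lag_hi = int(sample_rate / f_min)  # largest lag to consider
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi]))
    return sample_rate / lag
```

A production system would prefer PYIN-style probabilistic tracking, since raw autocorrelation is prone to octave errors on harmonically rich engine noise.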
In one implementation scenario, from rotational speed data in the power system state data, extracting any rotational speed value that is time-synchronized with the audio frame as a target rotational speed value; and determining the main order frequency of the audio frame based on the maximum rotating speed value and the minimum rotating speed value set for the virtual engine by the target rotating speed value and the rotating speed model and the lowest main order frequency of the target engine in the idle state. The following related descriptions in the embodiments of the present disclosure may be referred to for details, which are not described herein.
In another implementation scenario, in order to improve accuracy of the primary order frequency of the audio frame, at least one rotation speed value synchronized with the audio frame time may be extracted from rotation speed data in the power system state data; and carrying out numerical statistics based on at least one rotating speed value to determine a target rotating speed value of the target engine when the audio frame is acquired. Specifically, if only one rotation speed value is extracted, determining the rotation speed value as a target rotation speed value of a target engine when the audio frame is acquired; if a plurality of rotation speed values are extracted, numerical statistics may be performed on the plurality of rotation speed values, for example, taking an average value, taking a mode, taking a variance, and the like, and the numerical statistics result is used as a target rotation speed value of a target engine when the audio frame is acquired. For example, if a plurality of rotational speed values are acquired, averaging the plurality of rotational speed values, determining the average as a target rotational speed value of the target engine at the time of acquiring the audio frame; or, if a plurality of rotation speed values are collected, the rotation speed values may be weighted, and the weight of the rotation speed values may be determined according to the actual situation, which is not limited herein. In addition, the numerical statistics may be processed according to actual situations, for example, rounding up, rounding down, and the like. And determining the main order frequency of the audio frame based on the maximum rotating speed value and the minimum rotating speed value set for the virtual engine by the target rotating speed value and the rotating speed model and the lowest main order frequency of the target engine in the idle state. 
According to the method, at least one rotating speed value which is time-synchronous with the audio frame is extracted, numerical statistics is carried out on the at least one rotating speed value, the target rotating speed value of the target engine is further determined, accuracy of the target rotating speed value is improved, and the main order frequency of the audio frame is determined based on the maximum rotating speed value and the minimum rotating speed value which are set for the virtual engine by the target rotating speed value and the rotating speed model and the lowest main order frequency of the target engine in an idle state, so that accuracy of the main order frequency is improved.
Further, the rotational speed model may be a mathematical model, and the rotational speed model sets a maximum rotational speed value, a minimum rotational speed value and a target rotational speed value for the virtual engine, and it is understood that the maximum rotational speed value and the minimum rotational speed value are fixed preset parameter values, and of course, the maximum rotational speed value and the minimum rotational speed value may also be updated according to actual situations. In addition, the lowest main order frequency of the target engine in the idle state is obtained, and the lowest main order frequency in the idle state can be selected through the main order frequency of each audio frame in the target data in the idle state, namely, the lowest main order frequency in the idle state is selected. As a possible implementation, when the target state is the acceleration state, the primary order frequency of the first audio frame in the target data may be selected as the lowest primary order frequency. Unlike the foregoing embodiment, when the target state is the deceleration state, the main order frequency of the last audio frame in the target data may be selected as the lowest main order frequency. After obtaining a maximum rotating speed value, a minimum rotating speed value and a target rotating speed value which are set by the virtual engine and the lowest main order frequency of the target engine in an idle state, obtaining a first difference value between the target rotating speed value and the minimum rotating speed value, and obtaining a second difference value between the maximum rotating speed value and the minimum rotating speed value; and taking the sum of the ratio between the first difference and the second difference and the lowest main order frequency in the idle state as the main order frequency of the audio frame. Can be expressed by the expression:
F0[n] = (RPN[n] − RPNmin) / (RPNmax − RPNmin) + F0|idle
where F0[n] is the primary order frequency of the n-th audio frame, F0|idle is the lowest primary order frequency in the idle state, RPN[n] is the target rotational speed value, RPNmin is the minimum rotational speed value set for the virtual engine, and RPNmax is the maximum rotational speed value set for the virtual engine. In this way, the maximum rotational speed value, minimum rotational speed value and target rotational speed value set for the virtual engine by the rotational speed model help improve the accuracy of the primary order frequency of the audio frame, and in turn the quality of the audio particles.
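The expression described in the text (the sum of the normalized speed ratio and the lowest idle-state primary order frequency) transcribes directly:

```python
def primary_order_frequency(rpn, rpn_min, rpn_max, f0_idle):
    """F0[n] = (RPN[n] - RPN_min) / (RPN_max - RPN_min) + F0|idle,
    as stated in the text for the n-th audio frame; rpn is the target
    rotational speed value, rpn_min/rpn_max are the model's bounds for
    the virtual engine, f0_idle is the lowest idle-state order frequency."""
    return (rpn - rpn_min) / (rpn_max - rpn_min) + f0_idle
```

Note that the ratio term is dimensionless and lies between 0 and 1 whenever the target speed is within the model's bounds, so F0[n] increases monotonically with the target rotational speed value, as the positive-correlation assumption in the later monotonicity check requires.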
In one implementation, for each audio frame in the target data, a check result of the recording data with respect to validity is determined. The validity check may include: the sampling rate of the recording data should be higher than a preset threshold, which may be set to 8 kHz, 9 kHz, etc.; the recording data should be free of dropouts, lost frames, amplitude clipping and distortion; the recording data as a whole should exhibit one of three trends, namely uniform speed, uniform acceleration or uniform deceleration; and the recording data as a whole should contain no trend transients such as gear shifting, clutch engagement, or sudden acceleration and deceleration. Based on the validity check result, if a certain audio frame in the recording data does not meet the validity conditions, that audio frame is rejected; if the audio frames in the recording data broadly fail the validity conditions, the step of acquiring recording data of the target engine in the target running state and the subsequent steps are re-executed. In this way, audio data with obvious flaws, or audio data that would degrade the audio particles, is removed or re-recorded through the validity check, so that the quality of the audio data, and in turn the quality of the audio particles, is improved.
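The mechanical parts of this validity check (sampling rate, clipping, dropouts) can be sketched as below; the thresholds and window length are illustrative assumptions, and the uniform-trend checks from the text are omitted:

```python
import numpy as np

def validity_checks(audio, sample_rate, min_rate=8000, clip_level=0.999,
                    dropout_floor=1e-4, win=100):
    """Return per-condition pass/fail flags for a recording (sketch)."""
    audio = np.asarray(audio, dtype=float)
    n = len(audio) - len(audio) % win
    frames = np.abs(audio[:n]).reshape(-1, win)  # windowed peak levels
    return {
        "rate_ok": sample_rate >= min_rate,                        # rate threshold
        "no_clipping": float(np.max(np.abs(audio))) < clip_level,  # no full-scale hits
        "no_dropout": bool(np.all(frames.max(axis=1) > dropout_floor)),
    }
```

Any frame failing these flags would be rejected per the text; a recording failing them broadly would be re-captured.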
In one implementation, for each audio frame in the target data, in response to power system state data time-synchronized with the recording data being acquired for the target engine, the monotonicity check of the recording data may be analyzed based on the rotational speed data in the power system state data. As a possible implementation, since the rotational speed data in the power system state data is positively correlated with the primary order frequencies of the audio frames, the monotonicity check result of the recording data can be determined from the trend of variation between the primary order frequencies of the audio frames in the target data. Differently from the foregoing embodiment, the target rotational speed value of the target engine at the time an audio frame was acquired may be obtained statistically from the rotational speed values time-synchronized with that audio frame; the method for obtaining the target rotational speed value may refer to the foregoing disclosed embodiment and is not repeated here. The monotonicity check result of the recording data is then determined based on the proportion of audio frames in the recording data whose target rotational speed trend conforms to the target state. In addition, if the trend of the target rotational speed value of a certain audio frame does not conform to the target state, that audio frame is removed; if the proportion of audio frames whose target rotational speed trend conforms to the target state is insufficient, the step of acquiring the recording data of the target engine in the target running state and the subsequent steps is re-executed.
In this way, the monotonicity check result of the recording data is determined from the proportion of audio frames in the recording data whose target rotational speed trend conforms to the target state, so that monotonicity check results for the acceleration, deceleration and idle states can be obtained quickly, improving the rate of extracting the audio particles while also improving their quality.
In another implementation scenario, in response to power system state data time-synchronized with the recording data not being acquired for the target engine, it is first analyzed whether each audio frame in the recording data satisfies a signal-to-noise ratio condition, where the condition includes: the first signal-to-noise ratio corresponding to the primary order frequency is greater than a first threshold, and the second signal-to-noise ratio corresponding to the harmonic order frequencies other than the primary order frequency is greater than a second threshold. Specifically, a fractional octave filter bank (Fractional Octave Filter Bank) is used to obtain the component corresponding to the primary order frequency and to each harmonic order frequency, and the signal-to-noise ratio of each audio frame is then calculated from these components. For example, each audio frame may be processed with a fractional filter having a bandwidth of 1/6 octave and a center frequency equal to the primary order frequency of the audio frame, so as to determine the component at the primary order frequency and hence the first signal-to-noise ratio, which may be specifically expressed as:
SNR_fund(F0) = 10·log10( E{OctaveFilt(s[n]; F0)} / (E_total - E{OctaveFilt(s[n]; F0)}) ), wherein SNR_fund(F0) is the first signal-to-noise ratio corresponding to the primary order frequency, E{·} denotes an energy statistic, F0 is the primary order frequency of the nth audio frame, s[n] is the audio signal of the nth audio frame, OctaveFilt denotes a filter with a bandwidth of 1/6 octave (i.e., 1/6 oct) and a center frequency equal to the primary order frequency F0 of the audio frame, and E_total is the total energy of the nth audio frame. After the first signal-to-noise ratio is obtained, it is compared with a first threshold, which may be set to 15 dB, 20 dB, 25 dB, etc. Further, the components at the harmonic order frequencies other than the primary order frequency can be obtained from the component at the primary order frequency of the audio frame, so that the second signal-to-noise ratio corresponding to the harmonic order frequencies can be obtained, which may be specifically expressed as:
SNR_harmonic(F0) = 10·log10( Σ_m E{OctaveFilt(s[n]; m·F0)} / (E_total - Σ_m E{OctaveFilt(s[n]; m·F0)}) ), wherein SNR_harmonic(F0) is the second signal-to-noise ratio corresponding to the harmonic order frequencies other than the primary order frequency, E{·} denotes an energy statistic, F0 is the primary order frequency of the nth audio frame, OctaveFilt denotes a filter with a bandwidth of 1/6 octave (i.e., 1/6 oct) centered at the given order frequency, E_total is the total energy of the nth audio frame, and the numerator is the summed energy of the harmonic orders obtained by processing the audio frame with fractional filters of 1/6-octave bandwidth centered at each harmonic order frequency. After the second signal-to-noise ratio is obtained, it is compared with a second threshold, which may be set to 15 dB, 20 dB, 25 dB, etc. The check result of the recording data with respect to order energy is then determined based on the proportion of audio frames in the recording data that satisfy the signal-to-noise ratio condition: if this proportion is greater than a third threshold (e.g., 95%, 97%, 99%, etc.), the recording data can be determined to pass the order check, and the audio frames that do not satisfy the condition may additionally be removed; otherwise, the recording data is determined not to pass the order check. In the case where the recording data does not pass the order check, the step of acquiring the recording data of the target engine in the target operating state and the subsequent steps may be re-executed.
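As a rough illustration of the order-energy check, the 1/6-octave fractional filter can be approximated by summing DFT-bin energies inside the band [F0·2^(-1/12), F0·2^(1/12)]. A hedged Python sketch, in which the naive DFT, the band edges and the helper names are assumptions, not the disclosure's filter bank:

```python
import math

# Illustrative sketch of the first-SNR computation: the 1/6-octave fractional
# filter is approximated by summing DFT bin energies inside the band
# [F0 * 2**(-1/12), F0 * 2**(1/12)]. Names and signals are assumptions.

def band_energy(x, fs, f_lo, f_hi):
    """Energy of DFT bins of x whose frequency falls inside [f_lo, f_hi]."""
    n = len(x)
    e = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
            im = sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
            e += re * re + im * im
    return e

def snr_fund(x, fs, f0):
    """First SNR: fundamental-band energy vs. the rest of the frame energy."""
    e_band = band_energy(x, fs, f0 * 2 ** (-1 / 12), f0 * 2 ** (1 / 12))
    e_total = band_energy(x, fs, 0.0, fs / 2)
    return 10 * math.log10(max(e_band, 1e-12) / max(e_total - e_band, 1e-12))

fs, n, f0 = 4000, 200, 100.0
x_fund = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]   # energy at F0
x_off = [math.sin(2 * math.pi * 300.0 * i / fs) for i in range(n)]  # energy off F0
snr_hi = snr_fund(x_fund, fs, f0)
snr_lo = snr_fund(x_off, fs, f0)
```

A frame whose fundamental sits inside the band yields a high first SNR, while a frame with no energy near F0 yields a very low one; the second SNR over the harmonic bands would be formed analogously by summing the band energies at m·F0.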
In this way, the order energy check result of the recording data is determined based on the proportion of audio frames in the recording data that satisfy the signal-to-noise ratio condition, which improves the rate of extracting the audio particles and further improves their quality.
Further, in the case where the proportion of audio frames satisfying the signal-to-noise ratio condition in the recording data is greater than the third threshold, the check result of the recording data with respect to monotonicity may be analyzed based on the primary order frequency of each audio frame in the target data. Specifically, the monotonicity of the primary order frequency variation can be determined from the primary order frequency estimation curve of the recording data: the audio frames are arranged in time order, the primary order frequency curve of the recording data is obtained, and the monotonicity check result is derived from the variation of this curve. If the target state is the acceleration state, the primary order frequency of the audio data as it changes over time must satisfy the condition that the primary order frequency of each audio frame is greater than or equal to that of the previous audio frame. Specifically, this can be expressed as:
flag(F0[n]) = 1 if F0[n] - F0[n-1] ≥ 0, and 0 otherwise; (1/N)·Σ_n flag(F0[n]) ≥ γ, wherein F0[n] is the primary order frequency corresponding to the nth audio frame, flag(F0[n]) is a flag bit indicating whether the nth audio frame is a monotonically valid frame, N is the total frame length, and γ represents the required proportion of monotonically valid frames in the whole recording data. That is, the difference between the primary order frequency of the nth audio frame and that of the (n-1)th audio frame should be greater than or equal to 0; if this condition is satisfied, the flag bit is 1. The proportion of such frames in the recording data is then computed, and if it is greater than the preset threshold, the recording data can be determined to satisfy the monotonicity requirement. The preset threshold may be set to 93%, 95%, 98%, etc., and is not specifically limited herein. Further, if the target state is a deceleration state, the primary order frequency of the audio data as it changes over time must satisfy the condition that the primary order frequency of each audio frame is less than or equal to that of the previous audio frame. Specifically, this can be expressed as:
flag(F0[n]) = 1 if F0[n] - F0[n-1] ≤ 0, and 0 otherwise; (1/N)·Σ_n flag(F0[n]) ≥ γ, wherein F0[n] is the primary order frequency corresponding to the nth audio frame, flag(F0[n]) is a flag bit indicating whether the nth audio frame is a monotonically valid frame, N is the total frame length, and γ represents the required proportion of monotonically valid frames in the whole recording data. That is, the difference between the primary order frequency of the nth audio frame and that of the (n-1)th audio frame should be less than or equal to 0; if this condition is satisfied, the flag bit is 1. The proportion of such frames in the recording data is then computed, and if it is greater than the preset threshold, the recording data can be determined to satisfy the monotonicity requirement. The preset threshold may be set to 93%, 95%, 98%, etc., and is not specifically limited herein. If the target state is an idle state, it needs to be determined that the target data contains no trend transients such as gear shifting, clutch engagement, or rapid acceleration and deceleration. In addition, if a certain audio frame in the audio data does not satisfy the monotonicity condition, that audio frame can be removed; if the recording data as a whole does not satisfy the monotonicity requirement, the step of acquiring the recording data of the target engine in the target running state and the subsequent steps can be re-executed. In this way, the monotonicity check result of the recording data is analyzed based on the primary order frequency of each audio frame in the target data, which improves the rate of extracting the audio particles and further improves their quality.
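The flag-and-ratio monotonicity test above can be sketched in a few lines of Python; the names `monotonic_ratio` and `passes_monotonicity` and the default gamma of 0.93 (one of the example thresholds) are illustrative:

```python
# Minimal sketch of the monotonicity check on the primary-order-frequency
# curve. gamma = 0.93 is one of the example thresholds mentioned above;
# function names are illustrative assumptions.

def monotonic_ratio(f0_curve, accelerating=True):
    """Fraction of frame-to-frame F0 steps that match the target trend."""
    flags = []
    for prev, cur in zip(f0_curve, f0_curve[1:]):
        delta = cur - prev
        # Acceleration requires delta >= 0; deceleration requires delta <= 0.
        flags.append(1 if (delta >= 0 if accelerating else delta <= 0) else 0)
    if not flags:
        return 1.0          # a single frame is trivially monotone
    return sum(flags) / len(flags)

def passes_monotonicity(f0_curve, accelerating=True, gamma=0.93):
    return monotonic_ratio(f0_curve, accelerating) >= gamma
```

A strictly rising F0 curve passes the acceleration check, a strictly falling one passes the deceleration check, and a curve with many direction reversals fails both.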
Step S13: for each audio frame in the target data, extract an audio fragment from the target data based on the center time determined by the audio frame, and obtain candidate audio particles of the target frequency in the target state based on two sampling points in the audio fragment that are separated by a preset number of sampling periods and are both zero crossings.
In the embodiment of the disclosure, all audio frames in the target data need to be processed to finally obtain the candidate audio particles of the target frequency in the target state. The target frequency is the primary order frequency of the currently processed audio frame. In addition, the preset value is a first multiple of the number of target engine cylinders; the number of target engine cylinders may be determined according to the configuration of the target engine in the practical application, and the first multiple may be any value greater than 1, for example 2, 3, 4, etc., which is not specifically limited herein.
In one implementation, the center time is located at the center of the audio frame. The duration of the audio fragment is a second multiple of the period corresponding to the primary order frequency of the audio frame, and the second multiple may be any value greater than 1, for example 2, 3, 4, etc., which is not specifically limited herein.
In one implementation scenario, the center time is the center time of each audio frame, and the audio fragment is extracted from the target data around it; specifically, the duration of the audio fragment is a second multiple of the period corresponding to the primary order frequency of the audio frame, where the period is the inverse of the primary order frequency. Illustratively, when the second multiple is 3, this may be expressed as:
X_F0[k] = s[k], k ∈ [k_c - 1.5·fs/F0, k_c + 1.5·fs/F0], wherein F0 is the primary order frequency, X_F0[k] is the audio fragment, k is the sampling point index, k_c is the sample index of the center time, and fs is the sampling rate. Further, a band-pass filter with the primary order frequency of the audio frame as its center frequency is used to filter the audio fragment. Specifically, the audio fragment is band-pass filtered with a filter whose center frequency is the primary order frequency; the filter may be, but is not limited to, a Butterworth filter, a Bessel filter, an elliptic filter, or the like. Further, under the condition of ensuring filter stability, the higher-order filter coefficients are optimized as far as possible so as to ensure small phase response distortion. Illustratively, the audio fragment is band-pass filtered with a Butterworth filter, which may be expressed as:
y_F0[k] = ButterBP_F0(X_F0[k]), wherein y_F0[k] represents the filtered signal, F0 is the primary order frequency, X_F0[k] is the audio fragment, k is the sampling point index, and ButterBP_F0 denotes the Butterworth band-pass filter centered at F0. In this way, the audio fragment is filtered with a band-pass filter whose center frequency is the primary order frequency of the audio frame, so that, under the condition of ensuring filter stability, the higher-order filter coefficients are optimized as far as possible to ensure small phase response distortion, further improving the quality of the audio particles.
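As a hedged stand-in for the Butterworth design named above, the sketch below band-pass filters a fragment around its primary order frequency with a single second-order biquad (RBJ audio EQ cookbook coefficients, constant 0 dB peak gain); the sampling rate, F0 and Q are illustrative:

```python
import math

# Hedged sketch of band-pass filtering an audio fragment around its primary
# order frequency F0. A single RBJ-cookbook band-pass biquad stands in for
# the Butterworth filter named above; fs, F0 and Q are illustrative.

def bandpass_biquad(x, fs, f0, q=5.0):
    """2nd-order band-pass centered at f0 with unity gain at the center."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha          # constant 0 dB peak-gain variant
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:                              # direct-form I recursion
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

fs, f0 = 8000, 200.0
# Fragment: fundamental at F0 plus an off-order component at 1200 Hz.
sig = [math.sin(2 * math.pi * f0 * i / fs)
       + math.sin(2 * math.pi * 1200.0 * i / fs) for i in range(2000)]
out = bandpass_biquad(sig, fs, f0)
# After the transient settles, the 1200 Hz component is strongly attenuated
# while the F0 component passes at roughly unit amplitude.
tail_peak = max(abs(v) for v in out[1500:])
```

The disclosure's point about higher-order coefficients and phase distortion would correspond to cascading several such sections (or using a true Butterworth design) while keeping each stage stable.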
In one implementation scenario, candidate audio particles of the target frequency in the target state may be obtained based on two sampling points in the audio fragment that are separated by a preset number of sampling periods and are both zero crossings. Specifically, zero crossings may be calculated on the filtered signal, which may be expressed as:
cross_zero_F0[k] = 1 if y_F0[k]·y_F0[k-1] < 0, and 0 otherwise, wherein cross_zero_F0[k] is the flag bit of the kth sampling point: if the product of the filtered signal at the kth sampling point and at the (k-1)th sampling point is less than 0, the kth sampling point is a zero crossing. The preset value is a first multiple of the number of target engine cylinders, that is, the number of sampling periods is related to the number of engine cylinders; the number of target engine cylinders may be a preset cylinder count, or may be obtained through harmonic classification statistics. Further, candidate audio particles of the target frequency in the target state are determined based on the target engine cylinder number. For example, if the preset value is a first multiple of the target engine cylinder number and the first multiple is set to 2, the candidate audio particles may be obtained by an expression, specifically expressed as:
Wherein Z_F0[k, n] represents the candidate audio particles, get_idx denotes obtaining the subscript of a sampling point, n_start is the sampling-point subscript at which the candidate audio particle starts, n_end is the sampling-point subscript at which the candidate audio particle ends, cross_zero_F0[k] is the flag bit of the kth sampling point, and N_cylinder is the target engine cylinder number. Specifically, the first sampling point is checked: if it is a zero crossing, it is checked whether the sampling point whose subscript equals this subscript plus twice the cylinder number of sampling periods is also a zero crossing; if so, the two sampling points are taken as the start and end points, respectively, and a candidate audio particle is extracted between them. The sampling points are then checked in turn until all have been checked.
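The zero-crossing pairing above can be sketched as follows; the fundamental period in samples (`t0`), the small index tolerance and the function names are assumptions for illustration:

```python
import math

# Sketch of candidate-particle extraction: mark zero crossings of the
# filtered signal y, then pair crossings separated by
# first_multiple * N_cylinder fundamental periods. t0 (period in samples)
# and the index tolerance are illustrative assumptions.

def zero_crossings(y):
    """Indices k where y[k-1] * y[k] < 0."""
    return [k for k in range(1, len(y)) if y[k - 1] * y[k] < 0]

def candidate_particles(y, t0, n_cylinder, first_multiple=2, tol=2):
    """Slices of y bounded by two crossings ~ first_multiple*n_cylinder*t0 apart."""
    span = first_multiple * n_cylinder * t0
    zc = zero_crossings(y)
    zc_set = set(zc)
    particles = []
    for n_start in zc:
        for d in range(-tol, tol + 1):    # allow a small index tolerance
            if (n_start + span + d) in zc_set:
                particles.append(y[n_start:n_start + span + d])
                break
    return particles

t0, n_cyl = 20, 4
# Half-sample phase offset keeps the zero crossings between samples,
# so they land on a regular grid of indices.
y = [math.sin(2 * math.pi * (i + 0.5) / t0) for i in range(400)]
parts = candidate_particles(y, t0, n_cyl)
```

Each returned slice spans 2·N_cylinder fundamental periods between two zero crossings, matching the complete-period particles described above.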
Step S14: for each primary order frequency in the target state, cross-correlate each candidate audio particle of that primary order frequency with the reference audio particle, and determine the target audio particle of the primary order frequency in the target state.
In an embodiment of the present disclosure, the reference audio particle is a target audio particle that has already been determined before the primary order frequency of the target audio particle currently to be determined. As a possible implementation, the target audio particles may be determined for each primary order frequency in the target state in turn: in the acceleration state, the target audio particles may be determined in order of primary order frequency from high to low; in the deceleration state, in order from low to high; in the idle state, in order from high to low, in order from low to high, or directly in the order of the audio frames, which is not limited herein. Furthermore, for the primary order frequency of the target audio particle currently to be determined, its reference audio particle may specifically be the target audio particle of the previous primary order frequency in the target state. Alternatively, as another possible implementation in practical applications, the reference audio particle may be the target audio particle of the mth (m > 1) preceding primary order frequency in the target state, which is not limited herein.
In one implementation scenario, as a possible implementation, if a primary order frequency in the target state has multiple candidate audio particles, any one of them may be used as the target audio particle of that primary order frequency in the target state. Alternatively, the audio quality of the candidate audio particles may be compared, and the candidate with the highest quality selected as the target audio particle.
In another implementation scenario, different from the foregoing embodiment, in order to further improve the quality of the audio particles and thereby the subsequent synthesis effect, the target audio particle of the previous primary order frequency in the target state may be selected as the reference audio particle, and a candidate audio particle is then selected as the target audio particle of the current primary order frequency in the target state based on the cross-correlation delay between each candidate audio particle and the reference audio particle. Specifically, the candidate audio particle with the smallest cross-correlation delay may be selected. The cross-correlation delay is obtained by cross-correlating the discrete sequence of each candidate audio particle with that of the reference audio particle to obtain a cross-correlation peak; the time instant corresponding to the peak is the cross-correlation delay, and the candidate audio particle with the smallest cross-correlation delay is then selected as the target audio particle of the primary order frequency in the target state. The cross-correlation calculation can be expressed as:
R[m, k] = Σ_n Z_F0[k, n]·Z_F0[n + m], wherein R[m, k] is the result of the cross-correlation calculation, Z_F0[k, n] represents the kth candidate audio particle, and Z_F0[n] represents the reference audio particle. Based on the cross-correlation result, that is, the cross-correlation delay, the candidate audio particle with the smallest cross-correlation delay is selected as the target audio particle of the primary order frequency in the target state. Further, the time length of the target audio particle can be acquired, which can be expressed as:
L_F0 = n_end - n_start, wherein L_F0 is the time length of the target audio particle of the primary order frequency F0, n_start is the start time of the target audio particle, and n_end is its end time. In this way, by cross-correlating each candidate audio particle with the reference audio particle and selecting the candidate with the smallest cross-correlation delay as the target audio particle of the primary order frequency in the target state, the accuracy of the target audio particle is improved, and the quality of the audio particles is improved.
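Selecting the target particle by smallest cross-correlation delay can be sketched as below; the pure-Python correlation and the function names are illustrative, not the disclosure's implementation:

```python
# Sketch of target-particle selection: cross-correlate each candidate with
# the reference particle and keep the candidate whose correlation peak lies
# at the smallest absolute lag. Function names are illustrative.

def xcorr_peak_lag(cand, ref):
    """Lag m maximizing R[m] = sum_n cand[n] * ref[n + m]."""
    best_lag, best_val = 0, float("-inf")
    for m in range(-(len(ref) - 1), len(cand)):
        r = sum(cand[n] * ref[n + m]
                for n in range(len(cand))
                if 0 <= n + m < len(ref))
        if r > best_val:
            best_val, best_lag = r, m
    return best_lag

def pick_target_particle(candidates, ref):
    """Candidate whose cross-correlation delay with ref is smallest."""
    return min(candidates, key=lambda c: abs(xcorr_peak_lag(c, ref)))

# A period-4 reference: an aligned copy peaks at lag 0, a shifted copy at lag 2.
ref = [0.0, 1.0, 0.0, -1.0] * 5
aligned = list(ref)
shifted = ref[2:] + ref[:2]
best = pick_target_particle([shifted, aligned], ref)
```

Here `best` is the aligned candidate, since its correlation peak sits at zero delay while the shifted copy peaks two samples away.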
In one implementation scenario, after each candidate audio particle of the primary order frequency has been cross-correlated with the reference audio particle and the target audio particle of the primary order frequency in the target state has been determined, the window function of the target audio particle may be obtained. Specifically, in response to power system state data time-synchronized with the recording data being acquired for the target engine, at least part of the data time-synchronized with the target audio particle may be extracted from the power system state data; illustratively, information such as speed, torque and accelerator pedal position synchronized with the recording data may be extracted, and gain processing or envelope adjustment may be performed on the acquired target audio particle. For example, gain processing may be performed by adjusting the amplitude response of the window function, and envelope adjustment may be performed with an ADSR (Attack/Decay/Sustain/Release) scheme from music signal processing; it can be understood that the larger the accelerator pedal opening, the larger the Attack and the smaller the Release in the ADSR. Thus, the window function may be determined based on at least part of the data.
In addition, in response to power system state data time-synchronized with the recording data not being acquired for the target engine, the particle time length of the target audio particle, namely L_F0, can be obtained, and the sum of the particle time length and a preset overlap frame length is taken as the target window length. That is, when the audio particles are overlapped to synthesize a sound wave, adjacent audio particles partially overlap, and the duration of the overlapping part is the preset overlap frame length, which may be set to 2 ms, 3 ms, 4 ms, etc., and is not specifically limited herein. The target window length can be expressed as:
L_win|F0 = L_F0 + L_OLA, wherein L_win|F0 is the target window length, L_F0 is the particle time length, and L_OLA is the preset overlap frame length. The window function may then be determined based on the target window length; it may be designed as a tapered cosine window (Tukey window), or as a raised cosine window such as a Hann window, a Hamming window, or variants thereof. The design of the window function may be determined according to the practical situation and is not specifically limited herein. In addition, the window function of the target particle may be used to optimize the target audio particle. In this way, by obtaining the window function of the target audio particle and using it to optimize the target audio particle, the effect of the audio particle is optimized and its quality is improved.
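The windowing step above can be sketched as follows: the target window length is the particle length plus the preset overlap frame length, shaped with a tapered cosine (Tukey) window. The taper ratio alpha and the sampling figures are illustrative assumptions:

```python
import math

# Sketch of the particle window: target window length = particle length plus
# a preset overlap frame length, shaped as a tapered cosine (Tukey) window.
# The taper ratio alpha and the sampling figures are illustrative.

def tukey_window(length, alpha=0.5):
    """Tapered cosine window: cosine ramps at both ends, flat in the middle."""
    w = []
    taper = alpha * (length - 1) / 2
    for n in range(length):
        if n < taper:                          # rising cosine taper
            w.append(0.5 * (1 + math.cos(math.pi * (n / taper - 1))))
        elif n > (length - 1) - taper:         # falling cosine taper
            w.append(0.5 * (1 + math.cos(math.pi * ((n - (length - 1)) / taper + 1))))
        else:                                  # flat center section
            w.append(1.0)
    return w

fs = 48000
particle_len = 480                 # L_F0 in samples (illustrative)
overlap_len = int(0.003 * fs)      # 3 ms preset overlap frame length
win = tukey_window(particle_len + overlap_len)   # L_win|F0 = L_F0 + L_OLA
```

The window fades to zero at both ends over the tapered region, so adjacent particles can overlap by the preset overlap frame length without discontinuities when the sound wave is later assembled.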
According to the above scheme, recording data of the target engine in a plurality of running states is obtained, wherein the plurality of running states include at least one of an acceleration state, a deceleration state and an idle state. Each running state is selected in turn as the target state, the recording data in the target state is selected as the target data, and the primary order frequency of each audio frame in the target data is determined. For each audio frame in the target data, an audio fragment is extracted from the target data based on the center time determined by the audio frame, and candidate audio particles of the target frequency in the target state are obtained based on two sampling points in the audio fragment that are separated by a preset number of sampling periods and are both zero crossings; the target frequency is the primary order frequency of the audio frame, and the preset value is a first multiple of the number of target engine cylinders. For each primary order frequency in the target state, each candidate audio particle of that primary order frequency is cross-correlated with the reference audio particle, and the target audio particle of the primary order frequency in the target state is determined; the reference audio particle is a target audio particle determined before the primary order frequency of the target audio particle currently to be determined. On the one hand, obtaining the recording data of the target engine in a plurality of running states, selecting the recording data in the target state as the target data, and determining the primary order frequency of each audio frame in the target data improves the accuracy of the primary order frequency of each audio frame in each running state. On the other hand, extracting the audio fragment based on the center time determined by the audio frame and obtaining the candidate audio particles from two zero-crossing sampling points separated by a preset number of sampling periods improves the efficiency of obtaining the candidate audio particles of the target frequency in the target state; on this basis, cross-correlating the candidate audio particles of the primary order frequency with the reference audio particle to determine the target audio particle makes it possible to extract audio particles spanning complete periods, ensuring the quality of the extracted audio particles. Therefore, the quality of the audio particles can be improved, and the effect of synthesizing sound waves can be further improved.
Referring to fig. 2, fig. 2 is a flow chart illustrating another embodiment of the audio particle extraction method of the present application. Specifically, the method may include the steps of:
step S201: recording data of the target engine in a plurality of running states is obtained.
Specifically, the manner of acquiring the recording data may be referred to in the foregoing disclosed embodiments, which is not described herein in detail.
Step S202: determine whether the target engine has collected power system state data time-synchronized with the recording data; if yes, go to step S203; otherwise, go to step S204.
In one implementation scenario, after the recording data is acquired, it may further be determined whether power system state data time-synchronized with the recording data has been collected for the target engine. It can be understood that if such data is available, information such as rotational speed data, gear information, speed and torque can be obtained from the power system state data.
In one implementation scenario, after determining whether the target engine collects power system state data that is time-synchronized with the recording data, various operating states may be selected as target states, recording data in the target states may be selected as target data, and a primary order frequency of each audio frame in the target data may be determined, where the operating states include at least one of an acceleration state, a deceleration state, and an idle state. The method for determining the primary order frequency of each audio frame in the target data may refer to the method in the foregoing disclosed embodiment, and will not be described herein.
Step S203: and carrying out monotonicity check on the recording data based on the rotating speed data in the power system state data.
In one implementation scenario, the target rotation speed value of the target engine when the audio frame is acquired can be obtained through statistics based on the rotation speed value which is synchronous with the time of the audio frame in the rotation speed data, and then the check result of the recording data about monotonicity is determined based on the number ratio of the audio frame with the change trend of the target rotation speed value conforming to the target state in the recording data.
Step S204: and checking the validity of the recorded data.
In an implementation scenario, the method for checking the validity of the recording data may refer to the steps in the foregoing disclosed embodiments, which are not described herein.
Step S205: the recorded data is subjected to an order energy check.
In an implementation scenario, the method of performing the order energy check on the recording data may refer to the steps in the foregoing disclosed embodiments, which are not described herein.
Step S206: and performing monotonicity check on the recording data.
In an implementation scenario, the manner of performing the monotonicity check on the recording data may refer to the steps in the foregoing disclosed embodiments, which are not described herein.
Step S207: target data in a target state.
In one implementation scenario, various operating states may be selected as target states, recording data in the target states may be selected as target data, and a primary order frequency of each audio frame in the target data may be determined.
Step S208: candidate audio particles for a target frequency in a target state.
Specifically, the determination manner of the candidate audio particles may refer to the manner in the foregoing disclosed embodiments, which is not described herein.
Step S209: and determining target audio particles of the main order frequency in the target state.
Specifically, the determination manner of the target audio particle may refer to the manner in the foregoing disclosed embodiment, which is not described herein.
Step S210: a window function of the target audio particle is acquired.
Specifically, the determination manner of the window function may refer to the manner in the foregoing disclosed embodiment, which is not described herein.
Step S211: extraction of audio particles.
In one implementation scenario, the extraction of audio particles may proceed continuously. It will be appreciated that throughout the pipeline, from checking the audio data through optimizing the target audio particles with their window functions, audio particles can be extracted continuously, and once extraction is complete the index table can be generated directly from the remaining audio particles. Extracting the audio particles in this way optimizes their effect and improves their quality.
Step S212: generate the audio particle index table.
In one implementation scenario, all the obtained audio particles may be arranged in order of primary order frequency and stored in an audio file at a preset sampling rate. The preset sampling rate may be set according to the hardware parameters used during sound wave synthesis; for example, if the hardware supports a sampling rate of 4k, the preset sampling rate is 4k. It may be determined according to the actual situation and is not specifically limited here. At the same time, each particle's audio storage position information is recorded together with an index sequence number, primary order frequency, window function parameters, target window length, and the like; the format of the index parameter table file may be, but is not limited to, json, xml, and so on. In addition, the audio file and the index parameter table file are stored in the FLASH memory of the synthesis device for the audio splicing application in the subsequent synthesis stage. The synthesis device may be an automobile Head Unit, a power amplifier (AMP) with processing capability, and the like, which is not particularly limited here.
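The index-table construction can be sketched as below. All field names, and the choice of JSON serialization, are illustrative assumptions; the disclosure only specifies that the particles are ordered by primary order frequency and that the table records index sequence numbers, primary order frequencies, window function parameters, target window lengths, and audio storage position information:

```python
import json

def build_index_table(particles, sample_rate=4000):
    """Sort audio particles by primary order frequency and emit an index
    parameter table.  Field names here are hypothetical placeholders."""
    ordered = sorted(particles, key=lambda p: p["primary_order_freq"])
    table, offset = [], 0
    for idx, p in enumerate(ordered):
        table.append({
            "index": idx,
            "primary_order_freq": p["primary_order_freq"],
            "window_params": p.get("window_params", {}),
            "target_window_len": p["target_window_len"],
            "offset": offset,  # storage position within the audio file
            "length": p["target_window_len"],
        })
        offset += p["target_window_len"]
    return {"sample_rate": sample_rate, "particles": table}

index = build_index_table([
    {"primary_order_freq": 120.0, "target_window_len": 400},
    {"primary_order_freq": 60.0, "target_window_len": 800},
])
index_json = json.dumps(index)  # could equally be serialised as XML
```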
According to the above scheme, recording data of the target engine in several operating states is acquired, and it is determined whether power system state data time-synchronized with the recording data was collected for the target engine. If no such power system state data was collected, a validity check, an order energy check, and a monotonicity check are performed on the recording data; if it was collected, the monotonicity check is performed based on the rotation speed data in the power system state data. After the recording data is checked, each operating state is selected in turn as the target state, the recording data in the target state is selected as the target data, candidate audio particles of the target frequency in the target state are determined, and a candidate is selected as the target audio particle of the primary order frequency in the target state based on the cross-correlation delay between each candidate audio particle and the reference audio particle. On this basis, audio particles may be extracted continuously throughout the pipeline, from checking the audio data through optimizing the target audio particles with their window functions, and once extraction is complete the index table may be generated directly from the remaining audio particles.
On the one hand, determining whether power system state data time-synchronized with the recording data was collected for the target engine, and checking the recording data accordingly under each condition, improves the validity of the recording data; on the other hand, generating the index table from the remaining audio particles after extraction improves the quality of the audio particle index table and the efficiency of acquiring audio particles. The quality of the audio particles can therefore be improved, which in turn improves the effect of the synthesized sound waves.
Referring to fig. 3, fig. 3 is a flow chart illustrating an embodiment of a method for synthesizing acoustic waves according to the present application. Specifically, the method may include the steps of:
Step S31: acquire the target audio particles of each primary order frequency in several operating states, and continuously acquire the driving information.
In an embodiment of the present disclosure, the driving information includes at least the driving speed, the accelerator pedal depth, and the driving state, and the target audio particles are determined based on the audio particle extraction method of any of the foregoing disclosed embodiments.
In one implementation scenario, target audio particles for each primary order frequency may be acquired for each operating state, including an acceleration state, a deceleration state, and an idle state. In addition, driving information is continuously acquired, and the driving information can comprise driving speed, accelerator pedal depth and driving state.
In a specific implementation scenario, in a bench test or offline simulation scenario, the driving speed can be obtained by prediction from the simulated driving parameters and the accelerator pedal depth using a vehicle speed model, where the simulated driving parameters at least include the simulated vehicle weight, simulated driving gradient, and simulated driving resistance. The vehicle speed model may be a mathematical function; in a bench test or offline simulation scenario, driving-related information must be obtained by prediction from this model. Predicting the driving speed from the simulated driving parameters and the accelerator pedal depth in this way improves the accuracy of the driving speed across different scenarios.
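As a rough illustration of what such a mathematical vehicle speed model might look like, the sketch below integrates a toy longitudinal model; the force model and every parameter value are assumptions for illustration, not part of this disclosure:

```python
def predict_speed(v0, pedal_depth, dt, steps, *, mass=1500.0, grade=0.0,
                  drag_coeff=0.5, max_drive_force=4000.0, g=9.81):
    """Toy longitudinal model: drive force scales with accelerator pedal
    depth; resistance combines a grade term (simulated driving gradient)
    and a speed-squared drag term (simulated driving resistance).
    All parameter values are assumptions."""
    v = v0
    for _ in range(steps):
        drive = pedal_depth * max_drive_force
        resistance = mass * g * grade + drag_coeff * v * v
        a = (drive - resistance) / mass
        v = max(0.0, v + a * dt)  # vehicle speed cannot go negative
    return v
```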
In one embodiment, in a real-vehicle driving scenario, the driving information further includes motor operation parameters, and audio data simulating exhaust backfire can be synthesized based on the motor operation parameters. Synthesizing exhaust-backfire audio from the motor operation parameters in this way enriches the sound wave synthesis effect as much as possible.
Step S32: predict the driving speed and accelerator pedal depth in the driving information based on the rotation speed model to obtain the predicted rotation speed value of the virtual engine.
In one implementation scenario, the driving speed and the accelerator pedal depth in the driving information can be predicted based on the rotational speed model, so as to obtain a predicted rotational speed value of the virtual engine. Specifically, the rotational speed model can be fitted first, so that the prediction result of the rotational speed model is more accurate.
In one implementation scenario, whether the virtual engine has a gear jump is analyzed based on the predicted gear value; in response to a gear jump, a new predicted rotation speed value and a new predicted gear value, free of gear jumps, can be obtained from the rotation speed model using a gear ramp strategy. Specifically, the gear can be monitored, and if a jump is detected, for example from 1st gear straight to 3rd gear, the gear ramp strategy is applied. The gear ramp strategy may include, but is not limited to, an upshift ramp with a gradual crossfade-overlap climb, a downshift ramp with stepped climbs and descents, and so on. Analyzing gear jumps in this way, and smoothing them out with the gear ramp strategy, makes gear changes more gradual and improves the applicability of the sound wave synthesis method.
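A minimal sketch of smoothing a gear jump, e.g. from 1st gear straight to 3rd, by inserting the intervening gears; this is a deliberately simplified stand-in for the upshift/downshift ramp strategies named above, which additionally ramp the rotation speed:

```python
def ramp_gears(gear_from, gear_to):
    """If the predicted gear jumps by more than one step (e.g. 1 -> 3),
    expand the change into a step-by-step sequence so it is gradual.
    A real gear ramp strategy would also crossfade the rpm curve."""
    step = 1 if gear_to > gear_from else -1
    return list(range(gear_from, gear_to + step, step))
```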
Step S33: predict the primary order frequency reached by the virtual engine at the predicted rotation speed value, based on the predicted rotation speed value, the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model, and the lowest primary order frequency of the target engine in the idle state.
In one implementation scenario, the predicted rotation speed value and the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model may be obtained first. The maximum and minimum rotation speed values are fixed preset parameters, although they may also be updated according to the actual situation. In addition, the lowest primary order frequency of the target engine in the idle state is obtained; it can be selected from the primary order frequencies of the audio frames in the idle-state target data, i.e., the lowest primary order frequency in the idle state is selected. The target engine is a fuel-vehicle engine, and the target data is the audio data whose operating state is consistent with the driving state. As a possible implementation, when the target state is the acceleration state, the primary order frequency of the first audio frame in the target data may be selected as the lowest primary order frequency; unlike the foregoing, when the target state is the deceleration state, the primary order frequency of the last audio frame in the target data may be selected as the lowest primary order frequency.
After obtaining the maximum, minimum, and predicted rotation speed values set for the virtual engine, together with the lowest primary order frequency of the target engine in the idle state, a first difference between the predicted and minimum rotation speed values and a second difference between the maximum and minimum rotation speed values are obtained; the sum of the ratio of the first difference to the second difference and the lowest idle-state primary order frequency is taken as the primary order frequency reached by the virtual engine at the predicted rotation speed value. Reference may be made to the foregoing disclosed embodiments for details, which are not repeated here.
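Taken literally, the computation reads as below. Note that the ratio of the two differences is dimensionless, so a practical system would presumably scale it by a frequency span before adding it to the idle frequency, but no such scale factor appears in the text:

```python
def predict_primary_order_freq(n_pred, n_min, n_max, f_idle_min):
    """Primary order frequency at the predicted rpm, exactly as the text
    describes: the ratio of (predicted - minimum) to (maximum - minimum)
    rotation speed, added to the lowest idle-state primary order
    frequency.  Whether the ratio should carry a frequency scale is left
    unspecified in the source."""
    first_diff = n_pred - n_min     # predicted rpm minus minimum rpm
    second_diff = n_max - n_min     # maximum rpm minus minimum rpm
    return first_diff / second_diff + f_idle_min
```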
Step S34: in the operating state consistent with the driving state, select the target audio particle corresponding to the predicted primary order frequency and transmit it into the buffer space.
In one implementation scenario, the driving state is any one of the acceleration, deceleration, and idle states. In the operating state consistent with the driving state, that is, when the two states match, the target audio particle corresponding to the predicted primary order frequency is selected and transmitted into the buffer space; a first-in first-out queue buffer may be selected as the buffer space.
Step S35: synthesize the sound wave data based on the target audio particles in the buffer space.
In one implementation scenario, several target audio particles buffered in the buffer space are spliced to synthesize the sound wave data. It should be noted that the number of target audio particles may be set according to the requirements of the sound wave synthesis, such as 4, 5, or 6, without limitation.
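A minimal sketch of the splice step using a first-in first-out queue buffer; direct concatenation is used here for brevity, whereas a real system would likely crossfade at the particle joins:

```python
from collections import deque

def synthesize_wave(buffer, n_particles=4):
    """Pop `n_particles` target audio particles from a FIFO buffer and
    splice them end to end.  The particle count (4, 5, 6, ...) follows
    the text; the concatenation-only splice is a simplification."""
    out = []
    for _ in range(min(n_particles, len(buffer))):
        out.extend(buffer.popleft())
    return out

fifo = deque([[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]])
wave = synthesize_wave(fifo, n_particles=2)
```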
Referring to fig. 4, fig. 4 is a schematic diagram of a framework of an embodiment of a bench test or off-line simulation scenario. The depth of the accelerator pedal for simulating driving can be obtained firstly, and parameters such as the weight of a simulated vehicle, the simulated driving gradient, the simulated driving resistance and the like can be obtained simultaneously. At this time, parameters such as the simulated vehicle weight, the simulated running gradient, the simulated running resistance and the like can be predicted based on a vehicle speed model to obtain the running speed, wherein the vehicle speed model can be a mathematical function; and then, the opening and closing degree of the virtual valve is obtained based on the depth of the accelerator pedal, and the opening and closing degree and the running speed of the virtual valve are predicted based on the rotating speed model, so that a predicted rotating speed value and a predicted gear value of the virtual engine are obtained.
Further, referring to fig. 5, fig. 5 is a schematic frame diagram of another embodiment in a bench test or offline simulation scenario. After the predicted rotation speed value and predicted gear value of the virtual engine are obtained, whether the virtual engine has a gear jump may be analyzed based on the predicted gear value; in response to a gear jump, a new predicted rotation speed value and a new predicted gear value, free of gear jumps, are obtained from the rotation speed model using a gear ramp strategy. The primary order frequency reached by the virtual engine at the predicted rotation speed value is then predicted based on the predicted rotation speed value, the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model, and the lowest primary order frequency of the target engine in the idle state. Based on this primary order frequency, the index parameter table file, which is stored in FLASH in advance and records the correspondence between audio particles and audio in different states, is looked up from the FLASH; the audio file is acquired based on the index parameter table file, the target audio particle corresponding to the predicted primary order frequency is selected and placed into the audio Buffer queue, and the sound wave data is obtained by splicing.
Referring to fig. 6, fig. 6 is a schematic frame diagram of an embodiment in a real-vehicle driving scenario, in which the motor operation parameters are obtained through CAN signals while the electric vehicle is running. The motor operation parameters at least include changes in parameters such as motor rotation speed, motor output power, and motor torque, and audio data simulating exhaust backfire is synthesized based on them. The opening degree of the virtual valve can be obtained from the accelerator pedal depth, and the virtual valve opening degree, the driving speed, and the motor operation parameters are predicted with the rotation speed model to obtain the predicted rotation speed value and predicted gear value of the virtual engine. Further, whether the virtual engine has a gear jump can be analyzed based on the predicted gear value; in response to a gear jump, a new predicted rotation speed value and a new predicted gear value, free of gear jumps, are obtained from the rotation speed model using a gear ramp strategy.
In this manner, the target audio particles of each primary order frequency in several operating states are obtained, and the driving information is continuously acquired, where the driving information at least includes the driving speed, accelerator pedal depth, and driving state, and the target audio particles are determined by any of the audio particle extraction methods above. The driving speed and accelerator pedal depth in the driving information are predicted with the rotation speed model to obtain the predicted rotation speed value of the virtual engine; the primary order frequency reached at the predicted rotation speed value is then predicted from the predicted rotation speed value, the maximum and minimum rotation speed values set for the virtual engine, and the lowest primary order frequency of the target engine in the idle state; and, in the operating state consistent with the driving state, the target audio particle corresponding to the predicted primary order frequency is selected and transmitted into the buffer space. On the one hand, predicting the driving speed and accelerator pedal depth with the rotation speed model helps improve the accuracy of the predicted rotation speed value; on the other hand, predicting the primary order frequency from the predicted rotation speed value, the rotation speed model's maximum and minimum values for the virtual engine, and the lowest idle-state primary order frequency improves the accuracy of the predicted primary order frequency as much as possible. Synthesizing the sound wave data on this basis improves the effect of the sound wave synthesis.
Referring to fig. 7, fig. 7 is a schematic diagram of an embodiment of a method for synthesizing sound waves according to the present application. The acquired driving information is input into a digital audio processor, and the index table, which contains the target audio particles of each primary order frequency in several operating states, is read from FLASH storage. The target audio particles of each primary order frequency in each operating state are likewise input into the digital audio processor for audio processing, i.e., several audio particles are spliced to synthesize audio data; the audio data then undergoes power amplification and loudspeaker PAN tuning to simulate engine sound waves, thereby producing the sound wave data.
Referring to fig. 8, fig. 8 is a schematic frame diagram of an embodiment of an audio particle extraction device of the present application. The audio particle extraction device 80 includes: a sound recording module 81, a selection module 82, a determination module 83, a candidate module 84, and an extraction module 85. The recording module 81 is used for acquiring recording data of the target engine in a plurality of running states; the several operating states include: at least one of an acceleration state, a deceleration state, and an idle state; the selection module 82 is configured to select various operation states as target states, and select recording data in the target states as target data; the determining module 83 is configured to determine a primary order frequency of each audio frame in the target data; the candidate module 84 is configured to extract, for each audio frame in the target data, an audio segment from the target data based on a center time determined by the audio frame, and obtain candidate audio particles of a target frequency in a target state based on two sampling points in the audio segment, which are separated by a preset number of sampling periods and are both zero-crossing; the target frequency is the primary order frequency of the audio frame, and the preset value is a first multiple of the number of cylinders of the target engine; the extraction module 85 is configured to determine, for each primary frequency in the target state, a target audio particle of the primary frequency in the target state based on cross-correlation between each candidate audio particle of the primary frequency and the reference audio particle; wherein the reference audio particles are target audio particles that have been determined before the primary order frequency of the target audio particles to be determined currently.
In the above-described aspect, the audio particle extraction apparatus 80 is configured to obtain recording data of the target engine in several operating states, where the several operating states include at least one of an acceleration state, a deceleration state, and an idle state; to select each operating state in turn as the target state, select the recording data in the target state as target data, and determine the primary order frequency of each audio frame in the target data; for each audio frame in the target data, to extract an audio fragment from the target data based on the center time determined by the audio frame, and obtain candidate audio particles of the target frequency in the target state based on two sampling points in the audio fragment that are separated by a preset number of sampling periods and are both zero-crossing, where the target frequency is the primary order frequency of the audio frame and the preset value is a first multiple of the number of cylinders of the target engine; and, for each primary order frequency in the target state, to determine the target audio particle of that primary order frequency based on the cross-correlation between each of its candidate audio particles and the reference audio particle, where the reference audio particle is a target audio particle determined before the primary order frequency of the target audio particle currently to be determined. On the one hand, acquiring the recording data of the target engine in several operating states, selecting the recording data in the target state as target data, and determining the primary order frequency of each audio frame in the target data improves the accuracy of the primary order frequency in each operating state. On the other hand, extracting the audio fragment based on the center time determined by the audio frame, and obtaining the candidate audio particles from the two zero-crossing sampling points separated by the preset number of sampling periods, improves the efficiency of obtaining the candidate audio particles of the target frequency in the target state. On this basis, cross-correlating each candidate audio particle of the primary order frequency with the reference audio particle to determine the target audio particle allows complete-period audio particles to be extracted and guarantees the quality of the extracted audio particles. The quality of the audio particles can therefore be improved, which in turn improves the effect of the synthesized sound waves.
In some disclosed embodiments, the center instant is located at the center of the audio frame; and/or the duration of the audio clip is a second multiple of the corresponding period of the primary order frequency of the audio frame.
In some disclosed embodiments, the extraction module 85 includes a first selection sub-module and a second selection sub-module. The first selection sub-module is configured to select, as the reference audio particle, the target audio particle of the previous primary order frequency in the target state to which the primary order frequency of the target audio particle currently to be determined belongs; the second selection sub-module is configured to select a candidate audio particle as the target audio particle of the primary order frequency in the target state based on the cross-correlation delay between each candidate audio particle and the reference audio particle.
Therefore, by computing the cross-correlation delay between each candidate audio particle and the reference audio particle and selecting the candidate corresponding to the smallest cross-correlation delay as the target audio particle of the primary order frequency in the target state, the accuracy of the target audio particle, and hence the quality of the audio particles, is improved.
In some disclosed embodiments, the second selection submodule includes a selection unit, where the selection unit is configured to select a candidate audio particle corresponding to the smallest cross-correlation delay as a target audio particle of the primary order frequency in the target state.
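A sketch of this selection rule, assuming "delay" means the lag at which the cross-correlation peaks and "smallest" means smallest in magnitude (both plausible readings the source does not spell out):

```python
import numpy as np

def cross_corr_delay(candidate, reference):
    """Lag (in samples) at which the cross-correlation of candidate and
    reference peaks; 0 means the particles are already aligned."""
    corr = np.correlate(candidate, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

def pick_target_particle(candidates, reference):
    """Select the candidate whose cross-correlation delay against the
    reference particle is smallest in magnitude."""
    return min(candidates, key=lambda c: abs(cross_corr_delay(c, reference)))
```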
In some disclosed embodiments, the candidate module 84 includes a filtering sub-module for filtering the audio clip using a bandpass filter centered at the primary order frequency of the audio frame.
Therefore, by filtering the audio fragment with a band-pass filter whose center frequency is the primary order frequency of the audio frame, higher-order filter coefficients are optimized as far as possible while keeping the filter stable, ensuring small phase response distortion and thereby improving the quality of the audio particles.
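As an illustrative stand-in for the band-pass filter (the disclosure does not specify the filter design), the sketch below applies a brick-wall FFT mask centred on the primary order frequency; the 20% relative bandwidth is an assumption:

```python
import numpy as np

def bandpass_around(signal, fs, f_center, rel_bw=0.2):
    """Zero out FFT bins outside a band centred on the primary order
    frequency.  A simplified, zero-phase substitute for the band-pass
    filter described in the text; `rel_bw` is an assumed bandwidth."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    half_bw = rel_bw * f_center / 2.0
    mask = (freqs >= f_center - half_bw) & (freqs <= f_center + half_bw)
    return np.fft.irfft(spec * mask, n=len(signal))
```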
In some disclosed embodiments, when power system state data time-synchronized with the recording data is collected for the target engine, the determination module 83 includes an extraction sub-module, a first determination sub-module, and a second determination sub-module. The extraction sub-module is configured to extract, from the rotation speed data in the power system state data, at least one rotation speed value time-synchronized with the audio frame; the first determination sub-module is configured to perform numerical statistics on the at least one rotation speed value to determine the target rotation speed value of the target engine at the time the audio frame was acquired; and the second determination sub-module is configured to determine the primary order frequency of the audio frame based on the target rotation speed value, the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model, and the lowest primary order frequency of the target engine in the idle state.
Therefore, extracting at least one rotation speed value time-synchronized with the audio frame and performing numerical statistics on it improves the accuracy of the determined target rotation speed value; determining the primary order frequency of the audio frame from the target rotation speed value, the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model, and the lowest idle-state primary order frequency of the target engine then improves the accuracy of the primary order frequency.
In some disclosed embodiments, the second determining submodule includes an obtaining unit and a calculating unit, where the obtaining unit is configured to obtain a first difference value between the target rotation speed value and the minimum rotation speed value, and obtain a second difference value between the maximum rotation speed value and the minimum rotation speed value; the calculating unit is used for taking the sum of the ratio between the first difference value and the second difference value and the lowest main order frequency in the idle state as the main order frequency of the audio frame.
Therefore, using the target rotation speed value together with the maximum and minimum rotation speed values set for the virtual engine by the rotation speed model helps improve the accuracy of the primary order frequency of the audio frame and, in turn, the quality of the audio particles.
In some disclosed embodiments, the audio particle extraction apparatus 80 includes an acquisition module and an optimization module; the acquisition module is configured to acquire the window function of the target audio particle, and the optimization module is configured to optimize the target audio particle using its window function.
Therefore, acquiring the window function of the target audio particle and using it to optimize the particle optimizes the effect of the audio particle and improves its quality.
In some disclosed embodiments, obtaining the window function of the target audio particle includes at least one of the following: in response to power system state data time-synchronized with the recording data having been collected for the target engine, extracting from it at least a portion of data time-synchronized with the target audio particle, and determining the window function based on that data; and, in response to no such power system state data having been collected, acquiring the particle duration of the target audio particle, taking the sum of the particle duration and a preset overlap frame length as the target window length, and determining the window function based on the target window length.
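The no-powertrain-data branch can be sketched as follows; the Hann shape is an assumption, since the text specifies only how the target window length is formed, not the window type:

```python
import numpy as np

def particle_window(particle_len, overlap_len):
    """Target window length = particle length + preset overlap frame
    length, per the no-powertrain-data case.  The Hann shape is an
    assumed choice of window function."""
    target_window_len = particle_len + overlap_len
    return np.hanning(target_window_len)

win = particle_window(400, 112)  # example lengths (samples), chosen arbitrarily
```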
In some disclosed embodiments, the audio particle extraction apparatus 80 includes a first response module and a second response module. The first response module is configured to, in response to power system state data time-synchronized with the recording data having been collected for the target engine, analyze the monotonicity check result of the recording data based on the rotation speed data in the power system state data; the second response module is configured to, in response to no such power system state data having been collected, analyze the monotonicity check result of the recording data based on the primary order frequency of each audio frame in the target data.
Therefore, analyzing the check result of the recording data with respect to monotonicity based on the primary order frequency of each audio frame in the target data increases the rate of extracting audio particles and further improves their quality.
In some disclosed embodiments, the first response module includes a statistics sub-module and a determination sub-module. The statistics sub-module is configured to obtain, through statistics over the rotational speed values time-synchronized with the audio frame in the rotational speed data, the target rotational speed value of the target engine at the moment the audio frame was captured; the determination sub-module is configured to determine the check result of the recording data with respect to monotonicity based on the proportion of audio frames in the recording data whose target rotational speed trend conforms to the target state.
Therefore, because the monotonicity check result is determined from the proportion of audio frames whose target rotational speed trend conforms to the target state, monotonicity check results for the acceleration, deceleration, and idle states can be obtained quickly, which increases the rate of extracting audio particles and improves their quality.
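The proportion-based monotonicity check described above can be sketched as follows. The threshold values, tolerance, and function names are illustrative assumptions, not fixed by the disclosure:

```python
def monotonicity_check(rpm_per_frame, target_state, ratio_threshold=0.9, idle_tol=50.0):
    # Frame-to-frame changes of the target rotational speed value.
    deltas = [b - a for a, b in zip(rpm_per_frame, rpm_per_frame[1:])]
    if target_state == "acceleration":
        hits = sum(d > 0 for d in deltas)        # speed should keep rising
    elif target_state == "deceleration":
        hits = sum(d < 0 for d in deltas)        # speed should keep falling
    else:                                        # idle: speed roughly constant
        hits = sum(abs(d) <= idle_tol for d in deltas)
    # Pass when the proportion of conforming frames reaches the threshold.
    return hits / len(deltas) >= ratio_threshold

ok = monotonicity_check([800, 1200, 1600, 2100, 2600], "acceleration")
```

Because each operating state reduces to a per-frame predicate plus one proportion, the check is a single pass over the recording, which matches the stated speed benefit.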
In some disclosed embodiments, the second response module includes an analysis sub-module and a determination sub-module. The analysis sub-module is configured to analyze whether each audio frame in the recording data meets a signal-to-noise ratio condition, where the condition includes: the first signal-to-noise ratio corresponding to the primary order frequency is greater than a first threshold, and the second signal-to-noise ratio corresponding to each harmonic order frequency other than the primary order frequency is greater than a second threshold; the determination sub-module is configured to determine the check result of the recording data with respect to order energy based on the proportion of audio frames in the recording data that meet the signal-to-noise ratio condition.
Therefore, determining the check result of the recording data with respect to order energy based on the proportion of audio frames that meet the signal-to-noise ratio condition increases the rate of extracting audio particles and further improves their quality.
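As an illustration of the order-energy check, the following sketch applies the two signal-to-noise thresholds per frame and a proportion threshold over the recording; all names and numeric thresholds are assumptions, not values from the disclosure:

```python
def order_energy_check(frame_snrs, first_threshold=10.0, second_threshold=6.0,
                       ratio_threshold=0.8):
    # Each entry: (SNR at the primary order frequency,
    #              SNRs at the other harmonic order frequencies).
    def frame_ok(primary_snr, harmonic_snrs):
        return (primary_snr > first_threshold
                and all(s > second_threshold for s in harmonic_snrs))
    hits = sum(frame_ok(p, h) for p, h in frame_snrs)
    # Pass when enough frames satisfy both SNR conditions.
    return hits / len(frame_snrs) >= ratio_threshold

frames = [(15.0, [8.0, 9.0]), (12.0, [7.0, 6.5]), (5.0, [8.0, 8.0])]
passed = order_energy_check(frames, ratio_threshold=0.6)
```

The two-tier condition ensures that both the dominant firing order and its harmonics are well above the noise floor before a frame counts toward the order-energy check.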
Referring to fig. 9, fig. 9 is a schematic frame diagram of an embodiment of an acoustic wave synthesizing apparatus of the present application. The acoustic wave synthesizing apparatus 90 includes: an acquisition module 91, a rotational speed prediction module 92, a frequency prediction module 93, a buffer module 94, and a synthesis module 95. The acquisition module 91 is configured to acquire target audio particles of each primary order frequency in several operating states and to continuously acquire driving information, where the driving information includes at least a driving speed, an accelerator pedal depth, and a driving state, and the target audio particles are determined based on the audio particle extraction method of any one of the above disclosed embodiments. The rotational speed prediction module 92 is configured to predict, based on a rotational speed model, the driving speed, and the accelerator pedal depth in the driving information, a predicted rotational speed value of the virtual engine. The frequency prediction module 93 is configured to predict the primary order frequency reached by the virtual engine at the predicted rotational speed value, based on the predicted rotational speed value, the maximum and minimum rotational speed values set by the rotational speed model for the virtual engine, and the lowest primary order frequency of the target engine in the idle state. The buffer module 94 is configured to select, in an operating state consistent with the driving state, the target audio particle corresponding to the predicted primary order frequency and transmit it into the buffer space. The synthesis module 95 is configured to synthesize the acoustic wave data based on the target audio particles in the buffer space.
In the above manner, the acoustic wave synthesizing apparatus 90 acquires target audio particles of each primary order frequency in several operating states and continuously acquires driving information, where the driving information includes at least a driving speed, an accelerator pedal depth, and a driving state, and the target audio particles are determined based on any of the above audio particle extraction methods. It predicts, based on the rotational speed model, the driving speed, and the accelerator pedal depth in the driving information, a predicted rotational speed value of the virtual engine; predicts the primary order frequency reached by the virtual engine at the predicted rotational speed value, based on the predicted rotational speed value, the maximum and minimum rotational speed values set for the virtual engine, and the lowest primary order frequency of the target engine in the idle state; and selects, in an operating state consistent with the driving state, the target audio particle corresponding to the predicted primary order frequency and transmits it into the buffer space. On the one hand, predicting the rotational speed from the driving speed and accelerator pedal depth through the rotational speed model helps improve the accuracy of the predicted rotational speed value; on the other hand, predicting the primary order frequency from the predicted rotational speed value, the speed limits set by the rotational speed model for the virtual engine, and the lowest idle primary order frequency of the target engine improves the accuracy of the predicted primary order frequency as much as possible. Synthesizing the acoustic wave data on this basis can improve the effect of acoustic wave synthesis.
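The frequency prediction step can be read (see also claim 8) as the lowest idle primary order frequency plus the ratio of two rotational speed differences. A literal sketch follows; the function name is an assumption, and any scaling of the dimensionless ratio onto a frequency span is not spelled out in the text:

```python
def predict_primary_order_frequency(pred_rpm, rpm_min, rpm_max, idle_min_freq):
    # First difference: predicted speed above the minimum speed set for the
    # virtual engine. Second difference: the full speed span set by the model.
    first_diff = pred_rpm - rpm_min
    second_diff = rpm_max - rpm_min
    # Claim 8, read literally: the lowest idle primary order frequency plus
    # the ratio between the two differences.
    return idle_min_freq + first_diff / second_diff

f = predict_primary_order_frequency(3000.0, 800.0, 6000.0, 26.7)
```

Clamping `pred_rpm` to the `[rpm_min, rpm_max]` range set by the rotational speed model would keep the predicted frequency within the particle library's coverage; the text does not say whether such clamping is applied.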
In some disclosed embodiments, in a bench test or offline simulation scenario, the step of acquiring the driving speed includes: predicting the driving speed from simulated driving parameters and the accelerator pedal depth based on a vehicle speed model; the simulated driving parameters include at least a simulated vehicle weight, a driving gradient, and a driving resistance.
Therefore, predicting the driving speed from the simulated driving parameters and the accelerator pedal depth through the vehicle speed model improves the accuracy of the driving speed in different scenarios.
In some disclosed embodiments, the acoustic wave synthesizing apparatus 90 includes an analysis module and a response module. The analysis module is configured to analyze, based on the predicted gear value, whether the virtual engine has a gear jump; the response module is configured to, in response to the virtual engine having a gear jump, obtain a new predicted rotational speed value and a new predicted gear value based on the rotational speed model using a gear gradual-change strategy, where the new predicted gear value is free of gear jumps.
Therefore, by analyzing whether the virtual engine has a gear jump and, when it does, obtaining a new predicted rotational speed value and gear value from the rotational speed model using a gear gradual-change strategy, gear changes become more gradual and the applicability of the acoustic wave synthesis method is improved.
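The gear gradual-change strategy itself is not detailed in the text. One minimal interpretation, shown below purely as an assumption, replaces a multi-gear jump with a single-gear step toward the predicted gear:

```python
def gradual_gear(prev_gear, predicted_gear):
    # No jump: a change of at most one gear is accepted as predicted.
    if abs(predicted_gear - prev_gear) <= 1:
        return predicted_gear
    # Jump detected: move only one gear toward the prediction, so the gear
    # (and hence the predicted rotational speed) changes gradually over
    # successive prediction cycles.
    return prev_gear + (1 if predicted_gear > prev_gear else -1)
```

Under this reading, a jump from 3rd to 6th gear is absorbed over three prediction cycles instead of one, avoiding an audible discontinuity in the synthesized engine note.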
In some disclosed embodiments, in a real-vehicle driving scenario, the driving information further includes motor operating parameters, and the method further includes: synthesizing, based on the motor operating parameters, audio data that simulates exhaust backfire.
Therefore, synthesizing audio data simulating exhaust backfire based on the motor operating parameters enriches the acoustic synthesis effect as much as possible.
Referring to fig. 10, fig. 10 is a schematic frame diagram of an embodiment of the electronic device of the present application. The electronic device 100 includes a memory 101 and a processor 102 coupled to each other; the memory 101 stores program instructions, and the processor 102 is configured to execute the program instructions to implement the steps of any of the above audio particle extraction method embodiments or any of the acoustic wave synthesis method embodiments. In particular, the electronic device 100 may include, but is not limited to, a desktop computer, a notebook computer, a server, a cell phone, a tablet computer, and the like.
Specifically, the processor 102 is configured to control itself and the memory 101 to implement the steps of any of the above audio particle extraction method embodiments or any of the acoustic wave synthesis method embodiments. The processor 102 may also be referred to as a CPU (Central Processing Unit). The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 102 may be implemented jointly by multiple integrated circuit chips.
According to the above scheme, the electronic device 100 can implement the steps of any of the audio particle extraction method embodiments or any of the acoustic wave synthesis method embodiments. On the one hand, by acquiring recording data of the target engine in several operating states, selecting the recording data in a target state as the target data, and determining the primary order frequency of each audio frame in the target data, the accuracy of the primary order frequency of each audio frame in each operating state is improved. On the other hand, an audio clip is extracted from the target data based on the center instant determined by the audio frame, and candidate audio particles of the target frequency in the target state are obtained from two zero-crossing sampling points in the clip separated by a preset number of sampling periods, which improves the efficiency of obtaining the candidate audio particles. On this basis, the candidate audio particles of each primary order frequency are cross-correlated with a reference audio particle to determine the target audio particle of that primary order frequency in the target state, so that whole-period audio particles can be extracted and the quality of the extracted particles is ensured. Thus, the quality of the audio particles can be improved, and the effect of acoustic wave synthesis can be further improved.
Referring to FIG. 11, FIG. 11 is a schematic diagram illustrating an embodiment of a computer readable storage medium 110 of the present application. The computer readable storage medium 110 stores program instructions 111 executable by the processor, the program instructions 111 for implementing the steps in any of the above-described embodiments of the audio particle extraction method, or steps in any of the embodiments of the acoustic wave synthesis method.
In the above solution, the computer readable storage medium 110 can implement the steps of any of the audio particle extraction method embodiments or any of the acoustic wave synthesis method embodiments. On the one hand, by acquiring recording data of the target engine in several operating states, selecting the recording data in a target state as the target data, and determining the primary order frequency of each audio frame in the target data, the accuracy of the primary order frequency of each audio frame in each operating state is improved. On the other hand, an audio clip is extracted from the target data based on the center instant determined by the audio frame, and candidate audio particles of the target frequency in the target state are obtained from two zero-crossing sampling points in the clip separated by a preset number of sampling periods, which improves the efficiency of obtaining the candidate audio particles. On this basis, the candidate audio particles of each primary order frequency are cross-correlated with a reference audio particle to determine the target audio particle of that primary order frequency in the target state, so that whole-period audio particles can be extracted and the quality of the extracted particles is ensured. Thus, the quality of the audio particles can be improved, and the effect of acoustic wave synthesis can be further improved.
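The zero-crossing step described above can be sketched as follows. This is a simplified illustration (in practice the clip would first be band-pass filtered around the primary order frequency, and crossing positions interpolated to sub-sample accuracy); counting periods via rising zero crossings is an assumption:

```python
def candidate_particle(audio_clip, preset_periods):
    # Rising zero crossings: the preceding sample is negative and the sample
    # at the index is non-negative. For a (near-)periodic clip, consecutive
    # rising crossings are one period apart.
    crossings = [i for i in range(1, len(audio_clip))
                 if audio_clip[i - 1] < 0 <= audio_clip[i]]
    if len(crossings) <= preset_periods:
        return None  # clip too short for the requested number of periods
    # Cut between two crossings separated by the preset number of periods,
    # so the candidate particle contains only whole cycles.
    return audio_clip[crossings[0]:crossings[preset_periods]]

particle = candidate_particle([-1, 1, -1, 1, -1, 1], 2)
```

Cutting at zero crossings of the same sign is what lets the extracted particle loop without a click, which is the stated reason for extracting whole-period particles.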
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments focuses on the differences between them; for parts that are the same or similar, the embodiments may be referred to one another, and details are not repeated here for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
If the technical solution of the present application involves personal information, a product applying this technical solution clearly informs the individual of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solution involves sensitive personal information, the product obtains the individual's separate consent before processing it and additionally satisfies the requirement of "explicit consent". For example, a clear and conspicuous sign is placed at a personal information collection device such as a camera to inform that one is entering the personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, it is deemed that the individual consents to the collection. Alternatively, on the device that processes the personal information, where conspicuous identification or information is used to announce the personal information processing rules, personal authorization is obtained through a pop-up message, by asking the individual to upload his or her personal information, or the like. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.
Claims (21)
1. An audio particle extraction method, comprising:
Acquiring recording data of a target engine in a plurality of operating states; wherein the plurality of operating states includes: at least one of an acceleration state, a deceleration state, and an idle state;
selecting each of the operating states in turn as a target state, selecting the recording data in the target state as target data, and determining the primary order frequency of each audio frame in the target data; wherein, when no power system state data time-synchronized with the recording data is acquired for the target engine, the primary order frequency is determined based on the recording data;
for each audio frame in the target data, extracting an audio clip from the target data based on a center instant determined by the audio frame, and obtaining candidate audio particles of a target frequency in the target state based on two zero-crossing sampling points in the audio clip that are separated by a preset number of sampling periods; wherein the target frequency is the primary order frequency of the audio frame, and the preset number is a first multiple of the number of cylinders of the target engine;
For each of the primary order frequencies in the target state, determining the target audio particle of the primary order frequency in the target state based on cross-correlating each candidate audio particle of the primary order frequency with a reference audio particle; wherein the reference audio particle is a target audio particle determined before the primary order frequency whose target audio particle is currently to be determined.
2. The method of claim 1, wherein the center instant is located at the center of the audio frame.
3. The method of claim 1, wherein the audio clip has a duration that is a second multiple of a period corresponding to a primary order frequency of the audio frame.
4. The method of claim 1, wherein the determining the target audio particle of the primary order frequency in the target state based on the cross-correlation of each of the candidate audio particles of the primary order frequency with a reference audio particle, respectively, comprises:
selecting, as the reference audio particle, the target audio particle of the primary order frequency preceding, in the target state, the primary order frequency whose target audio particle is currently to be determined;
and selecting a candidate audio particle as the target audio particle of the primary order frequency in the target state based on the cross-correlation delay between each candidate audio particle and the reference audio particle.
5. The method of claim 4, wherein selecting the candidate audio particles as the target audio particles for the primary order frequency in the target state based on the cross-correlation delays of each of the candidate audio particles and the reference audio particles, respectively, comprises:
and selecting the candidate audio particle corresponding to the smallest cross-correlation delay as the target audio particle of the primary order frequency in the target state.
6. The method according to claim 1, wherein after the extracting of the audio clip based on the center instant determined by the audio frame, and before the obtaining of the candidate audio particles of the target frequency in the target state based on the two zero-crossing sampling points in the audio clip separated by the preset number of sampling periods, the method comprises:
filtering the audio clip with a band-pass filter whose center frequency is the primary order frequency of the audio frame.
7. The method of claim 1, wherein said determining a primary order frequency for each audio frame in said target data when power system state data is collected for said target engine that is time synchronized with said recorded data comprises:
extracting at least one rotational speed value time-synchronized with the audio frame from rotational speed data in the power system state data;
performing numerical statistics based on the at least one rotation speed value, and determining a target rotation speed value of the target engine when the audio frame is acquired;
and determining the primary order frequency of the audio frame based on the target rotational speed value, the maximum rotational speed value and the minimum rotational speed value set by a rotational speed model for the virtual engine, and the lowest primary order frequency of the target engine in the idle state.
8. The method of claim 7, wherein the determining the primary order frequency of the audio frame based on the target rotational speed value, the maximum rotational speed value and the minimum rotational speed value set by the rotational speed model for the virtual engine, and the lowest primary order frequency of the target engine in the idle state comprises:
acquiring a first difference value between the target rotating speed value and the minimum rotating speed value, and acquiring a second difference value between the maximum rotating speed value and the minimum rotating speed value;
And taking, as the primary order frequency of the audio frame, the sum of the lowest primary order frequency in the idle state and the ratio between the first difference and the second difference.
9. The method of claim 1, wherein after the cross-correlating each of the candidate audio particles based on the primary order frequency with a reference audio particle, respectively, to determine a target audio particle for the primary order frequency in the target state, the method further comprises:
acquiring a window function of the target audio particle;
and optimizing the target audio particles by utilizing window functions of the target audio particles.
10. The method of claim 9, wherein the obtaining the window function of the target audio particle comprises at least one of:
in response to acquiring power system state data for the target engine that is time synchronized with the recorded data, extracting at least a portion of the data from the power system state data that is time synchronized with the target audio particles, and determining the window function based on the at least a portion of the data;
and in response to no power system state data time-synchronized with the recording data having been acquired for the target engine, acquiring the particle duration of the target audio particle, taking the sum of the particle duration and a preset overlap-frame length as a target window length, and determining the window function based on the target window length.
11. The method of claim 1, wherein prior to extracting an audio clip from the target data at the center instant determined based on the audio frame, the method further comprises at least one of:
in response to power system state data time-synchronized with the recording data having been acquired for the target engine, analyzing a check result of the recording data with respect to monotonicity based on the rotational speed data in the power system state data;
and in response to no power system state data time-synchronized with the recording data having been acquired for the target engine, analyzing the check result of the recording data with respect to monotonicity based on the primary order frequency of each audio frame in the target data.
12. The method of claim 11, wherein analyzing the inspection of the recorded data for monotonicity based on rotational speed data in the powertrain state data comprises:
based on the rotation speed value which is time-synchronous with the audio frame in the rotation speed data, calculating to obtain a target rotation speed value of the target engine when the audio frame is acquired;
and determining the check result of the recording data with respect to monotonicity based on the proportion of audio frames in the recording data whose target rotational speed trend conforms to the target state.
13. The method of claim 11, wherein before the analyzing of the check result of the recording data with respect to monotonicity based on the primary order frequency of each audio frame in the target data, the method further comprises:
analyzing whether each audio frame in the recording data meets a signal-to-noise ratio condition or not; wherein the signal-to-noise ratio condition comprises: the first signal-to-noise ratio corresponding to the main order frequency is larger than a first threshold value, and the second signal-to-noise ratio corresponding to the harmonic order frequency outside the main order frequency is larger than a second threshold value;
and determining a checking result of the recording data about the order energy based on the number ratio of the audio frames meeting the signal-to-noise ratio condition in the recording data.
14. A method of synthesizing a sound wave, comprising:
acquiring target audio particles of each primary order frequency in a plurality of running states, and continuously acquiring driving information; wherein the driving information includes at least a driving speed, an accelerator pedal depth, and a driving state, and the target audio particles are determined based on the audio particle extraction method according to any one of claims 1 to 13;
predicting the driving speed and the depth of an accelerator pedal in the driving information based on a rotating speed model to obtain a predicted rotating speed value of the virtual engine;
Predicting the primary order frequency reached by the virtual engine at the predicted rotational speed value, based on the predicted rotational speed value, the maximum rotational speed value and the minimum rotational speed value set by the rotational speed model for the virtual engine, and the lowest primary order frequency of the target engine in the idle state;
selecting, in an operating state consistent with the driving state, the target audio particle corresponding to the predicted primary order frequency, and transmitting the target audio particle into a buffer space;
and synthesizing to obtain the sound wave data based on the target audio particles in the buffer space.
15. The method of claim 14, wherein the step of obtaining the driving speed in a bench test or offline simulation scenario comprises:
predicting the driving speed from simulated driving parameters and the accelerator pedal depth based on a vehicle speed model; wherein the simulated driving parameters include at least: a simulated vehicle weight, a driving gradient, and a driving resistance.
16. The method according to claim 14, wherein the rotational speed model further predicts a predicted gear value from the driving speed and the accelerator pedal depth in the driving information, and wherein, before the predicting of the primary order frequency reached by the virtual engine at the predicted rotational speed value based on the predicted rotational speed value, the maximum rotational speed value and the minimum rotational speed value set by the rotational speed model for the virtual engine, and the lowest primary order frequency of the target engine in the idle state, the method further comprises:
Analyzing, based on the predicted gear value, whether the virtual engine has a gear jump;
and in response to the virtual engine having a gear jump, obtaining a new predicted rotational speed value and a new predicted gear value based on the rotational speed model using a gear gradual-change strategy; wherein the new predicted gear value is free of gear jumps.
17. The method of claim 14, wherein in a real-vehicle driving scenario, the driving information further comprises motor operating parameters, the method further comprising:
and synthesizing, based on the motor operating parameters, audio data that simulates exhaust backfire.
18. An audio particle extraction apparatus, comprising:
the recording module is used for acquiring recording data of the target engine in a plurality of operating states; wherein the plurality of operating states includes: at least one of an acceleration state, a deceleration state, and an idle state;
the selection module is used for respectively selecting various running states as target states and selecting recording data in the target states as target data;
the determining module is used for determining the main order frequency of each audio frame in the target data; when the target engine is not acquired with the power system state data which is time-synchronous with the recording data, the main order frequency is determined based on the recording data;
The candidate module is used for extracting, for each audio frame in the target data, an audio clip from the target data based on the center instant determined by the audio frame, and obtaining candidate audio particles of a target frequency in the target state based on two zero-crossing sampling points in the audio clip that are separated by a preset number of sampling periods; wherein the target frequency is the primary order frequency of the audio frame, and the preset number is a first multiple of the number of cylinders of the target engine;
the extraction module is used for cross-correlating, for each primary order frequency in the target state, each candidate audio particle of the primary order frequency with a reference audio particle, and determining the target audio particle of the primary order frequency in the target state; wherein the reference audio particle is a target audio particle determined before the primary order frequency whose target audio particle is currently to be determined.
19. An acoustic wave synthesizing apparatus, comprising:
the acquisition module is used for acquiring target audio particles of each primary order frequency in a plurality of running states and continuously acquiring driving information; wherein the driving information includes at least a driving speed, an accelerator pedal depth, and a driving state, and the target audio particles are determined based on the audio particle extraction method according to any one of claims 1 to 13;
The rotating speed prediction module is used for predicting the driving speed and the depth of the accelerator pedal in the driving information based on a rotating speed model to obtain a predicted rotating speed value of the virtual engine;
the frequency prediction module is used for predicting the main order frequency reached by the virtual engine at the predicted rotating speed value based on the predicted rotating speed value, the maximum rotating speed value and the minimum rotating speed value set for the virtual engine by the rotating speed model and the lowest main order frequency of the target engine at the idle speed state;
the buffer memory module is used for selecting the target audio particles corresponding to the predicted main order frequency under the running state consistent with the running state, and transmitting the target audio particles into a buffer memory space;
and the synthesis module is used for synthesizing the sound wave data based on the target audio particles in the buffer space.
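The module chain of claim 19 (driving info → predicted rotation speed → predicted main order frequency → particle playback) can be roughed out as follows; the linear blend in `predict_rpm`, the proportional order-frequency scaling, and the Hann-window overlap-add are placeholder choices for illustration, not the rotation speed model or synthesis method of the patent:

```python
import numpy as np

def predict_rpm(speed_kmh, pedal_depth, rpm_min=800.0, rpm_max=6000.0):
    """Toy rotation-speed model: blend vehicle speed and pedal depth (0..1)
    into a value between the configured minimum and maximum rpm."""
    blend = 0.3 * min(speed_kmh / 200.0, 1.0) + 0.7 * min(pedal_depth, 1.0)
    return rpm_min + (rpm_max - rpm_min) * blend

def main_order_frequency(rpm, rpm_min, f_idle):
    """Scale the lowest main order frequency at idle by the rpm ratio,
    mirroring the proportionality of engine speed and order frequency."""
    return f_idle * rpm / rpm_min

def synthesize(particles, n_out, hop):
    """Overlap-add cached particles with a Hann window into one waveform."""
    out = np.zeros(n_out)
    win = np.hanning(len(particles[0]))
    pos, i = 0, 0
    while pos + len(win) <= n_out:
        p = particles[i % len(particles)]
        out[pos:pos + len(p)] += p * win
        pos += hop
        i += 1
    return out
```

In this sketch, doubling the predicted rpm doubles the predicted main order frequency, and `synthesize` simply loops the buffered particles at a fixed hop; a real implementation would re-select particles from the cache whenever the predicted frequency or running state changes.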
20. An electronic device, comprising a memory and a processor, the memory storing program instructions and the processor being configured to execute the program instructions to implement the audio particle extraction method of any one of claims 1 to 13 or the sound wave synthesizing method of any one of claims 14 to 17.
21. A computer readable storage medium, storing program instructions executable by a processor to implement the audio particle extraction method of any one of claims 1 to 13 or the sound wave synthesizing method of any one of claims 14 to 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211590380.XA CN115662470B (en) | 2022-12-12 | 2022-12-12 | Audio particle extraction method, sound wave synthesis device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115662470A CN115662470A (en) | 2023-01-31 |
CN115662470B (en) | 2023-05-26
Family
ID=85019829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211590380.XA Active CN115662470B (en) | 2022-12-12 | 2022-12-12 | Audio particle extraction method, sound wave synthesis device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115662470B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116206624B * | 2023-05-04 | 2023-08-29 | iFlytek (Suzhou) Technology Co., Ltd. | Vehicle sound wave synthesizing method, device, storage medium and equipment |
CN116778884A * | 2023-07-21 | 2023-09-19 | NetEase (Hangzhou) Network Co., Ltd. | Vehicle sound effect generation method, device, equipment and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108944750B * | 2018-07-20 | 2020-02-18 | Jilin University | Active sound simulation device based on Vold-Kalman filtering |
CN109808593A * | 2019-01-31 | 2019-05-28 | Beijing Shouqi Zhixing Technology Co., Ltd. | Automobile engine sound simulation method and system |
CN111798831B * | 2020-06-16 | 2023-11-28 | Wuhan University of Technology | Sound particle synthesis method and device |
CN115803225A * | 2020-09-25 | 2023-03-14 | Wuhan Lotus Cars Co., Ltd. | Electric vehicle sound effect simulation method, device and system, and automobile |
CN112700762B * | 2020-12-23 | 2022-10-04 | Wuhan University of Technology | Automobile sound synthesis method and device based on cylinder pressure signal |
CN113479133A * | 2021-06-23 | 2021-10-08 | Guangdong Greenway Technology Co., Ltd. | Method, device, equipment and medium for simulating engine working sound of an electric vehicle |
CN215244588U * | 2021-07-30 | 2021-12-21 | SAIC General Motors Co., Ltd. | Sound production device for simulating engine exhaust sound, and vehicle |
CN114771419A * | 2022-05-31 | 2022-07-22 | Dongfeng Motor Group Co., Ltd. | Sports car sound wave simulation method and device |
- 2022-12-12: CN application CN202211590380.XA filed, granted as patent CN115662470B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN115662470A (en) | 2023-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115662470B (en) | Audio particle extraction method, sound wave synthesis device, equipment and medium | |
US9646592B2 (en) | Audio signal analysis | |
JP6901798B2 (en) | Audio fingerprinting based on audio energy characteristics | |
CN113555031B (en) | Training method and device of voice enhancement model, and voice enhancement method and device | |
CN110085259B (en) | Audio comparison method, device and equipment | |
US11657325B2 (en) | Apparatus and method for augmenting training data using notch filter | |
JP2014126856A (en) | Noise removal device and control method for the same | |
CN112967735A (en) | Training method of voice quality detection model and voice quality detection method | |
JP2015118361A (en) | Information processing apparatus, information processing method, and program | |
CN109920446A (en) | A kind of audio data processing method, device and computer storage medium | |
EP3504708B1 (en) | A device and method for classifying an acoustic environment | |
CN111239597A (en) | Method for representing electric life of alternating current contactor based on audio signal characteristics | |
CN113593594A (en) | Training method and device of voice enhancement model and voice enhancement method and device | |
JP5915281B2 (en) | Sound processor | |
CN112735466B (en) | Audio detection method and device | |
CN104240697A (en) | Audio data feature extraction method and device | |
CN117153187A (en) | Track separation method, training method, device and equipment of track separation model | |
JP2016504622A (en) | Method for calculating at least two individual signals from at least two output signals | |
JP4249697B2 (en) | Sound source separation learning method, apparatus, program, sound source separation method, apparatus, program, recording medium | |
CN110931046A (en) | Audio high-level semantic feature extraction method and system for overlapped sound event detection | |
CN115206345B (en) | Music and human voice separation method, device, equipment and medium based on time-frequency combination | |
JP2020153935A (en) | Acoustic characteristic measuring device and acoustic characteristic measuring method | |
CN113571033B (en) | Accompaniment stepping back detection method, accompaniment stepping back detection equipment and computer readable storage medium | |
JP7275711B2 (en) | How audio signals are processed | |
CN110491413B (en) | Twin network-based audio content consistency monitoring method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||