CN116437263A

CN116437263A - Extraction system information acquisition method, recording signal processing method, equipment and product

Info

Publication number: CN116437263A
Application number: CN202310306030.4A
Authority: CN
Inventors: 张超鹏; 赵伟峰; 姜涛; 关晓珂; 邓源强
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2023-03-21
Filing date: 2023-03-21
Publication date: 2023-07-14

Abstract

The application relates to a stoping system information acquisition method, a recording signal processing method, a computer device and a computer program product. The method comprises: acquiring a target recording signal corresponding to a test signal from the recording signals acquired by a stoping system of a client; performing cross-correlation processing on the target recording signal and the test signal to obtain the correlation between each signal sample point in the test signal and the target recording signal; determining, according to the correlation between a plurality of signal sample points in the test signal and the target recording signal, the impulse response of the stoping system of the client when the target recording signal was acquired; and determining the frequency response information of the stoping system of the client according to the impulse response. With this method, the frequency response information that applies when the client captures the audio signal played by the local terminal can be determined quickly and conveniently from the result of cross-correlating the target recording signal with the test signal, which effectively reduces the resources consumed in obtaining device frequency response information and improves the efficiency of obtaining it.

Description

Extraction system information acquisition method, recording signal processing method, equipment and product

Technical Field

The present disclosure relates to the field of audio technologies, and in particular, to a method for acquiring information of a stoping system, a method for processing a recording signal, a computer device, and a computer program product.

Background

With the development of computer technology and the popularization of singing and music software, users can install corresponding clients on devices such as terminals and perform audio processing, such as song recording, through those clients. To obtain a better processing effect, frequency response information of the device may be obtained.

In the related art, the frequency response information of a device is often measured before the device leaves the factory by a professional tester using dedicated test tools. However, such measurement is mainly performed for professional audio-video equipment and is generally not performed for handheld devices such as mobile phones. Users therefore often cannot obtain the frequency response information of their devices, or can obtain it only at considerable cost in time or expense, so frequency response information is obtained inefficiently.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a stoping system information acquisition method, a recording signal processing method, a computer device, and a computer program product.

In a first aspect, the present application provides a stoping system information acquisition method. The method comprises the following steps:

acquiring a target recording signal corresponding to a test signal in the recording signals acquired by a stoping system of the client; the recording signal is collected by a stoping system of the client in the process of playing the test signal by the client;

performing cross-correlation processing on the target recording signal and the test signal to obtain the correlation between each signal sample point in the test signal and the target recording signal;

according to the correlation between a plurality of signal sample points in the test signal and the target recording signal, determining the impulse response of the stoping system of the client when the target recording signal is acquired;

and determining the frequency response information of the stoping system of the client according to the impulse response.

In one embodiment, the determining, according to the correlation between the plurality of signal samples in the test signal and the target recording signal, an impulse response of the stoping system of the client when the target recording signal is acquired includes:

determining an impulse response period of the target recording signal according to the correlation between a plurality of signal sample points in the test signal and the target recording signal;

and obtaining the impulse response of the stoping system of the client when the target recording signal is acquired based on the correlation of each signal sample point in the impulse response time period.

In one embodiment, the determining the impulse response period of the target recording signal according to the correlation between the plurality of signal samples in the test signal and the target recording signal includes:

determining the maximum sample point with the maximum correlation with the target recording signal in each signal sample point;

taking, within a preset N periods before the maximum sample point, the sample point with the largest change in correlation as a starting sample point; N is a positive number;

determining, within a preset M periods after the starting sample point, the sample point with the minimum correlation with the target recording signal as an ending sample point; M is a positive number;

and obtaining the impulse response time period of the target recording signal based on the starting sample point and the ending sample point.
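By way of a non-limiting illustration, the selection of the starting and ending sample points described above might be sketched as follows. The function name `impulse_response_window` and the parameter `period_len` are hypothetical. The sketch assumes that one "period" is a fixed number of samples, that correlation strength is measured by absolute value, and that the "change" in correlation is the absolute first difference; none of these choices is specified in the text.

```python
import numpy as np

def impulse_response_window(corr, n_periods, m_periods, period_len):
    """Return (start, end) sample indices bounding the impulse response.

    Assumptions (not from the source): a "period" is `period_len` samples,
    correlation strength is the absolute value of the sequence, and the
    correlation "change" is the absolute first difference.
    """
    mag = np.abs(np.asarray(corr, dtype=float))
    peak = int(np.argmax(mag))                      # maximum sample point

    # Starting sample: largest change within n_periods before the peak.
    lo = max(peak - int(n_periods * period_len), 0)
    change = np.abs(np.diff(mag[lo:peak + 1]))
    start = lo + int(np.argmax(change)) + 1 if change.size else peak

    # Ending sample: weakest correlation within m_periods after the start.
    hi = min(start + int(m_periods * period_len), mag.size - 1)
    end = start + int(np.argmin(mag[start:hi + 1]))
    return start, end

# Synthetic correlation sequence: silence, then a short decaying response.
corr = np.zeros(200)
corr[50:55] = [1.0, 0.6, -0.3, 0.15, -0.05]
start, end = impulse_response_window(corr, n_periods=2, m_periods=3, period_len=10)
impulse_response = corr[start:end]
```

In this synthetic case the window spans exactly the five non-zero correlation values; with a real recording the boundaries would depend on noise and on the chosen N and M.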

In one embodiment, the determining, as the ending sample, a sample having the smallest correlation with the target recording signal in a preset M periods after the starting sample includes:

acquiring a signal correlation sequence based on the correlation between each signal sample point in the test signal and the target recording signal;

performing low-pass filtering and zero-phase delay filtering on the signal correlation sequence to obtain a processed signal correlation sequence;

and taking, within the preset M periods after the starting sample point, the sample point with the smallest correlation in the processed signal correlation sequence as the ending sample point.
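As an illustrative sketch only, low-pass filtering with zero phase delay can be approximated by running a simple first-order low-pass filter forward and then backward over the magnitude of the correlation sequence, so that the delays of the two passes cancel. The forward-backward one-pole filter, the smoothing factor `alpha` and the helper names are assumptions made for this example; the text does not specify the filter design.

```python
import numpy as np

def one_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = np.empty(len(x), dtype=float)
    acc = float(x[0])
    for i, value in enumerate(x):
        acc += alpha * (value - acc)
        y[i] = acc
    return y

def zero_phase_lowpass(x, alpha=0.2):
    """Filter forward, then backward, so the phase delays cancel."""
    forward = one_pole_lowpass(np.asarray(x, dtype=float), alpha)
    return one_pole_lowpass(forward[::-1], alpha)[::-1]

def ending_sample(corr, start, m_periods, period_len, alpha=0.2):
    """Weakest point of the smoothed correlation within m_periods after start."""
    smoothed = zero_phase_lowpass(np.abs(corr), alpha)
    hi = min(start + int(m_periods * period_len), len(smoothed) - 1)
    return start + int(np.argmin(smoothed[start:hi + 1]))

# Synthetic oscillating, decaying correlation that starts at sample 50.
k = np.arange(150)
corr = np.zeros(200)
corr[50:] = 0.9 ** k * np.tile([1.0, 0.0, -1.0, 0.0], 38)[:150]

# Without smoothing, the first zero crossing is mistaken for the end.
raw_end = 50 + int(np.argmin(np.abs(corr[50:91])))
smooth_end = ending_sample(corr, start=50, m_periods=4, period_len=10)
```

The unsmoothed search stops at the first zero crossing of the oscillating response, whereas the smoothed sequence follows the decaying envelope and places the ending sample point near the far edge of the search range.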

In one embodiment, the test signal comprises a plurality of test signal segments; the acquiring of a target recording signal corresponding to a test signal in the recording signals acquired by the stoping system of the client comprises:

determining recording signal fragments corresponding to the test signal fragments in the recording signals collected by the stoping system of the client, and determining each recording signal fragment as a target recording signal to obtain a plurality of target recording signals;

the determining the frequency response information of the stoping system of the client according to the impulse response comprises the following steps:

determining an average value of impulse responses according to the impulse responses of the target recording signal fragments in the target recording signals;

and obtaining the frequency response information of the stoping system of the client based on the average value of the impulse response.

In one embodiment, two adjacent test signal segments in the test signal are spliced through a mute segment; the determining the recording signal segments corresponding to the test signal segments in the recording signals collected by the stoping system of the client comprises the following steps:

acquiring a plurality of non-mute candidate fragments in the recording signal, and determining a test signal fragment corresponding to each candidate fragment; the candidate fragments are acquired in the process of playing the test signal fragments corresponding to the candidate fragments;

determining a segment duration difference value based on the segment duration of the candidate segment and the segment duration of the test signal segment corresponding to the candidate segment;

and determining the candidate fragments with the fragment duration difference value smaller than a threshold value as recording signal fragments.
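The fragment selection described above might, for example, be sketched as follows. The amplitude threshold, the `min_gap` merging rule and the pairing of candidates with test signal fragments in playback order are assumptions made for illustration, and the helper names are hypothetical.

```python
def non_silent_runs(samples, amp_threshold=0.01, min_gap=100):
    """Return (start, end) index pairs of non-silent runs; end is exclusive.

    Runs separated by fewer than `min_gap` quiet samples are merged, so
    isolated near-zero samples inside a fragment do not split it.
    """
    runs = []
    start = None
    quiet = 0
    for i, s in enumerate(samples):
        if abs(s) > amp_threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                runs.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        runs.append((start, len(samples) - quiet))
    return runs

def select_segments(candidates, expected_lengths, max_diff):
    """Keep candidates whose length is within max_diff of the expected length.

    Assumption: candidates and test fragments are paired in playback order.
    """
    kept = []
    for (start, end), expected in zip(candidates, expected_lengths):
        if abs((end - start) - expected) < max_diff:
            kept.append((start, end))
    return kept

# Toy recording: three bursts separated by silence; the middle one is truncated.
silence = [0.0] * 200
recording = (silence + [0.5] * 1000 + silence + [0.5] * 400
             + silence + [0.5] * 1000 + silence)
candidates = non_silent_runs(recording)
recording_segments = select_segments(candidates, [1000, 1000, 1000], max_diff=50)
```

Here the truncated middle burst differs from its test fragment by 600 samples and is rejected, while the two complete bursts are kept as recording signal fragments.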

In one embodiment, before the acquiring of the target recording signal corresponding to the test signal in the recording signals acquired by the stoping system of the client, the method further includes:

acquiring a plurality of test signal fragments; each test signal fragment is generated based on a maximum length sequence or an exponential sweep sequence;

splicing the plurality of test signal fragments to obtain a test signal, and sending a test instruction carrying the test signal to a client;

the test instruction is used for indicating the client to play the test signal based on a preset play mode, and collecting the played test signal to obtain a recording signal; the preset playing mode is a playing mode used for playing accompaniment audio when a recorded audio signal is acquired.

In a second aspect, the present application further provides a recording signal processing method, where the method includes:

acquiring a recorded singing audio signal; the recorded singing audio signal is an audio signal acquired by a client while accompaniment audio is played;

determining frequency response information of a stoping system of the client; the frequency response information is acquired according to the stoping system information acquisition method of any one of the above embodiments;

performing suppression processing on a target audio signal in the recorded singing audio signal according to the frequency response information to obtain a processed recorded singing audio signal; the target audio signal is the audio signal obtained when the client captures the played accompaniment audio.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the stoping system information acquisition method of any one of the above embodiments or the recording signal processing method described above.

In a fourth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the stoping system information acquisition method of any one of the above embodiments or the recording signal processing method described above.

In a fifth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the stoping system information acquisition method of any one of the above embodiments or the recording signal processing method described above.

With the above stoping system information acquisition method, recording signal processing method, computer device and computer program product, a target recording signal corresponding to a test signal can be acquired from the recording signals acquired by the stoping system of the client, where the recording signals are acquired by the stoping system of the client while the client plays the test signal. Cross-correlation processing can then be performed on the target recording signal and the test signal to obtain the correlation between each signal sample point in the test signal and the target recording signal; the impulse response of the stoping system of the client when the target recording signal was acquired is determined according to the correlation between a plurality of signal sample points in the test signal and the target recording signal; and the frequency response information of the stoping system of the client is determined according to the impulse response. Because the target recording signal corresponding to the test signal is taken from the signal captured by the stoping system of the client and cross-correlated with the test signal, the frequency response information that applies when the client captures its own played audio can be determined quickly and conveniently, which effectively reduces the resources consumed in obtaining device frequency response information and improves the efficiency of obtaining it.

Drawings

FIG. 1 is a flow chart of a method for acquiring parameters of a stoping system according to an embodiment;

FIG. 2a is a schematic diagram of a spliced test signal according to one embodiment;

FIG. 2b is a schematic diagram of a stoping system according to one embodiment;

FIG. 2c is a schematic diagram of another spliced test signal according to one embodiment;

FIG. 2d is a schematic diagram of another spliced test signal according to one embodiment;

FIG. 3a is a schematic diagram of a signal correlation sequence in one embodiment;

FIG. 3b is a schematic diagram of an impulse response in one embodiment;

FIG. 3c is a schematic diagram of a system frequency response in one embodiment;

FIG. 4a is a schematic diagram of another signal correlation sequence in one embodiment;

FIG. 4b is a schematic diagram of another impulse response in one embodiment;

FIG. 4c is a schematic diagram of another system frequency response in one embodiment;

FIG. 5a is a schematic diagram of another impulse response in one embodiment;

FIG. 5b is a schematic diagram of another system frequency response in one embodiment;

FIG. 6a is a schematic diagram of an audio signal after low pass filtering and zero phase delay processing according to an embodiment;

FIG. 6b is a schematic diagram of a recorded signal segment in one embodiment;

FIG. 7 is a flow chart illustrating steps for determining an impulse response in one embodiment;

FIG. 8 is a flow chart of a method of suppressing a stoping signal according to an embodiment;

FIG. 9 is an internal block diagram of a computer device in one embodiment;

FIG. 10 is an internal block diagram of another computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In one embodiment, as shown in fig. 1, a method for acquiring parameters of a stoping system is provided. This embodiment is described with the method applied to a server as an example; it is understood that the method may also be applied to a terminal, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

S101, acquiring a target recording signal corresponding to a test signal in recording signals acquired by a stoping system of a client; the recording signal is collected by the stoping system of the client in the process of playing the test signal by the client.

The test signal may be the excitation used to determine the impulse response of a system: the test signal is correlated with the response of the system (such as a recording signal), and the impulse response of the system can be determined from the result of that correlation. In an alternative embodiment, the test signal may be a pseudo-random signal, such as a test signal generated based on a maximum length sequence (MLS), also known as an M-sequence; alternatively, the test signal may be generated based on an exponential sine sweep (ESS) sequence.

The process in which the client plays an audio signal and simultaneously captures that played signal can be understood as the client stoping (that is, recapturing) the audio signal. The system that captures the signal can be understood as the stoping system: its input is the audio signal played by the client, and its output is the signal obtained by capturing that playback. In the present application, the frequency response information of the stoping system of the client can be identified, and when the client subsequently records a song, the interference signals captured during recording can be suppressed or eliminated based on that frequency response information.

In practical application, the client can acquire and play the test signal. The test signal can be sent to the client by the server after the client sends a test request to the server, can be pre-stored in the client, or can be obtained after the client performs signal processing (such as sound effect processing). While the test signal is played, the stoping system of the client simultaneously captures the played test signal to obtain a recording signal.

While the test signal is played, the client may also capture signals other than the test signal, so the recording signal may contain interference signals or blank signals in addition to the test signal. Therefore, after the recording signal acquired by the stoping system of the client is obtained, the recording signal can be analyzed to determine the target recording signal corresponding to the test signal. The target recording signal is the recording signal obtained when the client captures the test signal that it plays; that is, the signal source of the target recording signal is the test signal played by the client. In an example, the client may be triggered to play the test signal in a quiet environment to reduce interference signals in the recording signal.

S102, performing cross-correlation processing on the target recording signal and the test signal to obtain the correlation between each signal sample point in the test signal and the target recording signal.

After the target recording signal is obtained, cross-correlation processing can be performed on the target recording signal and the test signal; that is, the correlation between the target recording signal and the test signal is determined. Specifically, the test signal includes a plurality of signal sample points, and during cross-correlation processing the correlation between each signal sample point in the test signal and the target recording signal can be determined.
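As a minimal sketch of this step, the following example cross-correlates a simulated target recording with its test signal. A random ±1 sequence stands in for the test signal, and the path `true_ir` (a two-sample delay followed by a short decay) is a hypothetical stand-in for the stoping system; both are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
test_signal = rng.choice([-1.0, 1.0], size=8192)   # stand-in excitation

# Hypothetical stoping path: 2-sample delay followed by a short decay.
true_ir = np.array([0.0, 0.0, 1.0, 0.5, -0.25])
target_recording = np.convolve(test_signal, true_ir)

# xcorr[i] corresponds to lag k = i - (len(test_signal) - 1), where
# c[k] = sum_n target_recording[n + k] * test_signal[n].
xcorr = np.correlate(target_recording, test_signal, mode="full")
xcorr /= len(test_signal)
lags = np.arange(-(len(test_signal) - 1), len(target_recording))

# The strongest correlation marks the dominant arrival of the test signal.
peak_lag = int(lags[np.argmax(np.abs(xcorr))])

# Correlation values from lag 0 onward approximate the impulse response.
zero = len(test_signal) - 1
ir_estimate = xcorr[zero:zero + len(true_ir)]
```

Because the excitation is approximately white, the correlation values at non-negative lags approximate the impulse response of the simulated path, up to a small estimation noise that shrinks as the test signal gets longer.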

In some alternative embodiments, the test signal may comprise multiple channels (two or more), for example a dual-channel test signal. In that case, any one channel may be cross-correlated with the target recording signal, or the channels may be averaged and the averaged signal used as a single-channel signal for cross-correlation with the target recording signal.

S103, according to the correlation between a plurality of signal sample points in the test signal and the target recording signal, determining the impulse response of the stoping system of the client when the target recording signal is acquired.

Specifically, the target recording signal is the output obtained after the stoping system of the client responds to the input test signal; that is, the captured recording signal is related to the input test signal. In this step, after the correlation between the plurality of signal sample points in the test signal and the target recording signal is obtained, the impulse response of the stoping system of the client may be determined according to that correlation.

S104, determining the frequency response information of the stoping system of the client according to the impulse response.

After the impulse response is obtained, it can be transformed from the time domain to the frequency domain by a Fourier transform to obtain the frequency response information of the stoping system of the client, such as the system frequency response of the stoping system, thereby acquiring the parameters of the stoping system of the client.
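For illustration, the transformation from impulse response to frequency response might be sketched as follows. The impulse response values, the FFT length `n_fft` and the sampling rate are hypothetical example values, not values taken from the application.

```python
import numpy as np

fs = 48000                                                  # example sampling rate
impulse_response = np.array([0.0, 0.0, 1.0, 0.5, -0.25])    # hypothetical values

# Zero-pad to n_fft points and take the one-sided Fourier transform.
n_fft = 1024
H = np.fft.rfft(impulse_response, n=n_fft)
freqs_hz = np.fft.rfftfreq(n_fft, d=1.0 / fs)

# Magnitude in dB (floored to avoid log of zero) and unwrapped phase.
magnitude_db = 20.0 * np.log10(np.maximum(np.abs(H), 1e-12))
phase_rad = np.unwrap(np.angle(H))
```

The arrays `freqs_hz` and `magnitude_db` together give one possible representation of the system frequency response, sampled from 0 Hz up to half the sampling rate.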

In this embodiment, a target recording signal corresponding to a test signal can be acquired from the recording signals acquired by the stoping system of the client, where the recording signals are acquired by the stoping system of the client while the client plays the test signal. Cross-correlation processing can then be performed on the target recording signal and the test signal to obtain the correlation between each signal sample point in the test signal and the target recording signal; the impulse response of the stoping system of the client when the target recording signal was acquired is determined according to the correlation between a plurality of signal sample points in the test signal and the target recording signal; and the frequency response information of the stoping system of the client is determined according to the impulse response. Because the target recording signal corresponding to the test signal is taken from the signal captured by the stoping system of the client and cross-correlated with the test signal, the frequency response information that applies when the client captures its own played audio can be determined quickly and conveniently, which effectively reduces the resources consumed in obtaining device frequency response information and improves the efficiency of obtaining it.

In some optional embodiments, for the acquired frequency response information, when the subsequent client records the audio, the interference signal existing in the recorded audio can be eliminated according to the frequency response information of the client stoping system, so that the audio quality is improved.

In addition, after the frequency response information of the client is obtained, it can be used to simulate stoping signals, which makes it convenient to expand the training data set when training a neural network model (such as an acoustic echo cancellation (AEC) model). Based on the frequency response information, and after homomorphic processing with a related inverse filtering method, stoping signals can also be eliminated for preset device models to obtain clean recorded audio signals. It can be understood that, with the scheme of the present application, the recording frequency response characteristics of different types of handheld devices can be analyzed: these characteristics can be obtained by playing the test signal and triggering the client to record in a quiet environment. This makes it convenient to design an EQ filter that compensates for hardware frequency response loss, reduces the influence of preprocessing (such as processing before analog-to-digital conversion) on the recording frequency response, and preserves the full-band information of the original external signal to the greatest extent.

In one embodiment, before the target recording signal corresponding to the test signal in the recording signals collected by the stoping system of the client is obtained, the method may further include the following steps:

acquiring a plurality of test signal fragments; splicing the plurality of test signal fragments to obtain a test signal, and sending a test instruction carrying the test signal to a client;

wherein each test signal segment is generated based on a maximum length sequence or an exponential sweep sequence; the test instruction is used for indicating the client to play the test signal based on a preset play mode, and to collect the played test signal to obtain a recording signal; the preset playing mode is the playing mode used for playing accompaniment audio when a recorded audio signal is acquired.

In a specific implementation, the test signal segments may be constructed based on a maximum length sequence and an exponential sweep sequence respectively, each test signal segment being generated from either the maximum length sequence or the exponential sweep sequence.

In an alternative embodiment, for the maximum length sequence, a random seed and a pseudo-random sequence generator may be used to construct the maximum length pseudo-random sequence (i.e., the maximum length sequence). For example, a random seed of order n = 13 may be used, which corresponds to a period length of P = 2^n - 1 = 8191 samples. When generating a test signal segment, a test signal segment of the maximum length sequence (MLS) may be constructed from 3 periods; the constructed test signal segment may also be referred to as an MLS sequence, and its segment length is L = 3 × P = 24573 samples, which lasts about 512 ms at a sampling rate of 48 kHz and about 557 ms at 44.1 kHz.
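A minimal sketch of such a construction is given below, using a linear feedback shift register. The feedback taps (13, 4, 3, 1), which correspond to the polynomial x^13 + x^4 + x^3 + x + 1, the all-ones seed and the mapping of bits to ±1 samples are assumptions made for this example; the text does not name a particular polynomial or seed.

```python
def mls_bits(order=13, taps=(13, 4, 3, 1)):
    """One period (2**order - 1 bits) of a maximum length sequence.

    Assumption: the taps correspond to the primitive polynomial
    x^13 + x^4 + x^3 + x + 1; the source does not name a polynomial.
    """
    state = [1] * order                 # any non-zero seed works
    bits = []
    for _ in range(2 ** order - 1):
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]    # XOR of the tapped register cells
        bits.append(state[-1])          # output the oldest bit
        state = [feedback] + state[:-1]
    return bits

def mls_segment(order=13, periods=3):
    """Repeat one MLS period and map bits {0, 1} to samples {+1.0, -1.0}."""
    one_period = [1.0 - 2.0 * b for b in mls_bits(order)]
    return one_period * periods

segment = mls_segment()
P = 2 ** 13 - 1                         # samples per period
L = len(segment)                        # samples per test signal segment
durations_ms = {fs: 1000.0 * L / fs for fs in (48000, 44100)}
```

With these settings the segment length and the durations at both sampling rates reproduce the figures given above, and the periodic autocorrelation of one period has a single sharp peak, which is the property that makes the sequence useful as an excitation for cross-correlation.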

For the exponential sweep sequence, an exponential sweep sequence of a preset length can be constructed and used as a test signal segment, where:

f_w represents the sweep bandwidth; in one example the maximum bandwidth, namely half the sampling rate (f_w = f_s / 2), may be used;

f_s represents the sampling rate, which may be 44.1 kHz or 48 kHz, for example;

T represents the time required for the current sweep signal to sweep the 0 to f_w bandwidth; for example, T may be 1 s or may be set to another duration according to the actual situation;

t represents the time sequence, taking discrete points at an interval of 1/f_s; that is, t may take the values 0, 1/f_s, 2/f_s, ..., T.
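The sweep expression itself is not reproduced in the text above, so the following sketch substitutes the widely used exponential sine sweep form, in which the instantaneous frequency grows exponentially from a start frequency to the sweep bandwidth over T seconds. The start frequency `f_start` (an exponential sweep cannot begin at exactly 0 Hz) and the function name are assumptions made for this example and may differ from the construction intended by the application.

```python
import numpy as np

def exponential_sweep(f_start, f_w, T, fs):
    """Exponential sine sweep from f_start to f_w over T seconds.

    Assumption: standard exponential sine sweep phase law; the start
    frequency f_start is a hypothetical parameter not given in the source.
    """
    t = np.arange(int(round(T * fs))) / fs          # time points spaced 1/fs
    rate = np.log(f_w / f_start)
    # Instantaneous frequency is f_start * exp(t / T * rate).
    phase = 2.0 * np.pi * f_start * T / rate * (np.exp(t / T * rate) - 1.0)
    return t, np.sin(phase)

fs = 48000
t, sweep = exponential_sweep(f_start=20.0, f_w=fs / 2, T=1.0, fs=fs)
```

With T = 1 s and a sampling rate of 48 kHz this yields a 48000-sample test signal segment whose frequency rises from 20 Hz toward half the sampling rate.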

After a plurality of test signal segments generated based on the maximum length sequence and the exponential sweep sequence are obtained, the plurality of test signal segments can be spliced together, and the spliced signal is used as the test signal.

In an alternative embodiment, multiple test signal segments of the same excitation signal type (i.e., multiple test signal segments generated based on a maximum length sequence, or multiple test signal segments generated based on an exponential sweep sequence) may first be spliced together to obtain spliced test signal segments belonging to the same excitation signal type, and the spliced test signal segments of different excitation signal types are then spliced together.

When the test signal segments are spliced, two adjacent test signal segments can be connected through a mute segment, so that different target recording signal segments can be located conveniently and rapidly when the target recording signals in the recording signal are determined.

Specifically, for the plurality of test signal segments generated based on the maximum length sequence, for example, 5 MLS sequences may be spliced with 1 s mute segments as intervals: a 1 s mute segment precedes the first MLS sequence, and a 1 s mute segment separates each pair of adjacent MLS sequences, which forms the group of 5 MLS sequences in the first half of the test signal. To facilitate the subsequent splicing of other types of test signal segments, a 1 s mute segment may also be spliced after the last MLS sequence. After the test signal segments generated based on the maximum length sequence are spliced in this way, the resulting duration is about 6 × 1 s + 5 × 0.5 s, that is, about 8.5 s. The plurality of test signal segments generated based on the exponential sweep sequence can likewise be spliced with mute segments as intervals; the specific splicing manner can refer to that of the test signal segments generated based on the maximum length sequence and is not described again here. The two types of spliced test signal segments may then be spliced together, and the splicing result may be as shown in fig. 2a.
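The splicing and the resulting duration can be checked with a short sketch such as the following. The helper name `splice_with_silence` is hypothetical, constant-valued placeholder segments stand in for the actual MLS sequences, and a sampling rate of 48 kHz is assumed.

```python
def splice_with_silence(segments, fs, gap_seconds=1.0):
    """Join segments with a mute gap before, between and after them."""
    gap = [0.0] * int(round(gap_seconds * fs))
    spliced = list(gap)                 # leading mute segment
    for segment in segments:
        spliced.extend(segment)
        spliced.extend(gap)             # mute segment after each segment
    return spliced

fs = 48000
segment_len = 3 * (2 ** 13 - 1)         # 24573 samples per MLS segment
# Placeholder segments; real MLS samples would be used in practice.
segments = [[1.0] * segment_len for _ in range(5)]

test_half = splice_with_silence(segments, fs)
total_seconds = len(test_half) / fs     # 6 gaps of 1 s + 5 segments of ~0.512 s
```

Using the exact segment length of 24573 samples rather than the rounded 0.5 s gives a total of roughly 8.56 s, consistent with the approximate figure of 8.5 s stated above.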

After the test signal is obtained, a test instruction carrying the test signal can be sent to the client, and after the client receives the test instruction, the test signal can be played according to a preset playing mode, and the currently played test signal is collected to obtain a recording signal.

Specifically, the server may send the test signal to the client in a preset audio signal sending manner, which may be the manner in which the server sends an accompaniment audio signal or an original singing audio signal to the client during actual singing recording. After obtaining the test signal sent by the server, the client may play the test signal according to a preset play mode, which may be the manner in which the client processes the obtained accompaniment audio signal or original singing audio signal when acquiring a recorded audio signal.

Fig. 2b shows a schematic diagram of a stoping system. A server may send an original audio signal (such as an accompaniment audio signal, an original singing audio signal or an original test signal) to a client; after receiving the original audio signal, the client decodes it and inputs the decoded audio signal into an effector for processing, thereby obtaining the audio signal to be played. In an embodiment, if the original audio signal sent by the server to the client is the original test signal, the test signal in step S101 may be the to-be-played test signal obtained after processing by the effector of the client. For example, the client may preprocess the test signal in the same decoding and mixing manner as for the accompaniment audio signal or the original singing audio signal to obtain a processed test signal, which may be as shown in fig. 2c. After obtaining the processed test signal, the client performs digital-to-analog conversion (DAC) and then plays the converted test signal.

In addition, the client can capture the played test signal through its stoping system. The environment is kept quiet during capture to reduce interference signals in the recorded audio. During capture, the client performs analog-to-digital conversion on the captured signal through an analog-to-digital converter (ADC) to obtain the original recording signal, which may be as shown in fig. 2d.

In this embodiment, on the one hand, a plurality of test signal segments may be used to construct a test signal, so as to reduce a test error in a frequency response information determining process, prevent a decrease in reliability of finally obtained frequency response information caused by determining frequency response information only based on an individual test signal, and improve accuracy of the frequency response information. On the other hand, the client plays the test signal according to the preset playing mode, so that the obtained system frequency response accurately reflects the system frequency response existing when the client actually acquires the recorded audio signal, and the method is beneficial to accurately removing the stoping signal in the recording process.

In one embodiment, the test signal includes a plurality of test signal segments; S101, acquiring a target recording signal corresponding to the test signal in the recording signal collected by the stoping system of the client, includes the following step:

determining the recording signal segment corresponding to each test signal segment in the recording signal collected by the stoping system of the client, and determining each recording signal segment as a target recording signal, to obtain a plurality of target recording signals.

In a specific implementation, since the test signal includes a plurality of test signal segments, and other recording contents except the test signal segments may exist in the recording signal, after the recording signal is obtained, the recording signal can be detected, and the recording signal segment corresponding to the test signal segment is determined as the target recording signal. And then, for each recording signal segment, the recording signal segment and the corresponding test signal segment can be subjected to cross-correlation processing, so that the impulse response of the stoping system of the client when each recording signal segment is acquired is obtained.

Accordingly, S104 determines, according to the impulse response, frequency response information of the stoping system of the client, which may include:

determining an average value of impulse responses according to impulse responses of target recording signal fragments in the target recording signals; and obtaining the frequency response information of the stoping system of the client based on the average value of the impulse response.

In a specific implementation, after the impulse response corresponding to each recording signal segment is obtained, the impulse responses of the plurality of recording signal segments may be averaged to obtain an average impulse response. In one embodiment, the plurality of test signal segments may include an MLS test signal segment constructed from a maximum-length sequence (MLS) and an ESS test signal segment constructed from an exponential sweep sequence (ESS). The signal correlation sequence determined from the MLS sequence is shown in Fig. 3a, and the corresponding impulse response and system frequency response are shown in Figs. 3b and 3c; the signal correlation sequence determined from the ESS sequence is shown in Fig. 4a, and the corresponding impulse response and system frequency response are shown in Figs. 4b and 4c. The average impulse response obtained in this step and the corresponding system frequency response may be as shown in Figs. 5a and 5b, respectively.

In this embodiment, the final frequency response information of the client stoping system can be determined by combining impulse responses corresponding to a plurality of recording signal segments, so that the accuracy of the frequency response information is effectively improved.
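As an illustration of the averaging step, the following minimal sketch averages several impulse responses sample by sample and evaluates the magnitude of the resulting frequency response with a direct DFT. The function names, the zero-padding of shorter responses and the toy values are assumptions made for this example; the embodiment does not specify them.

```python
import cmath
import math


def average_impulse_response(responses):
    """Average several impulse responses sample by sample.

    Assumption: shorter responses are treated as zero beyond their last sample.
    """
    length = max(len(h) for h in responses)
    total = [0.0] * length
    for h in responses:
        for i, value in enumerate(h):
            total[i] += value
    return [v / len(responses) for v in total]


def magnitude_response_db(h, num_bins=8):
    """Magnitude (in dB) of the DFT of h at num_bins frequencies in [0, pi)."""
    magnitudes = []
    for k in range(num_bins):
        omega = math.pi * k / num_bins
        acc = sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(h))
        magnitudes.append(20.0 * math.log10(max(abs(acc), 1e-12)))
    return magnitudes


# Hypothetical impulse responses measured from an MLS segment and an ESS segment.
h_mls = [0.0, 1.0, 0.5, 0.25]
h_ess = [0.0, 0.8, 0.6, 0.15]
h_avg = average_impulse_response([h_mls, h_ess])
response_db = magnitude_response_db(h_avg)
```

Averaging in the time domain before taking the transform keeps the phase information of each measurement; whether the embodiment averages before or after the transform beyond what is stated above is not specified.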

In one embodiment, two adjacent test signal segments in the test signal are joined by a silent segment; determining the recording signal segment corresponding to a test signal segment in the recording signal may comprise the following steps:

acquiring a plurality of non-silent candidate segments in the recording signal, and determining the test signal segment corresponding to each candidate segment; determining a segment duration difference based on the segment duration of the candidate segment and the segment duration of its corresponding test signal segment; and determining the candidate segments whose segment duration difference is smaller than a threshold as recording signal segments.

A candidate segment is collected while its corresponding test signal segment is being played; in other words, the audio signal of a candidate segment is what the stoping system of the client produces in response to the test signal segment.

In a specific implementation, the test signal segments in the test signal and the recording signal segments in the recording signal are not necessarily strictly aligned in time; that is, the start positions of a test signal segment and its corresponding recording signal segment may differ. In this embodiment, voice activity detection (VAD) may therefore first be performed on the recording signal, and the portions of the recording signal with sufficiently high energy are identified as the plurality of non-silent candidate segments.

In an alternative embodiment, obtaining a plurality of candidate segments of the recorded signal that are not muted may include the steps of:

Performing low-pass filtering and zero-phase delay filtering on the recording signal to obtain a processed recording signal; and determining a plurality of signal fragments with signal energy exceeding an energy threshold in the processed recording signal as candidate fragments.

Specifically, during voice activity detection, low-pass filtering may be performed on the recording signal to obtain the envelope information of the current recording signal, so that a more reliable VAD decision is obtained. Illustratively, the low-pass filter H(z) may be as follows:

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{a_0 + a_1 z^{-1} + a_2 z^{-2}}$$

where the numerator coefficients [b₀, b₁, b₂] and the denominator coefficients [a₀, a₁, a₂] are the filter coefficients of the second-order filter. In one example, the filter coefficients may be:

After the low-pass filtering, zero-phase delay filtering may further be performed on the recording signal; the resulting processed recording signal may be as shown in Fig. 6a. A plurality of signal segments whose energy exceeds an energy threshold in the processed recording signal may then be used as candidate segments; the candidate segments (also referred to as valid periods) may be as shown in Fig. 6b.
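The envelope-and-threshold idea can be sketched as follows. This is a minimal illustration, not the filter of the embodiment: the coefficients come from the widely used Audio EQ Cookbook low-pass design, the energy is taken as the squared sample value, and the cutoff, sample rate, threshold and toy recording are all assumptions chosen for the example. The zero-phase step is omitted here for brevity.

```python
import math


def lowpass_coefficients(cutoff_hz, sample_rate_hz, q=0.7071):
    """Second-order low-pass coefficients (Audio EQ Cookbook form), a0 normalised to 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(x, b, a):
    """y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2), with a0 = 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def candidate_segments(signal, b, a, energy_threshold):
    """Return (start, end) index pairs where the smoothed energy exceeds the threshold."""
    envelope = biquad([s * s for s in signal], b, a)
    segments = []
    start = None
    for i, energy in enumerate(envelope):
        if energy > energy_threshold and start is None:
            start = i
        elif energy <= energy_threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(envelope)))
    return segments


# Toy recording at 8 kHz: silence, a 2 kHz tone burst, silence, a second burst, silence.
SAMPLE_RATE = 8000
burst = [math.sin(math.pi / 4 + math.pi * n / 2) for n in range(400)]
silence = [0.0] * 200
recording = silence + burst + silence + burst + silence
b, a = lowpass_coefficients(cutoff_hz=100.0, sample_rate_hz=SAMPLE_RATE)
segments = candidate_segments(recording, b, a, energy_threshold=0.25)
```

Because the filter is applied in a single forward pass, the detected boundaries lag the true burst boundaries by a few samples, which is the delay that the zero-phase delay filtering is meant to remove.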

Specifically, after low-pass filtering, the filtered recording signal lags the original recording signal by a certain time, that is, a delay exists. Zero-phase delay filtering reduces this delay, so that the envelope obtained from the filtered recording signal has no time delay. In one example, a filter structure for zero-phase delay filtering may be as follows:

where y¹(n), y²(n) and y³(n) are the outputs of the sub-filters, y^ZPD(n) is the recording signal obtained after zero-phase delay filtering, a₀, a₁, a₂ and b₀, b₁, b₂ are the filter coefficients, N is the signal length of the filter input signal, and n is the sample index of the input signal.
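One common way to obtain a zero-phase result from a recursive filter is forward-backward filtering: filter, reverse in time, filter again, and reverse once more. The sketch below assumes that y¹(n), y²(n) and y³(n) correspond to these three intermediate signals; that mapping, the function names and the coefficient values are assumptions for illustration and are not taken from the embodiment.

```python
def biquad(x, b, a):
    """y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2), with a0 = 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase_filter(x, b, a):
    """Forward-backward filtering; the two passes cancel each other's phase delay."""
    y1 = biquad(x, b, a)   # forward pass (assumed to play the role of y^1)
    y2 = y1[::-1]          # time reversal (assumed y^2)
    y3 = biquad(y2, b, a)  # second pass over the reversed signal (assumed y^3)
    return y3[::-1]        # reverse again to restore the original time order


# Hypothetical stable low-pass coefficients: double pole at z = 0.5, unity DC gain.
B = [0.0625, 0.125, 0.0625]
A = [1.0, -1.0, 0.25]

# A unit impulse at index 50 makes the delay of each approach easy to see.
x = [0.0] * 101
x[50] = 1.0
single_pass = biquad(x, B, A)
zero_phase = zero_phase_filter(x, B, A)
single_pass_peak = single_pass.index(max(single_pass))
zero_phase_peak = zero_phase.index(max(zero_phase))
```

In this toy case the single-pass output peaks two samples late, while the forward-backward output peaks at the impulse position and is symmetric around it.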

After the plurality of candidate segments are obtained, since the recording signal is obtained by playing the test signal, in an alternative embodiment the test signal segments and the candidate segments may be matched by order: according to the order of the test signal segments in the test signal and the order of the candidate segments in the recording signal, a test signal segment and a candidate segment at the same position in their respective orders are determined to correspond to each other. For example, the first candidate segment corresponds to the first test signal segment.

In an actual recording process, irregular user operation, device vibration or similar causes may leave a recording signal segment shorter than expected. If such a segment were still used to determine the impulse response, the accuracy of the impulse response would suffer, so invalid recording signal segments need to be removed.

After the test signal segment corresponding to each candidate segment is obtained, the segment duration of the candidate segment and the segment duration of its corresponding test signal segment are obtained, and a segment duration difference is computed from the two. A candidate segment whose segment duration difference is smaller than a threshold is determined as a recording signal segment; a candidate segment whose segment duration difference is larger than the threshold is discarded and is not used as a recording signal segment.

In this embodiment, after a plurality of non-silent candidate segments in a recording signal are obtained, by obtaining a duration difference value between the candidate segments and a test signal segment corresponding to the candidate segments, and determining the candidate segments with the duration difference value smaller than a threshold value as the recording signal segments corresponding to the test signal segments, the frequency response information of the stoping system can be determined by using the effective recording signal segments in the recording signal, and the accuracy of the frequency response information is improved.
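The order-based matching and the duration check can be sketched as follows. The function name, the use of sample counts as durations and the numbers in the example are assumptions for illustration.

```python
def select_recording_segments(candidates, test_durations, threshold):
    """Pair candidates with test segments by order and keep those of similar length.

    candidates: (start, end) sample indices of non-silent segments, in recording order.
    test_durations: length in samples of each test signal segment, in playback order.
    Returns (segment_index, start, end) for every candidate that passes the check.
    """
    selected = []
    for index, ((start, end), expected) in enumerate(zip(candidates, test_durations)):
        duration_difference = abs((end - start) - expected)
        if duration_difference < threshold:
            selected.append((index, start, end))
    return selected


# Three test segments of 1000 samples each; the second recording was cut short.
candidates = [(100, 1100), (2000, 2400), (3000, 4010)]
test_durations = [1000, 1000, 1000]
valid = select_recording_segments(candidates, test_durations, threshold=50)
```

Here the second candidate is 600 samples shorter than its test segment and is discarded, while the first and third are kept.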

In one embodiment, as shown in Fig. 7, S103, determining the impulse response of the stoping system of the client when the target recording signal is collected according to the correlations between a plurality of signal sample points in the test signal and the target recording signal, includes the following steps:

S701, determining the impulse response time period of the target recording signal according to the correlations between the plurality of signal sample points in the test signal and the target recording signal.

The impulse response period is understood to be the period from the beginning of the impulse response associated with the test signal in the target recording signal to the end of the impulse response.

In an ideal state, the first sample point in the target recording signal is the moment at which the impulse response starts. Correspondingly, if the signal correlation sequence is obtained, its first sample point is the start of the system impulse response, and r(0) is the point with the largest correlation in the signal correlation sequence r(n). In practical applications, however, the physical realization of the system impulse response involves sound energy propagating through the air, which takes a period of time; that is, it is a physical process in which energy accumulates over time. Accordingly, the start position of the impulse response between the target recording signal and the test signal is shifted.

In this embodiment, after the correlation between the plurality of signal samples in the test signal and the target recording signal is obtained, the impulse response period of the target recording signal may be determined according to the correlation corresponding to each of the plurality of signal samples.

S702, based on the correlation of each signal sample point in the impulse response time period, the impulse response of the stoping system of the client when the target recording signal is acquired is obtained.

After the impulse response time period is determined, each signal sample point in the impulse response time period can be determined, and according to the correlation between each signal sample point and the target recording signal, the impulse response of the client side stoping system when the target recording signal is acquired can be determined, specifically, for example, a corresponding sample point sequence can be obtained based on each signal sample point in the impulse response time period and the correlation between each signal sample point and the target recording signal, and the sample point sequence is used as the impulse response.

In this embodiment, the impulse response of the client when the test signal is extracted can be rapidly determined through the correlation between a plurality of signal sample points in the test signal and the target recording signal, so that the efficiency of acquiring the system frequency response information is effectively improved.

In one embodiment, S701 determining the impulse response period of the target recording signal according to the correlation between the plurality of signal samples in the test signal and the target recording signal may include the following steps:

S7011, the maximum sample point with the greatest correlation with the target recording signal is determined among the signal samples.

After the correlation between each signal sample point in the test signal and the target recording signal is obtained, the correlation between each signal sample point and the target recording signal can be compared, and the sample point with the largest correlation is taken as the largest sample point.

S7012, within a preset N periods before the maximum sample point, taking the sample point with the largest change in correlation as the start sample point; N is a positive number.

After the maximum sampling point is determined, as the physical process of energy accumulation exists in the physical implementation process of the impulse response of the system, that is, before the impulse response of the system reaches the maximum value, the impulse response already exists (that is, the impulse response does not reach the maximum value instantaneously, but there is a process of gradual accumulation change), the sampling point corresponding to the starting moment of the impulse response of the system can be searched for as the starting sampling point in the preset N periods before the maximum sampling point. In one example, N may be 1, i.e., only the samples within one period need to be searched forward, avoiding a degradation in accuracy due to an excessively enlarged search range for the starting samples.

When searching for the starting sample point in N periods, the sample point with the largest correlation change can be taken as the starting sample point, wherein the sample point with the largest correlation change can be also understood as a position of the abrupt increase of the signal energy, and the energy is changed from a smaller value to a larger value at the starting sample point. In an alternative embodiment, the starting samples may be determined by:

where |·| denotes taking the absolute value, d_r(n) = r(n) − r(n−1), i_max is the maximum sample point, and P is one period.
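Read together with these definitions, the search in S7012 can be written as below. This is a reconstruction under two assumptions: the search window covers the N·P samples up to and including i_max, and i_start is a name introduced here for the start sample point.

```latex
i_{\mathrm{start}} = \arg\max_{\, i_{\max} - N P \,\le\, n \,\le\, i_{\max}} \left| d_r(n) \right|,
\qquad d_r(n) = r(n) - r(n-1)
```

With N = 1 the window reduces to the single period P immediately before the maximum sample point.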

S7013, within a preset M periods after the start sample point, determining the sample point with the smallest correlation with the target recording signal as the end sample point; M is a positive number.

After the start sample point is determined, the position of the end of the system impulse response may be further identified in M periods after the start sample point, and specifically, a sample point with the smallest correlation with the target recording signal may be taken as an end sample point in M periods after the start sample point.

S7014, based on the start sample point and the end sample point, an impulse response period of the target recording signal is obtained.

After the start and end samples are acquired, the period between the start and end samples may be determined as the impulse response period of the target recording signal.

In this embodiment, by looking forward with the maximum sample point with the greatest correlation as the reference, the time when the impulse response starts to accumulate energy can be identified, and thus the starting sample point of the impulse response period can be quickly and accurately identified.
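Steps S7011 to S7014 can be sketched on a toy correlation sequence as follows. The function name, the interpretation of a period as a number of samples, the inclusive window bounds and the sample values are assumptions for illustration; the sketch searches the raw sequence and does not include any smoothing.

```python
def impulse_response_window(r, period, n_periods=1, m_periods=1):
    """Locate the start and end sample points of the impulse response in r.

    r: signal correlation sequence.
    period: length of one period P in samples (hypothetical parameter).
    """
    # S7011: sample point with the largest correlation.
    i_max = max(range(len(r)), key=lambda n: r[n])

    # S7012: largest change |r(n) - r(n-1)| within N periods before i_max.
    first = max(1, i_max - n_periods * period)
    i_start = max(range(first, i_max + 1), key=lambda n: abs(r[n] - r[n - 1]))

    # S7013: smallest correlation within M periods after the start sample point.
    last = min(len(r) - 1, i_start + m_periods * period)
    i_end = min(range(i_start, last + 1), key=lambda n: r[n])

    # S7014: the impulse response period runs from i_start to i_end.
    return i_start, i_end


r = [0.0, 0.0, 0.01, 0.02, 0.3, 0.9, 1.0, 0.6, 0.3, 0.1, 0.02, 0.05, 0.04, 0.03]
i_start, i_end = impulse_response_window(r, period=6)
impulse_response = r[i_start:i_end + 1]
```

In this example the maximum lies at index 6, the sharpest rise is the step into index 5, and the minimum within one period after that point is at index 10.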

In one embodiment, S7013 determines, as an end sample, a sample having the smallest correlation with the target recording signal in a preset M periods after the start sample, including:

acquiring a signal correlation sequence based on the correlation between each signal sample point in the test signal and the target recording signal; performing low-pass filtering and zero-phase delay filtering on the signal correlation sequence to obtain a processed signal correlation sequence; and, within a preset M periods after the start sample point, taking the sample point with the smallest correlation in the processed signal correlation sequence as the end sample point.

In a specific implementation, after obtaining the correlation of each of the plurality of signal samples, a signal correlation sequence may be generated based on the correlation of each signal sample in the test signal. The signal correlation sequence comprises a plurality of sample points, and the value corresponding to each sample point can represent the correlation between the corresponding sample point in the test signal and the target recording signal. In an alternative embodiment, the cross-correlation process may be performed on the target recording signal and the test signal to obtain the signal correlation sequence in the following manner:

where r(n) is the signal correlation sequence; x_ref(n) is the test signal and x_rcd(n) is the target recording signal; N is the length of the signal (target recording signal or test signal) on which the cross-correlation is computed; n is the sample index in the resulting signal correlation sequence, with range [0, N−1]; and m is an independent variable with range [0, N−1].
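A minimal sketch of such a cross-correlation, using the symbols defined above, is given below. It assumes a circular form, r(n) = Σ_m x_ref(m) · x_rcd((m + n) mod N), because both n and m range over [0, N−1]; whether the embodiment wraps indices or zero-pads is not stated, so the wrap-around is an assumption. The 7-sample maximum-length sequence and the delay are toy values.

```python
def circular_cross_correlation(x_ref, x_rcd):
    """r(n) = sum over m of x_ref(m) * x_rcd((m + n) mod N), for n in [0, N-1]."""
    n_samples = len(x_ref)
    if len(x_rcd) != n_samples:
        raise ValueError("x_ref and x_rcd must have the same length N")
    r = []
    for n in range(n_samples):
        acc = 0.0
        for m in range(n_samples):
            acc += x_ref[m] * x_rcd[(m + n) % n_samples]
        r.append(acc)
    return r


# A length-7 maximum-length sequence mapped to +1/-1 values.
x_ref = [1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]

# Toy "recording": the same sequence circularly delayed by two samples.
delay = 2
x_rcd = [x_ref[(k - delay) % len(x_ref)] for k in range(len(x_ref))]

r = circular_cross_correlation(x_ref, x_rcd)
peak_index = r.index(max(r))
```

The correlation peaks at the delay of the recording relative to the reference, which is what makes the sequence usable for locating the impulse response.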

After the signal correlation sequence is obtained, low-pass filtering may be performed on it. Low-pass filtering removes outliers and noise in the signal correlation sequence, reduces interference, and yields the envelope information of the signal correlation sequence.

After the signal correlation sequence subjected to the low-pass filtering process is obtained, zero-phase delay filtering process can be further performed on the signal correlation sequence. The processing manners of performing the low-pass filtering processing and the zero-phase delay filtering processing on the signal correlation sequence may specifically refer to the processing manners of performing the low-pass filtering processing and the zero-phase delay filtering processing on the recording signal, which are not described herein.

After the processed signal correlation sequence is obtained, the sample point with the smallest correlation in the processed signal correlation sequence within a preset M periods after the start sample point may be used as the end sample point. In one example, the following formula may be used:

where r^ZPD(n) is the signal correlation sequence r(n) after low-pass filtering and zero-phase delay filtering, and P is one period (i.e., M = 1).

In this embodiment, performing low-pass filtering and zero-phase delay filtering on the signal correlation sequence yields the envelope of the sequence while reducing the delay of each sample point and the influence of noise or outliers. As a result, when the sample point with the smallest correlation is searched for in the processed signal correlation sequence as the end sample point, the reliability of the end sample point is effectively improved.
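The end-sample search of this embodiment can be summarized in one expression. The names i_start and i_end are introduced here for the start and end sample points, and the inclusive window bounds are an assumption based on the phrase "within a preset M periods after the start sample point".

```latex
i_{\mathrm{end}} = \arg\min_{\, i_{\mathrm{start}} \,\le\, n \,\le\, i_{\mathrm{start}} + M P} r^{ZPD}(n)
```

With M = 1 the search covers a single period P after the start sample point.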

In the related art, in order to obtain a better recording or singing effect, a user can sing while playing song audio (e.g., accompaniment or original singing). However, during the recording process, the song audio played by the client and the user's dry sound are recorded together, and the user often lacks professional audio processing equipment, so that more interference signals exist in the audio finally recorded, and the quality of the audio recorded by the client is reduced.

Based on this, in one embodiment, as shown in fig. 8, a recording signal processing method is provided, and this embodiment is illustrated by applying the method to a server, it is understood that the method may also be applied to a terminal, and may also be applied to a system including a terminal and a server, and implemented through interaction between the terminal and the server. In this embodiment, the method includes the steps of:

S801, obtaining a recorded audio signal; the recorded audio signal is an audio signal collected by the client while accompaniment audio is played.

In practical applications, in response to an accompaniment audio playing instruction, the client plays the accompaniment audio and collects audio signals while it plays. The user may sing or play an instrument during playback, so the client obtains a to-be-processed recorded audio signal from the collected audio and sends the recorded audio signal to the server.

S802, determining frequency response information of a stoping system of the client.

The frequency response information of the client is acquired according to the stoping system information acquisition method described above.

In a specific implementation, the client collects audio signals while playing the accompaniment audio, so the accompaniment audio is picked up by the stoping system and affects the quality of the dry vocal audio. In this step, after the recorded audio signal provided by the client is obtained, the frequency response information of the client may be determined.

In some optional embodiments, the client may store its frequency response information in advance and send it to the server together with the recorded audio signal; alternatively, the frequency response information of the client may be stored on the server, and after the recorded audio signal of the client is obtained, the frequency response information may be looked up by the client identifier corresponding to the client.

It can be understood that the frequency response information determined in this step may be obtained in real time, that is, uploaded by the client before the audio recording, in which case the client may obtain the frequency response information for the current recording environment before each recording. It may also be historical frequency response information, that is, the frequency response information used for the current recording may be frequency response information obtained during a past recording.

S803, performing suppression processing on the target audio signal in the recorded audio signal according to the frequency response information, to obtain a processed recorded audio signal.

The target audio signal is the audio signal obtained after the stoping system of the client collects the accompaniment audio signal of the accompaniment audio.

After the frequency response information of the client is obtained, suppression processing may be performed on the target audio signal in the recorded audio signal according to the frequency response information, for example by eliminating the target audio signal from the recorded audio signal or by reducing its signal strength (for example, reducing the loudness of the target audio signal).

In this embodiment, a recorded audio signal is obtained, the recorded audio signal including an audio signal collected by the client while accompaniment audio is played. The frequency response information of the stoping system of the client is determined according to the stoping system information acquisition method described above. Suppression processing is then performed on the target audio signal in the recorded audio signal according to the frequency response information to obtain a processed recorded audio signal, the target audio signal being the audio signal obtained after the stoping system of the client collects the accompaniment audio signal of the accompaniment audio. Because accurate and reliable frequency response information of the client is obtained and used to suppress the target audio signal in the recorded audio signal, the quality of the audio recorded by the client can be effectively improved.
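The embodiment does not fix a particular suppression algorithm. One simple possibility, shown here only as a sketch, is to estimate the collected accompaniment by convolving the known accompaniment signal with the impulse response of the stoping system (the time-domain counterpart of applying the frequency response) and to subtract that estimate from the recorded signal. The function names, the `strength` parameter and all sample values are assumptions for illustration.

```python
def convolve(x, h):
    """Direct-form linear convolution of x with h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, x_value in enumerate(x):
        for j, h_value in enumerate(h):
            y[i + j] += x_value * h_value
    return y


def suppress_accompaniment(recorded, accompaniment, impulse_response, strength=1.0):
    """Subtract the estimated collected accompaniment from the recorded signal.

    strength = 1.0 removes the estimate entirely; smaller values only attenuate it.
    """
    estimate = convolve(accompaniment, impulse_response)
    processed = []
    for n, value in enumerate(recorded):
        echo = estimate[n] if n < len(estimate) else 0.0
        processed.append(value - strength * echo)
    return processed


# Toy data: the recording is the dry vocal plus the accompaniment as heard
# through a hypothetical two-tap stoping-system impulse response.
accompaniment = [1.0, 0.0, -1.0, 0.5]
impulse_response = [0.5, 0.25]
vocal = [0.1, 0.2, 0.3, 0.4, 0.5]
collected = convolve(accompaniment, impulse_response)
recorded = [v + c for v, c in zip(vocal, collected)]

processed = suppress_accompaniment(recorded, accompaniment, impulse_response)
```

In this idealized case the processed signal equals the dry vocal; in practice the estimate is only as good as the measured impulse response and the time alignment between the accompaniment and the recording.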

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or with sub-steps or stages of the other steps.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing audio data and frequency response information. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for acquiring parameters of a recovery system or a method for processing a recording signal.

In one embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 10. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a method for acquiring parameters of a recovery system or a method for processing a recording signal. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. 
The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structures shown in fig. 9 and 10 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

acquiring a target recording signal corresponding to a test signal in the recording signals acquired by a stoping system of the client; the recording signal is collected by the client in the process of playing the test signal by the client;

In one embodiment, the computer program, when executed by the processor, further implements the steps of the other embodiments described above.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the steps of the other embodiments described above are also implemented when the processor executes a computer program.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

In one embodiment, the computer program, when executed by a processor, also implements the steps of the other embodiments described above.

determining the frequency response information of the stoping system of the client, wherein the frequency response information of the client is acquired according to the stoping system information acquisition method according to any one of the above;

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.

The above embodiments represent only a few implementations of the present application. Although they are described in specific detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for acquiring extraction system information, the method comprising:

2. The method of claim 1, wherein determining the impulse response of the extraction system of the client at the time the target recording signal was collected, based on the correlations between a plurality of signal samples in the test signal and the target recording signal, comprises:

3. The method of claim 2, wherein determining the impulse response period of the target recording signal based on the correlations between the plurality of signal samples in the test signal and the target recording signal comprises:

4. The method of claim 3, wherein determining, as an ending sample, the sample having the smallest correlation with the target recording signal within preset M periods after the starting sample comprises:

5. The method of claim 1, wherein the test signal comprises a plurality of test signal segments, and acquiring the target recording signal corresponding to the test signal from the recording signals collected by the extraction system of the client comprises:

6. The method of claim 5, wherein every two adjacent test signal segments are spliced together by a silence segment, and determining the recording signal segments corresponding to the test signal segments in the recording signals collected by the extraction system of the client comprises:

7. The method of claim 5, further comprising, prior to acquiring the target recording signal corresponding to the test signal from the recording signals collected by the extraction system of the client:

the test instruction instructs the client to play the test signal in a preset play mode and to collect the played test signal to obtain a recording signal; the preset play mode is the play mode used for playing accompaniment audio when a recorded audio signal is collected.

8. A method of processing a recorded signal, the method comprising:

determining frequency response information of the extraction system of the client, wherein the frequency response information of the client is acquired according to the method of any one of claims 1 to 7;

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the extraction system information acquisition method of any one of claims 1 to 7 or the recorded signal processing method of claim 8.

10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the extraction system information acquisition method of any one of claims 1 to 7 or the recorded signal processing method of claim 8.