CN112863530B - Sound work generation method and device - Google Patents
- Publication number: CN112863530B (application CN202110018240.4A)
- Authority
- CN
- China
- Prior art keywords
- sound
- gain
- frame
- processed
- fragment
- Prior art date
- Legal status: Active (an assumption based on Google's records, not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
The invention discloses a method and a device for generating sound works. The method comprises the following steps: receiving a plurality of recording sub-segments; configuring a sound effect for each recording sub-segment according to a received sound-effect configuration instruction, to obtain a plurality of sound segments to be synthesized; and splicing the sound segments to be synthesized to generate a target sound work. This lets a user create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Description
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to a method and an apparatus for generating a sound work.
Background
Sound is the most natural and convenient way for people to communicate with one another.
With the continued growth of internet social networking, more and more social products center on voice, and voice-based features keep multiplying. For example, many sound-oriented apps provide the following function: a user can record his or her own voice, preview a number of preset effects, and finally pick a satisfactory one to generate a sound clip.
In existing sound apps, however, the user can select only a single effect while recording and generating a sound clip; the choice of sound segments and effects is typically fixed, which cannot satisfy users' need to create sound works flexibly.
Disclosure of Invention
The invention provides a method and a device for generating sound works, addressing the technical problem that in the prior art the choice of sound segments and sound effects is limited and users' need to create sound works flexibly cannot be met.
The invention provides a method for generating a sound work, which comprises the following steps:
receiving a plurality of sound recording sub-segments;
according to the received sound-effect configuration instruction, respectively configuring sound effects for the plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
and splicing the sound clips to be synthesized to generate a target sound work.
Optionally, the sound effect includes a voice-changing effect and a scene sound effect, and the step of respectively configuring sound effects for the plurality of recording sub-segments according to the received sound-effect configuration instruction to obtain a plurality of sound segments to be synthesized includes:
responding to an input sound-effect configuration instruction, and selecting the voice-changing effect and the scene sound effect corresponding to the sound-effect configuration instruction from a preset sound-effect library;
and respectively configuring the plurality of recording sub-segments with the voice-changing effect and the scene sound effect to obtain the plurality of sound segments to be synthesized.
Optionally, the step of generating the target sound work by performing splicing processing on the plurality of sound clips to be synthesized includes:
Splicing a plurality of sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And carrying out volume coordination processing on the middle sound fragment to generate a target sound work.
Optionally, the step of performing volume coordination processing on the intermediate sound clip to generate a target sound work includes:
Dividing the middle sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And carrying out dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of performing dual gain smoothing operation on the to-be-processed sound clip of each frame based on the log-domain amplitude gain, and generating a target sound work includes:
calculating a first smooth gain value corresponding to a j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the smoothed sound fragment;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound fragment;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
The invention also provides a device for generating the sound work, which comprises:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments;
The sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
And the splicing processing module is used for carrying out splicing processing by adopting the plurality of sound fragments to be synthesized to generate a target sound work.
Optionally, the sound effect includes a sound variation effect and a scene sound effect, and the sound effect configuration module includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound effect changing and scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
Optionally, the splicing processing module includes:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And the volume coordination processing sub-module is used for performing volume coordination processing on the middle sound fragment to generate a target sound work.
Optionally, the volume coordination processing submodule includes:
the segment dividing unit is used for dividing the middle sound segment into multiple frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound segment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to a j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
A gain conversion unit configured to convert the logarithmic-domain amplitude gain into the linear-domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the smoothed sound fragment;
a sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound segment;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
From the above technical solutions, the invention has the following advantages:
A plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of steps of a method for generating a sound work according to a first embodiment of the present invention;
fig. 2 is a flowchart of steps of a method for generating a sound work according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a dynamic range control curve according to a second embodiment of the present invention;
fig. 4 is a block diagram of a sound work generating apparatus according to a third embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a method and a device for generating sound works, used to solve the technical problem that the choice of sound segments and sound effects in the prior art is limited, so that users' need to create sound works flexibly cannot be met. A single user can record a multi-character dialogue alone and, by selecting different sound-processing methods, apply a different effect to each sub-segment, finally synthesizing a "multi-person dialogue" work. The whole process resembles dubbing a film or television work and then editing and mixing the takes into a complete piece in post-production, which normally requires several people to dub, edit, and synthesize. By applying different effects such as voice changing to the recording sub-segments, the invention simplifies this process so that a single user can generate a voice-dialogue work with a few simple operations.
To make the objects, features, and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described in detail below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for generating a sound work according to an embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
step 101, receiving a plurality of recording subfragments;
In the embodiment of the invention, a plurality of recording sub-clips can be recorded by a single user or by a plurality of users.
It should be noted that each recording sub-section may have the same recording duration, or may have different recording durations, which is not limited in the embodiment of the present invention.
Step 102, according to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
After the plurality of recording sub-segments are obtained, corresponding sound effects can be configured for them respectively based on the sound-effect configuration instruction input by the user, so that each recording sub-segment carries the desired scene sound effect and character voice-changing effect, yielding a plurality of sound segments to be synthesized.
And step 103, splicing the sound clips to be synthesized to generate a target sound work.
After the plurality of sound segments to be synthesized are obtained, they may differ in recording conditions or effect settings and therefore suffer from mismatched volume, inconsistent effects, irregular endpoints, and similar defects. The segments are therefore spliced, and the junctions between them are smoothed and post-processed, to generate the target sound work.
In the embodiment of the invention, a plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a method for generating a sound work according to a second embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
step 201, receiving a plurality of recording sub-segments;
in the embodiment of the present invention, the implementation process of step 201 is similar to that of step 101, and will not be repeated here.
Step 202, according to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
Optionally, the sound effects include a sound variation effect and a scene sound effect, and step 202 may include the following sub-steps:
Responding to an input sound effect configuration instruction, and selecting the sound changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And respectively configuring a plurality of recording subfragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
In the embodiment of the invention, after the plurality of recording sub-segments are acquired, the voice-changing effect and the scene sound effect corresponding to a sound-effect configuration instruction input by the user can be selected from a preset sound-effect library; the character voice in each recording sub-segment is transformed according to the selected voice-changing effect, and the corresponding scene sound effect is then configured as its background sound, yielding the corresponding sound segments to be synthesized.
The sound-effect library may store a variety of voice-changing effects — for example a child voice, a girl voice, a mature female voice, an uncle voice, a horror voice, a monster voice, a robot voice, a funny voice, a heavy-machinery voice, an electronic voice, and so on — as well as a variety of scene sound effects, such as birdsong, ocean waves, light music, a tense atmosphere, a horror atmosphere, and the like; the embodiment of the present invention does not limit these.
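As a rough illustration of this configuration step, the sketch below selects a voice-changing effect and a scene effect from small placeholder libraries and applies them to a sub-segment represented as a list of samples. All names and transforms here are hypothetical stand-ins; the patent does not specify how the library or the effects themselves are implemented.

```python
# Minimal sketch of the effect-configuration step. The effect names and the
# transforms behind them are placeholders, not the patent's actual library.
EFFECT_LIBRARY = {
    "robot":  lambda clip: [0.8 * x for x in clip],   # placeholder transform
    "horror": lambda clip: [1.2 * x for x in clip],   # placeholder transform
}
SCENE_LIBRARY = {
    "birdsong": [0.01, 0.02, 0.01],   # placeholder background-loop samples
}

def configure_effects(sub_segment, voice_effect, scene_effect):
    """Apply the selected voice-changing effect to the character voice,
    then mix in the looping scene sound as background."""
    processed = EFFECT_LIBRARY[voice_effect](sub_segment)
    bg = SCENE_LIBRARY[scene_effect]
    return [s + bg[i % len(bg)] for i, s in enumerate(processed)]
```

A real implementation would replace the lambdas with actual pitch/formant processing and load the scene sounds from audio files.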
In the embodiment of the present invention, step 103 of the above embodiment — "splicing the plurality of sound segments to be synthesized to generate a target sound work" — may be replaced by the following steps 203 to 204:
step 203, splicing a plurality of sound clips to be synthesized and performing short-time fade-in fade-out processing to generate an intermediate sound clip;
Because recording is actively controlled by the user, a clip's start and end are usually not gradual; if adjacent sound clips are spliced directly, noise such as an audible click or pop may be produced at the junction.
To avoid splicing noise appearing repeatedly in the final work, a short fade-in/fade-out may be applied when the sound segments to be synthesized are spliced: each segment's volume rises gradually at its start and falls gradually at its end, so adjacent segments transition smoothly at the junction without producing noise. The segments to be synthesized are then spliced in order to generate the intermediate sound segment.
Optionally, so as not to affect the main body of the sound, the fade duration is kept to about 100 ms, or set by a technician according to the scenario; the embodiment of the present invention does not limit this.
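The splice-and-fade step described above can be sketched as follows, treating each segment as a list of linear-amplitude samples. The sample rate and default fade length are assumptions for illustration; the patent suggests about 100 ms but leaves the details open.

```python
# Sketch of the splice-with-fades step. SAMPLE_RATE is an assumed value;
# FADE_MS follows the ~100 ms suggestion in the description.
SAMPLE_RATE = 16000          # assumed sample rate, Hz
FADE_MS = 100                # fade duration suggested in the description

def apply_fades(samples, fade_len):
    """Linearly ramp the clip's volume up at the start and down at the end."""
    out = list(samples)
    n = min(fade_len, len(out) // 2)
    for i in range(n):
        ramp = (i + 1) / n   # rises from 1/n to 1 over the fade window
        out[i] *= ramp       # fade-in at the start
        out[-1 - i] *= ramp  # fade-out at the end
    return out

def splice(clips, fade_len=SAMPLE_RATE * FADE_MS // 1000):
    """Fade each segment, then concatenate them in order."""
    result = []
    for clip in clips:
        result.extend(apply_fades(clip, fade_len))
    return result
```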
And 204, performing volume coordination processing on the middle sound fragment to generate a target sound work.
After the sound clips are spliced together, volume coordination — i.e., dynamic range control of the sound — still needs to be performed on the work as a whole. To balance the effect against processing efficiency, step 204 may comprise the following sub-steps S1-S3:
s1, dividing the middle sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length, and calculating the input frame amplitude of each frame of sound fragment to be processed;
further, after the intermediate sound fragment is obtained, dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length, and simultaneously calculating the input frame amplitude of each frame of sound fragment to be processed.
In a specific implementation, taking framing with a preset length of 10 ms as an example, the input frame amplitude A can be computed in either of two ways, peak or RMS, as shown in formulas (1) and (2):
A_peak = max{ |x_i| : i = 1, 2, ..., N }    (1)
A_rms = sqrt( (1/N) * Σ_{i=1}^{N} x_i² )    (2)
where N is the number of sampling points per frame of data and |x_i| is the absolute value of the amplitude of the i-th sampling point. A_peak is the maximum absolute amplitude of each frame of data, and A_rms is the root mean square of each frame of data.
According to psychoacoustics, the human ear's response to signal loudness is approximately logarithmic in the signal amplitude, not linear. The input frame amplitude A in the linear domain is therefore converted into an input frame amplitude A_dB in the logarithmic domain, where the gain to be applied to the sound segment to be processed is computed.
Taking 16-bit sample data as an example, the conversion is shown in formula (3):
A_dB = 20 * log10(A / 32768)    (3)
where 32768 is the maximum absolute value representable with a 16-bit width (2^15 = 32768; one bit is the sign bit).
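Formulas (1)-(3) can be sketched directly, assuming each frame is a list of 16-bit sample values and reading the "log" in formula (3) as a base-10 logarithm (the usual dB convention):

```python
import math

FULL_SCALE = 32768  # 2**15, the largest absolute value of a 16-bit sample

def frame_peak(frame):
    """Formula (1): maximum absolute amplitude of the frame."""
    return max(abs(x) for x in frame)

def frame_rms(frame):
    """Formula (2): root mean square of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def to_db(amplitude):
    """Formula (3), reading 'log' as log10 for a dB scale.
    A full-scale amplitude maps to 0 dB; smaller amplitudes are negative."""
    return 20.0 * math.log10(amplitude / FULL_SCALE)
```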
S2, determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
The dynamic range is the ratio between the maximum and minimum values of a variable signal (such as sound or light). It can also be expressed as a base-10 logarithm (decibels) or a base-2 logarithm.
In a specific implementation, a corresponding dynamic range control curve may be configured based on user input. As shown in fig. 3, the curve relates an input amplitude A to an output amplitude B, both in dB; the straight-line portion is the control curve with the input and output amplitudes unadjusted, and the curved portion is the control curve with the input and output amplitudes adjusted according to an embodiment of the present invention.
Each corresponding output amplitude B can be obtained from each input amplitude A, from which the logarithmic-domain amplitude gain Δ to be added or subtracted is computed, as shown in formula (4):
Δ = B_dB - A_dB    (4)
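A minimal sketch of this curve lookup uses a simple hard-knee compressor shape: below an assumed threshold the curve is the identity (the straight-line portion), and above it the level is compressed by an assumed ratio (the bent portion). The threshold and ratio values are illustrative only, since the patent says the curve is configured from user input.

```python
def drc_output_db(a_db, threshold_db=-20.0, ratio=4.0):
    """Map an input level A (dB) to an output level B (dB).

    threshold_db and ratio are illustrative assumptions, not values
    from the patent; the patent only specifies that a control curve
    relates input amplitude A to output amplitude B.
    """
    if a_db <= threshold_db:
        return a_db                                      # 1:1 below the knee
    return threshold_db + (a_db - threshold_db) / ratio  # compressed above it

def log_gain_db(a_db):
    """Formula (4): delta = B_dB - A_dB."""
    return drc_output_db(a_db) - a_db
```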
S3, performing the dual gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic-domain amplitude gain, to generate the target sound work.
Further, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and step S3 may include the following sub-steps:
calculating a first smooth gain value corresponding to the j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the smoothed sound fragment;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound fragment;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
In one example of the invention, so that the adjusted signal transitions smoothly from frame to frame without producing crackling noise, a dual gain smoothing operation is performed between adjacent frames, comprising a first smoothing process and a second smoothing process.
The first smoothing process: after the logarithmic-domain gain is calculated, one pass of gain smoothing is performed, taking a weighted moving average of the current j-th frame and the previous three frames to obtain the smoothed gain of the current j-th frame, computed as in formula (5):
Δ'_j = Σ_{k=j-3}^{j} θ_k * Δ_k    (5)
where θ_k denotes the weighting coefficient of the k-th frame and Δ_k denotes the logarithmic-domain gain value of the k-th frame.
In a specific implementation, the number of frames used for gain smoothing may span 40-60 ms, or be set by a technician according to debugging; the embodiment of the present invention does not limit this.
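The first smoothing process — a weighted moving average over the current frame and the previous three frames, per formula (5) — might look like this. The weighting coefficients θ_k are an illustrative choice summing to 1, since the patent leaves them unspecified:

```python
def smooth_gain(log_gains, j, weights=(0.1, 0.2, 0.3, 0.4)):
    """Formula (5): weighted moving average of the log-domain gains of
    frames j-3 .. j (j is 0-based here, so j >= 3 is required).

    The default weights favor the current frame and are an illustrative
    assumption; the patent does not give concrete theta_k values.
    """
    window = log_gains[j - 3 : j + 1]   # frames j-3, j-2, j-1, j
    return sum(w * g for w, g in zip(weights, window))
```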
Since the amplitudes of the input sound segment to be processed and of the output target sound work are represented in the linear domain, after the first smoothing process the gain value Δ in the logarithmic domain is converted into the gain δ in the linear domain, so that gain processing can be applied to the input signal. The conversion is shown in formula (6):
δ = 10^(Δ/20)    (6)
When the second smoothing process is executed, the gain difference α_j of the current j-th frame relative to the (j-1)-th frame is first obtained, along with the per-sample gain increment α_j/N (where N is the number of data points per frame). The gain g_{j,i} of each sampling point of the j-th frame is then obtained and multiplied by the corresponding sampling point x_{j,i} of the input signal, yielding the target sound work y_{j,i}:
α_j = δ_j - δ_{j-1}    (7)
g_{j,i} = δ_{j-1} + i * α_j / N    (8)
y_{j,i} = g_{j,i} * x_{j,i}    (9)
where i denotes the i-th sampling point in each frame of the smoothed sound segment.
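Formulas (6)-(9) can be sketched as follows, with the dB-to-linear conversion followed by the per-sample gain ramp applied within one frame:

```python
def db_to_linear(delta_db):
    """Formula (6): convert a log-domain gain (dB) to a linear gain."""
    return 10.0 ** (delta_db / 20.0)

def apply_frame_gain(frame, delta_prev, delta_cur):
    """Formulas (7)-(9): ramp the linear gain from the previous frame's
    value delta_prev to the current frame's value delta_cur across the
    frame's N samples, applying it sample by sample."""
    n = len(frame)
    alpha = delta_cur - delta_prev          # (7) per-frame gain difference
    out = []
    for i, x in enumerate(frame, start=1):
        g = delta_prev + i * alpha / n      # (8) gain of the i-th sample
        out.append(g * x)                   # (9) gained output sample
    return out
```

By the last sample the ramp reaches exactly delta_cur, so the next frame's ramp continues from there without a discontinuity.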
Optionally, the above multi-segment dynamic range control can be used not only in non-real-time, file-based scenarios but also in real-time, frame-based scenarios. For file-based non-real-time processing, the volume can be further adjusted according to the whole audio file, making the overall sound after the segments are spliced more harmonious to the ear.
In the embodiment of the invention, a plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Referring to fig. 4, fig. 4 is a block diagram illustrating a sound work generating apparatus according to a third embodiment of the present invention.
The invention provides a device for generating sound works, which comprises:
A recording sub-segment receiving module 401, configured to receive a plurality of recording sub-segments;
the sound effect configuration module 402 is configured to configure sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction, so as to obtain a plurality of sound segments to be synthesized;
And the splicing processing module 403 is configured to perform splicing processing by using a plurality of to-be-synthesized sound clips, so as to generate a target sound work.
Optionally, the sound effects include a sound variation effect and a scene sound effect, and the sound effect configuration module 402 includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound-changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
Optionally, the splicing processing module 403 includes:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And the volume coordination processing sub-module is used for performing volume coordination processing on the intermediate sound fragment to generate a target sound work.
Optionally, the volume coordination processing submodule includes:
the segment dividing unit is used for dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
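The front half of this pipeline — splitting the intermediate sound fragment into fixed-length frames, measuring each frame's input amplitude, and looking up a log-domain gain on a dynamic range control curve — can be sketched as follows. The patent specifies neither the amplitude measure nor the curve's shape, so the RMS measure and the single-knee compressor curve below are assumptions for illustration:

```python
import math

def split_frames(signal, frame_len):
    """Divide the intermediate sound fragment into frames of a preset length
    (the last frame may be shorter)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

def frame_amplitude_db(frame):
    """Input frame amplitude in dB. RMS is assumed here; the patent does not
    specify the amplitude measure."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Hypothetical single-knee dynamic range control curve: frames above
    the threshold are compressed by `ratio`; below it the gain is 0 dB."""
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

print(split_frames([0.5, 0.5, 0.5, 0.5, 0.5], 2))  # [[0.5, 0.5], [0.5, 0.5], [0.5]]
print(frame_amplitude_db([1.0, 1.0]))              # 0.0
print(drc_gain_db(0.0))                            # -15.0
```

The log-domain gain returned per frame is what the dual gain smoothing unit then smooths before it is applied to the samples.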
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to the j-th frame of the to-be-processed sound fragment according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the to-be-processed sound fragment; where j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
A gain conversion unit configured to convert the logarithmic-domain amplitude gain into the linear-domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the smoothed sound fragment;
a sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smoothed sound fragment;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
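The text states only that the first smooth gain value for frame j is derived from the log-domain gains of frames j and j−3 (with j ≥ 4), without giving the combining rule. A sketch under the assumption of a simple weighted blend — the function name and the weight are both hypothetical:

```python
def first_smooth_gain(gains_db, j, w=0.5):
    """First smoothing: blend the log-domain amplitude gain of frame j with
    that of frame j-3. The actual combining rule is not disclosed in this
    text; an equal-weight average (w=0.5) is assumed for illustration.
    Frame indices are 1-based as in the description, so j must be >= 4."""
    if j < 4:
        raise ValueError("first smoothing is defined for j >= 4")
    return w * gains_db[j - 1] + (1.0 - w) * gains_db[j - 4]

# With per-frame log-domain gains for frames 1..4:
print(first_smooth_gain([1.0, 2.0, 3.0, 4.0], j=4))  # 2.5
```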
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A method of generating a sound work, comprising:
receiving a plurality of sound recording sub-segments; the recording sub-segments have different recording time lengths;
According to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording subfragments to obtain a plurality of sound fragments to be synthesized;
splicing the sound clips to be synthesized to generate a target sound work;
The step of splicing the plurality of sound clips to be synthesized to generate the target sound work comprises the following steps:
Splicing a plurality of sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
Performing volume coordination processing on the intermediate sound fragment to generate a target sound work;
the step of performing volume coordination processing on the intermediate sound fragment to generate a target sound work comprises the following steps:
Dividing the intermediate sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And carrying out dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
2. The method of generating a sound work according to claim 1, wherein the sound effect includes a sound variation effect and a scene sound effect, and the step of respectively configuring the sound effect for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized includes:
Responding to an input sound effect configuration instruction, and selecting the sound changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And respectively configuring a plurality of recording subfragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
3. The method of generating a sound work according to claim 1, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of performing the dual gain smoothing operation on the sound clip to be processed per frame based on the log-domain amplitude gain, generates a target sound work includes:
calculating a first smooth gain value corresponding to the j-th frame of the sound fragment to be processed according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the sound fragment to be processed; where j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to a linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the sound fragment to be processed;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the sound fragment to be processed;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
4. A sound work generating apparatus comprising:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments; the recording sub-segments have different recording time lengths;
The sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
the splicing processing module is used for carrying out splicing processing on the sound clips to be synthesized to generate a target sound work;
The splicing processing module comprises:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
The volume coordination processing sub-module is used for performing volume coordination processing on the intermediate sound fragment to generate a target sound work;
the volume coordination processing submodule comprises:
the segment dividing unit is used for dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
5. The apparatus for generating a sound work according to claim 4, wherein the sound effects include a sound varying effect and a scene sound effect, and the sound effect configuration module includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound-changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
6. The apparatus for generating a sound work according to claim 4, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to the j-th frame of the to-be-processed sound fragment according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the to-be-processed sound fragment; where j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
a gain conversion unit for converting the logarithmic domain amplitude gain into a linear domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the sound fragment to be processed;
A sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the sound fragment to be processed;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110018240.4A CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112863530A CN112863530A (en) | 2021-05-28 |
CN112863530B true CN112863530B (en) | 2024-08-27 |
Family
ID=76004855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110018240.4A Active CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112863530B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870896A (en) * | 2021-09-27 | 2021-12-31 | 动者科技(杭州)有限责任公司 | Motion sound false judgment method and device based on time-frequency graph and convolutional neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104410748A (en) * | 2014-10-17 | 2015-03-11 | 广东小天才科技有限公司 | Method for adding background sound effect according to position of mobile terminal and mobile terminal |
CN106060707A (en) * | 2016-05-27 | 2016-10-26 | 北京小米移动软件有限公司 | Reverberation processing method and device |
CN108064406A (en) * | 2015-06-22 | 2018-05-22 | 时光机资本有限公司 | It is synchronous for the rhythm of the cross-fade of music audio frequency segment for multimedia |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710488B (en) * | 2009-11-20 | 2011-08-03 | 安徽科大讯飞信息科技股份有限公司 | Method and device for voice synthesis |
CN104517605B (en) * | 2014-12-04 | 2017-11-28 | 北京云知声信息技术有限公司 | A kind of sound bite splicing system and method for phonetic synthesis |
KR101800362B1 (en) * | 2016-09-08 | 2017-11-22 | 최윤하 | Music composition support apparatus based on harmonics |
WO2018077364A1 (en) * | 2016-10-28 | 2018-05-03 | Transformizer Aps | Method for generating artificial sound effects based on existing sound clips |
CN107197404B (en) * | 2017-05-05 | 2020-05-12 | 广州盈可视电子科技有限公司 | Automatic sound effect adjusting method and device and recording and broadcasting system |
CN107154264A (en) * | 2017-05-18 | 2017-09-12 | 北京大生在线科技有限公司 | The method that online teaching wonderful is extracted |
CN108010512B (en) * | 2017-12-05 | 2021-04-30 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN108347529B (en) * | 2018-01-31 | 2021-02-23 | 维沃移动通信有限公司 | Audio playing method and mobile terminal |
CN108877753B (en) * | 2018-06-15 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN109346044B (en) * | 2018-11-23 | 2023-06-23 | 广州酷狗计算机科技有限公司 | Audio processing method, device and storage medium |
CN109686347B (en) * | 2018-11-30 | 2021-04-23 | 北京达佳互联信息技术有限公司 | Sound effect processing method, sound effect processing device, electronic equipment and readable medium |
CN110675848B (en) * | 2019-09-30 | 2023-05-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device and storage medium |
CN111798831B (en) * | 2020-06-16 | 2023-11-28 | 武汉理工大学 | Sound particle synthesis method and device |
CN112133277B (en) * | 2020-11-20 | 2021-02-26 | 北京猿力未来科技有限公司 | Sample generation method and device |
- 2021-01-07: CN CN202110018240.4A — patent CN112863530B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112863530A (en) | 2021-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8676361B2 (en) | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound | |
US8874245B2 (en) | Effects transitions in a music and audio playback system | |
US20220286781A1 (en) | Method and apparatus for listening scene construction and storage medium | |
AU2007243586B2 (en) | Audio gain control using specific-loudness-based auditory event detection | |
KR101164937B1 (en) | Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal | |
EP2329661B1 (en) | Binaural filters for monophonic compatibility and loudspeaker compatibility | |
JP2012235310A (en) | Signal processing apparatus and method, program, and data recording medium | |
US20070025566A1 (en) | System and method for processing audio data | |
US10728688B2 (en) | Adaptive audio construction | |
CN110246508B (en) | Signal modulation method, device and storage medium | |
CN112863530B (en) | Sound work generation method and device | |
US10178491B2 (en) | Apparatus and a method for manipulating an input audio signal | |
JP5911852B2 (en) | Variable exponential mean detector and dynamic range controller | |
CN103812462A (en) | Loudness control method and device | |
CN113077771B (en) | Asynchronous chorus sound mixing method and device, storage medium and electronic equipment | |
CN111564158A (en) | Configurable sound changing device | |
CN115835112A (en) | Projector-based adaptive sound effect debugging method, system, device and platform | |
JP3197975B2 (en) | Pitch control method and device | |
CN113905307A (en) | Color slide block | |
CN116994545B (en) | Dynamic original sound adjusting method and device for K song system | |
CN115188394B (en) | Sound mixing method, device, electronic equipment and storage medium | |
EP1317807A2 (en) | System and method for processing audio data | |
CN119296554A (en) | Multi-channel mixing control method, device, equipment and medium | |
CN116057626A (en) | Noise reduction using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||