CN112863530B - Sound work generation method and device - Google Patents
- Publication number: CN112863530B (application CN202110018240.4A)
- Authority
- CN
- China
- Prior art keywords
- sound
- gain
- frame
- processed
- fragment
- Prior art date
- Legal status: Active (an assumption based on Google's records, not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
The invention discloses a method and a device for generating sound works. The method comprises the following steps: receiving a plurality of recording sub-segments; configuring a sound effect for each recording sub-segment according to a received sound-effect configuration instruction, to obtain a plurality of sound segments to be synthesized; and splicing the sound segments to be synthesized to generate a target sound work. This lets a user create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Description
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to a method and an apparatus for generating a sound work.
Background
Sound is the most natural and convenient way for people to communicate with one another.
With the continued growth of internet social networking, more and more social products center on voice, and voice-based features keep multiplying. For example, many sound-oriented apps provide the following function: a user can record his or her own voice, preview a number of preset effects, and finally pick a satisfactory one to generate a sound clip.
In existing sound apps, however, the user can select only a single effect while recording and generating a sound clip; the choice of sound segments and effects is typically fixed, which cannot satisfy users' need to create sound works flexibly.
Disclosure of Invention
The invention provides a method and a device for generating sound works, addressing the technical problem that in the prior art the choice of sound segments and sound effects is limited and users' need to create sound works flexibly cannot be met.
The invention provides a method for generating a sound work, which comprises the following steps:
receiving a plurality of sound recording sub-segments;
according to the received sound-effect configuration instruction, respectively configuring sound effects for the plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
and splicing the sound clips to be synthesized to generate a target sound work.
Optionally, the sound effect includes a voice-changing effect and a scene sound effect, and the step of respectively configuring sound effects for the plurality of recording sub-segments according to the received sound-effect configuration instruction to obtain a plurality of sound segments to be synthesized includes:
responding to an input sound-effect configuration instruction, and selecting the voice-changing effect and the scene sound effect corresponding to the sound-effect configuration instruction from a preset sound-effect library;
and respectively configuring the plurality of recording sub-segments with the voice-changing effect and the scene sound effect to obtain the plurality of sound segments to be synthesized.
Optionally, the step of generating the target sound work by performing splicing processing on the plurality of sound clips to be synthesized includes:
Splicing a plurality of sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And carrying out volume coordination processing on the middle sound fragment to generate a target sound work.
Optionally, the step of performing volume coordination processing on the intermediate sound clip to generate a target sound work includes:
Dividing the middle sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And carrying out dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of performing dual gain smoothing operation on the to-be-processed sound clip of each frame based on the log-domain amplitude gain, and generating a target sound work includes:
calculating a first smooth gain value corresponding to a j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the smoothed sound fragment;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound fragment;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
The invention also provides a device for generating the sound work, which comprises:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments;
The sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
And the splicing processing module is used for carrying out splicing processing by adopting the plurality of sound fragments to be synthesized to generate a target sound work.
Optionally, the sound effect includes a sound variation effect and a scene sound effect, and the sound effect configuration module includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound effect changing and scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
Optionally, the splicing processing module includes:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And the volume coordination processing sub-module is used for performing volume coordination processing on the middle sound fragment to generate a target sound work.
Optionally, the volume coordination processing submodule includes:
the segment dividing unit is used for dividing the middle sound segment into multiple frames of sound segments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound segment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to a j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
A gain conversion unit configured to convert the logarithmic-domain amplitude gain into the linear-domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the smoothed sound fragment;
a sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound segment;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
From the above technical solutions, the invention has the following advantages:
A plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of steps of a method for generating a sound work according to a first embodiment of the present invention;
fig. 2 is a flowchart of steps of a method for generating a sound work according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a dynamic range control curve according to a second embodiment of the present invention;
fig. 4 is a block diagram of a sound work generating apparatus according to a third embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a method and a device for generating sound works, used to solve the technical problem that the choice of sound segments and sound effects in the prior art is limited, so that users' need to create sound works flexibly cannot be met. A single user can record a multi-character dialogue alone and, by selecting different sound-processing methods, apply a different effect to each sub-segment, finally synthesizing a "multi-person dialogue" work. The whole process resembles dubbing a film or television work and then editing and mixing the takes into a complete piece in post-production, which normally requires several people to dub, edit, and synthesize. By applying different effects such as voice changing to the recording sub-segments, the invention simplifies this process so that a single user can generate a voice-dialogue work with a few simple operations.
To make the objects, features, and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described in detail below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, not all, of the embodiments of the invention; all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a method for generating a sound work according to an embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
step 101, receiving a plurality of recording subfragments;
In the embodiment of the invention, a plurality of recording sub-clips can be recorded by a single user or by a plurality of users.
It should be noted that each recording sub-section may have the same recording duration, or may have different recording durations, which is not limited in the embodiment of the present invention.
Step 102, according to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
After the plurality of recording sub-segments are obtained, corresponding sound effects can be configured for them respectively based on the sound-effect configuration instruction input by the user, so that each recording sub-segment carries the desired scene sound effect and character voice-changing effect, yielding a plurality of sound segments to be synthesized.
And step 103, splicing the sound clips to be synthesized to generate a target sound work.
After the plurality of sound segments to be synthesized are obtained, they may differ in recording conditions or effect settings and therefore suffer from mismatched volume, inconsistent effects, irregular endpoints, and similar defects. The segments are therefore spliced, and the junctions between them are smoothed and post-processed, to generate the target sound work.
In the embodiment of the invention, a plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a method for generating a sound work according to a second embodiment of the present invention.
The invention provides a method for generating a sound work, which comprises the following steps:
step 201, receiving a plurality of recording sub-segments;
in the embodiment of the present invention, the implementation process of step 201 is similar to that of step 101, and will not be repeated here.
Step 202, according to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording sub-segments to obtain a plurality of sound segments to be synthesized;
Optionally, the sound effects include a sound variation effect and a scene sound effect, and step 202 may include the following sub-steps:
Responding to an input sound effect configuration instruction, and selecting the sound changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And respectively configuring a plurality of recording subfragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
In the embodiment of the invention, after the plurality of recording sub-segments are acquired, the voice-changing effect and the scene sound effect corresponding to a sound-effect configuration instruction input by the user can be selected from a preset sound-effect library; the character voice in each recording sub-segment is transformed according to the selected voice-changing effect, and the corresponding scene sound effect is then configured as its background sound, yielding the corresponding sound segments to be synthesized.
The sound-effect library may store a variety of voice-changing effects — for example a child voice, a girl voice, a mature female voice, an uncle voice, a horror voice, a monster voice, a robot voice, a funny voice, a heavy-machinery voice, an electronic voice, and so on — as well as a variety of scene sound effects, such as birdsong, ocean waves, light music, a tense atmosphere, a horror atmosphere, and the like; the embodiment of the present invention does not limit these.
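As a rough illustration of this configuration step, the sketch below selects a voice-changing effect and a scene effect from small placeholder libraries and applies them to a sub-segment represented as a list of samples. All names and transforms here are hypothetical stand-ins; the patent does not specify how the library or the effects themselves are implemented.

```python
# Minimal sketch of the effect-configuration step. The effect names and the
# transforms behind them are placeholders, not the patent's actual library.
EFFECT_LIBRARY = {
    "robot":  lambda clip: [0.8 * x for x in clip],   # placeholder transform
    "horror": lambda clip: [1.2 * x for x in clip],   # placeholder transform
}
SCENE_LIBRARY = {
    "birdsong": [0.01, 0.02, 0.01],   # placeholder background-loop samples
}

def configure_effects(sub_segment, voice_effect, scene_effect):
    """Apply the selected voice-changing effect to the character voice,
    then mix in the looping scene sound as background."""
    processed = EFFECT_LIBRARY[voice_effect](sub_segment)
    bg = SCENE_LIBRARY[scene_effect]
    return [s + bg[i % len(bg)] for i, s in enumerate(processed)]
```

A real implementation would replace the lambdas with actual pitch/formant processing and load the scene sounds from audio files.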
In the embodiment of the present invention, step 103 of the above embodiment — "splicing the plurality of sound segments to be synthesized to generate a target sound work" — may be replaced by the following steps 203 to 204:
step 203, splicing a plurality of sound clips to be synthesized and performing short-time fade-in fade-out processing to generate an intermediate sound clip;
Because recording is actively controlled by the user, a clip's start and end are usually not gradual; if adjacent sound clips are spliced directly, noise such as an audible click or pop may be produced at the junction.
To avoid splicing noise appearing repeatedly in the final work, a short fade-in/fade-out may be applied when the sound segments to be synthesized are spliced: each segment's volume rises gradually at its start and falls gradually at its end, so adjacent segments transition smoothly at the junction without producing noise. The segments to be synthesized are then spliced in order to generate the intermediate sound segment.
Optionally, so as not to affect the main body of the sound, the fade duration is kept to about 100 ms, or set by a technician according to the scenario; the embodiment of the present invention does not limit this.
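The splice-and-fade step described above can be sketched as follows, treating each segment as a list of linear-amplitude samples. The sample rate and default fade length are assumptions for illustration; the patent suggests about 100 ms but leaves the details open.

```python
# Sketch of the splice-with-fades step. SAMPLE_RATE is an assumed value;
# FADE_MS follows the ~100 ms suggestion in the description.
SAMPLE_RATE = 16000          # assumed sample rate, Hz
FADE_MS = 100                # fade duration suggested in the description

def apply_fades(samples, fade_len):
    """Linearly ramp the clip's volume up at the start and down at the end."""
    out = list(samples)
    n = min(fade_len, len(out) // 2)
    for i in range(n):
        ramp = (i + 1) / n   # rises from 1/n to 1 over the fade window
        out[i] *= ramp       # fade-in at the start
        out[-1 - i] *= ramp  # fade-out at the end
    return out

def splice(clips, fade_len=SAMPLE_RATE * FADE_MS // 1000):
    """Fade each segment, then concatenate them in order."""
    result = []
    for clip in clips:
        result.extend(apply_fades(clip, fade_len))
    return result
```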
And 204, performing volume coordination processing on the middle sound fragment to generate a target sound work.
After the sound clips are spliced together, volume coordination — i.e., dynamic range control of the sound — still needs to be performed on the work as a whole. To balance the effect against processing efficiency, step 204 may comprise the following sub-steps S1-S3:
s1, dividing the middle sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length, and calculating the input frame amplitude of each frame of sound fragment to be processed;
further, after the intermediate sound fragment is obtained, dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length, and simultaneously calculating the input frame amplitude of each frame of sound fragment to be processed.
In a specific implementation, taking framing with a preset length of 10 ms as an example, the input frame amplitude A can be computed in either of two ways, peak or RMS, as shown in formulas (1) and (2):
A_peak = max{ |x_i| : i = 1, 2, ..., N }    (1)
A_rms = sqrt( (1/N) * Σ_{i=1}^{N} x_i² )    (2)
where N is the number of sampling points per frame of data and |x_i| is the absolute value of the amplitude of the i-th sampling point. A_peak is the maximum absolute amplitude of each frame of data, and A_rms is the root mean square of each frame of data.
According to psychoacoustics, the human ear's response to signal loudness is approximately logarithmic in the signal amplitude, not linear. The input frame amplitude A in the linear domain is therefore converted into an input frame amplitude A_dB in the logarithmic domain, where the gain to be applied to the sound segment to be processed is computed.
Taking 16-bit sample data as an example, the conversion is shown in formula (3):
A_dB = 20 * log10(A / 32768)    (3)
where 32768 is the maximum absolute value representable with a 16-bit width (2^15 = 32768; one bit is the sign bit).
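Formulas (1)-(3) can be sketched directly, assuming each frame is a list of 16-bit sample values and reading the "log" in formula (3) as a base-10 logarithm (the usual dB convention):

```python
import math

FULL_SCALE = 32768  # 2**15, the largest absolute value of a 16-bit sample

def frame_peak(frame):
    """Formula (1): maximum absolute amplitude of the frame."""
    return max(abs(x) for x in frame)

def frame_rms(frame):
    """Formula (2): root mean square of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def to_db(amplitude):
    """Formula (3), reading 'log' as log10 for a dB scale.
    A full-scale amplitude maps to 0 dB; smaller amplitudes are negative."""
    return 20.0 * math.log10(amplitude / FULL_SCALE)
```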
S2, determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
The dynamic range is the ratio between the maximum and minimum values of a variable signal (such as sound or light). It can also be expressed as a base-10 logarithm (decibels) or a base-2 logarithm.
In a specific implementation, a corresponding dynamic range control curve may be configured based on user input. As shown in fig. 3, the curve relates an input amplitude A to an output amplitude B, both in dB; the straight-line portion is the control curve with the input and output amplitudes unadjusted, and the curved portion is the control curve with the input and output amplitudes adjusted according to an embodiment of the present invention.
Each corresponding output amplitude B can be obtained from each input amplitude A, from which the logarithmic-domain amplitude gain Δ to be added or subtracted is computed, as shown in formula (4):
Δ = B_dB - A_dB    (4)
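A minimal sketch of this curve lookup uses a simple hard-knee compressor shape: below an assumed threshold the curve is the identity (the straight-line portion), and above it the level is compressed by an assumed ratio (the bent portion). The threshold and ratio values are illustrative only, since the patent says the curve is configured from user input.

```python
def drc_output_db(a_db, threshold_db=-20.0, ratio=4.0):
    """Map an input level A (dB) to an output level B (dB).

    threshold_db and ratio are illustrative assumptions, not values
    from the patent; the patent only specifies that a control curve
    relates input amplitude A to output amplitude B.
    """
    if a_db <= threshold_db:
        return a_db                                      # 1:1 below the knee
    return threshold_db + (a_db - threshold_db) / ratio  # compressed above it

def log_gain_db(a_db):
    """Formula (4): delta = B_dB - A_dB."""
    return drc_output_db(a_db) - a_db
```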
S3, performing the dual gain smoothing operation on each frame of the sound segment to be processed based on the logarithmic-domain amplitude gain, to generate the target sound work.
Further, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and step S3 may include the following sub-steps:
calculating a first smooth gain value corresponding to the j-th frame of the sound segment to be processed according to the logarithmic-domain amplitude gain corresponding to the j-th frame of the sound segment to be processed and the logarithmic-domain amplitude gain corresponding to the (j-3)-th frame of the sound segment to be processed; wherein j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to the linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the smoothed sound fragment;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smooth sound fragment;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
In one example of the invention, so that the adjusted signal transitions smoothly from frame to frame without producing crackling noise, a dual gain smoothing operation is performed between adjacent frames, comprising a first smoothing process and a second smoothing process.
The first smoothing process: after the logarithmic-domain gain is calculated, one pass of gain smoothing is performed, taking a weighted moving average of the current j-th frame and the previous three frames to obtain the smoothed gain of the current j-th frame, computed as in formula (5):
Δ'_j = Σ_{k=j-3}^{j} θ_k * Δ_k    (5)
where θ_k denotes the weighting coefficient of the k-th frame and Δ_k denotes the logarithmic-domain gain value of the k-th frame.
In a specific implementation, the number of frames used for gain smoothing may span 40-60 ms, or be set by a technician according to debugging; the embodiment of the present invention does not limit this.
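The first smoothing process — a weighted moving average over the current frame and the previous three frames, per formula (5) — might look like this. The weighting coefficients θ_k are an illustrative choice summing to 1, since the patent leaves them unspecified:

```python
def smooth_gain(log_gains, j, weights=(0.1, 0.2, 0.3, 0.4)):
    """Formula (5): weighted moving average of the log-domain gains of
    frames j-3 .. j (j is 0-based here, so j >= 3 is required).

    The default weights favor the current frame and are an illustrative
    assumption; the patent does not give concrete theta_k values.
    """
    window = log_gains[j - 3 : j + 1]   # frames j-3, j-2, j-1, j
    return sum(w * g for w, g in zip(weights, window))
```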
Since the amplitudes of the input sound segment to be processed and of the output target sound work are represented in the linear domain, after the first smoothing process the gain value Δ in the logarithmic domain is converted into the gain δ in the linear domain, so that gain processing can be applied to the input signal. The conversion is shown in formula (6):
δ = 10^(Δ/20)    (6)
When the second smoothing process is executed, the gain difference α_j of the current j-th frame relative to the (j-1)-th frame is first obtained, along with the per-sample gain increment α_j/N (where N is the number of data points per frame). The gain g_{j,i} of each sampling point of the j-th frame is then obtained and multiplied by the corresponding sampling point x_{j,i} of the input signal, yielding the target sound work y_{j,i}:
α_j = δ_j - δ_{j-1}    (7)
g_{j,i} = δ_{j-1} + i * α_j / N    (8)
y_{j,i} = g_{j,i} * x_{j,i}    (9)
where i denotes the i-th sampling point in each frame of the smoothed sound segment.
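Formulas (6)-(9) can be sketched as follows, with the dB-to-linear conversion followed by the per-sample gain ramp applied within one frame:

```python
def db_to_linear(delta_db):
    """Formula (6): convert a log-domain gain (dB) to a linear gain."""
    return 10.0 ** (delta_db / 20.0)

def apply_frame_gain(frame, delta_prev, delta_cur):
    """Formulas (7)-(9): ramp the linear gain from the previous frame's
    value delta_prev to the current frame's value delta_cur across the
    frame's N samples, applying it sample by sample."""
    n = len(frame)
    alpha = delta_cur - delta_prev          # (7) per-frame gain difference
    out = []
    for i, x in enumerate(frame, start=1):
        g = delta_prev + i * alpha / n      # (8) gain of the i-th sample
        out.append(g * x)                   # (9) gained output sample
    return out
```

By the last sample the ramp reaches exactly delta_cur, so the next frame's ramp continues from there without a discontinuity.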
Optionally, the above multi-segment dynamic range control can be used not only in non-real-time, file-based scenarios but also in real-time, frame-based scenarios. For file-based non-real-time processing, the volume can be further adjusted according to the whole audio file, making the overall sound after the segments are spliced more harmonious to the ear.
In the embodiment of the invention, a plurality of recording sub-segments input by a user are received; a corresponding sound effect is configured for each recording sub-segment according to the sound-effect configuration instruction the user enters for it, yielding a plurality of sound segments to be synthesized; finally, the sound segments to be synthesized are spliced to generate a target sound work. This solves the technical problem that the choice of sound segments and sound effects in the prior art is limited and users' need to create sound works flexibly cannot be met. It lets users create recording clips flexibly, offers diversified choices of sound segments and sound effects, and thus produces sound works with richer effects and better sound quality.
Referring to fig. 4, fig. 4 is a block diagram illustrating a sound work generating apparatus according to a third embodiment of the present invention.
The invention provides a device for generating sound works, which comprises:
A recording sub-segment receiving module 401, configured to receive a plurality of recording sub-segments;
the sound effect configuration module 402 is configured to configure sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction, so as to obtain a plurality of sound segments to be synthesized;
And the splicing processing module 403 is configured to perform splicing processing by using a plurality of to-be-synthesized sound clips, so as to generate a target sound work.
Optionally, the sound effects include a sound variation effect and a scene sound effect, and the sound effect configuration module 402 includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound-changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
Optionally, the splicing processing module 403 includes:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
And the volume coordination processing sub-module is used for performing volume coordination processing on the intermediate sound fragment to generate a target sound work.
Optionally, the volume coordination processing submodule includes:
the segment dividing unit is used for dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
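The front half of this pipeline — splitting the intermediate sound fragment into fixed-length frames, measuring each frame's input amplitude, and looking up a log-domain gain on a dynamic range control curve — can be sketched as follows. The patent specifies neither the amplitude measure nor the curve's shape, so the RMS measure and the single-knee compressor curve below are assumptions for illustration:

```python
import math

def split_frames(signal, frame_len):
    """Divide the intermediate sound fragment into frames of a preset length
    (the last frame may be shorter)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]

def frame_amplitude_db(frame):
    """Input frame amplitude in dB. RMS is assumed here; the patent does not
    specify the amplitude measure."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Hypothetical single-knee dynamic range control curve: frames above
    the threshold are compressed by `ratio`; below it the gain is 0 dB."""
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

print(split_frames([0.5, 0.5, 0.5, 0.5, 0.5], 2))  # [[0.5, 0.5], [0.5, 0.5], [0.5]]
print(frame_amplitude_db([1.0, 1.0]))              # 0.0
print(drc_gain_db(0.0))                            # -15.0
```

The log-domain gain returned per frame is what the dual gain smoothing unit then smooths before it is applied to the samples.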
Optionally, the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to the j-th frame of the to-be-processed sound fragment according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the to-be-processed sound fragment; where j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
A gain conversion unit configured to convert the logarithmic-domain amplitude gain into the linear-domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the smoothed sound fragment;
a sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the smoothed sound fragment;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
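The text states only that the first smooth gain value for frame j is derived from the log-domain gains of frames j and j−3 (with j ≥ 4), without giving the combining rule. A sketch under the assumption of a simple weighted blend — the function name and the weight are both hypothetical:

```python
def first_smooth_gain(gains_db, j, w=0.5):
    """First smoothing: blend the log-domain amplitude gain of frame j with
    that of frame j-3. The actual combining rule is not disclosed in this
    text; an equal-weight average (w=0.5) is assumed for illustration.
    Frame indices are 1-based as in the description, so j must be >= 4."""
    if j < 4:
        raise ValueError("first smoothing is defined for j >= 4")
    return w * gains_db[j - 1] + (1.0 - w) * gains_db[j - 4]

# With per-frame log-domain gains for frames 1..4:
print(first_smooth_gain([1.0, 2.0, 3.0, 4.0], j=4))  # 2.5
```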
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (6)
1. A method of generating a sound work, comprising:
receiving a plurality of sound recording sub-segments; the recording sub-segments have different recording time lengths;
According to the received sound effect configuration instruction, respectively configuring sound effects for a plurality of recording subfragments to obtain a plurality of sound fragments to be synthesized;
splicing the sound clips to be synthesized to generate a target sound work;
The step of splicing the plurality of sound clips to be synthesized to generate the target sound work comprises the following steps:
Splicing a plurality of sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
Performing volume coordination processing on the intermediate sound fragment to generate a target sound work;
the step of performing volume coordination processing on the intermediate sound fragment to generate a target sound work comprises the following steps:
Dividing the intermediate sound fragment into a plurality of frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And carrying out dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
2. The method of generating a sound work according to claim 1, wherein the sound effect includes a sound variation effect and a scene sound effect, and the step of respectively configuring the sound effect for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized includes:
Responding to an input sound effect configuration instruction, and selecting the sound changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And respectively configuring a plurality of recording subfragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
3. The method of generating a sound work according to claim 1, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the step of performing the dual gain smoothing operation on the sound clip to be processed per frame based on the log-domain amplitude gain, generates a target sound work includes:
calculating a first smooth gain value corresponding to the j-th frame of the sound fragment to be processed according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the sound fragment to be processed; where j ≥ 4 and j is an integer;
performing a first gain smoothing operation on the j-th frame of the sound fragment to be processed by adopting the first smoothing gain value, so that the first smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
Converting the logarithmic domain amplitude gain to a linear domain amplitude gain;
determining a plurality of sampling points from a j-th frame of the sound fragment to be processed;
Calculating the sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the sound fragment to be processed;
Performing a second gain smoothing operation on each sampling point by adopting the sampling point gain, so that the second smoothing processing process of the j-th frame of the sound fragment to be processed is completed;
And after the second smoothing processing process of each frame of the sound fragment to be processed is finished, obtaining the target sound work.
4. A sound work generating apparatus comprising:
the recording sub-segment receiving module is used for receiving a plurality of recording sub-segments; the recording sub-segments have different recording time lengths;
The sound effect configuration module is used for respectively configuring sound effects for the plurality of recording sub-segments according to the received sound effect configuration instruction to obtain a plurality of sound segments to be synthesized;
the splicing processing module is used for carrying out splicing processing on the sound clips to be synthesized to generate a target sound work;
The splicing processing module comprises:
The splicing sub-module is used for splicing the sound fragments to be synthesized and carrying out short-time fade-in fade-out processing to generate an intermediate sound fragment;
The volume coordination processing sub-module is used for performing volume coordination processing on the intermediate sound fragment to generate a target sound work;
the volume coordination processing submodule comprises:
the segment dividing unit is used for dividing the intermediate sound fragment into multiple frames of sound fragments to be processed according to a preset time length and calculating the input frame amplitude of each frame of sound fragment to be processed;
the logarithmic domain amplitude gain determining unit is used for determining logarithmic domain amplitude gain corresponding to each input frame amplitude according to a preset dynamic range control curve;
And the dual gain smoothing operation unit is used for executing dual gain smoothing operation on the sound fragments to be processed of each frame based on the logarithmic domain amplitude gain to generate a target sound work.
5. The apparatus for generating a sound work according to claim 4, wherein the sound effects include a sound varying effect and a scene sound effect, and the sound effect configuration module includes:
the sound effect selection submodule is used for responding to an input sound effect configuration instruction and selecting the sound-changing effect and the scene sound effect corresponding to the sound effect configuration instruction from a preset sound effect library;
And the sound fragment to be synthesized generating sub-module is used for respectively configuring a plurality of sound recording sub-fragments by adopting the sound changing effect and the scene sound effect to obtain a plurality of sound fragments to be synthesized.
6. The apparatus for generating a sound work according to claim 4, wherein the dual gain smoothing operation includes a first smoothing process and a second smoothing process, and the dual gain smoothing operation unit includes:
A first smooth gain value determining unit, configured to calculate a first smooth gain value corresponding to the j-th frame of the to-be-processed sound fragment according to the logarithmic domain amplitude gain corresponding to the j-th frame and the logarithmic domain amplitude gain corresponding to the (j−3)-th frame of the to-be-processed sound fragment; where j ≥ 4 and j is an integer;
A first gain smoothing operation execution unit configured to execute a first gain smoothing operation on the j-th frame of the sound fragment to be processed using the first smoothing gain value, so that the first smoothing process of the j-th frame of the sound fragment to be processed is completed;
a gain conversion unit for converting the logarithmic domain amplitude gain into a linear domain amplitude gain;
A sampling point determining unit configured to determine a plurality of sampling points from a j-th frame of the sound fragment to be processed;
A sampling point gain determining unit, configured to calculate a sampling point gain of each sampling point according to the linear domain amplitude gain corresponding to the j-th frame of the sound fragment to be processed;
A second gain smoothing operation execution unit configured to execute a second gain smoothing operation for each of the sampling points using the sampling point gain so that the second smoothing processing of the j-th frame of the sound fragment to be processed is completed;
And the target sound work generating unit is used for obtaining the target sound work after the second smoothing processing process of each frame of the sound fragment to be processed is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110018240.4A CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112863530A CN112863530A (en) | 2021-05-28 |
CN112863530B true CN112863530B (en) | 2024-08-27 |
Family
ID=76004855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110018240.4A Active CN112863530B (en) | 2021-01-07 | 2021-01-07 | Sound work generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112863530B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870896A (en) * | 2021-09-27 | 2021-12-31 | 动者科技(杭州)有限责任公司 | Motion sound false judgment method and device based on time-frequency graph and convolutional neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104410748A (en) * | 2014-10-17 | 2015-03-11 | 广东小天才科技有限公司 | Method for adding background sound effect according to position of mobile terminal and mobile terminal |
CN106060707A (en) * | 2016-05-27 | 2016-10-26 | 北京小米移动软件有限公司 | Reverberation processing method and device |
CN108064406A (en) * | 2015-06-22 | 2018-05-22 | 时光机资本有限公司 | It is synchronous for the rhythm of the cross-fade of music audio frequency segment for multimedia |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101710488B (en) * | 2009-11-20 | 2011-08-03 | 安徽科大讯飞信息科技股份有限公司 | Method and device for voice synthesis |
CN104517605B (en) * | 2014-12-04 | 2017-11-28 | 北京云知声信息技术有限公司 | A kind of sound bite splicing system and method for phonetic synthesis |
KR101800362B1 (en) * | 2016-09-08 | 2017-11-22 | 최윤하 | Music composition support apparatus based on harmonics |
WO2018077364A1 (en) * | 2016-10-28 | 2018-05-03 | Transformizer Aps | Method for generating artificial sound effects based on existing sound clips |
CN107197404B (en) * | 2017-05-05 | 2020-05-12 | 广州盈可视电子科技有限公司 | Automatic sound effect adjusting method and device and recording and broadcasting system |
CN107154264A (en) * | 2017-05-18 | 2017-09-12 | 北京大生在线科技有限公司 | The method that online teaching wonderful is extracted |
CN108010512B (en) * | 2017-12-05 | 2021-04-30 | 广东小天才科技有限公司 | Sound effect acquisition method and recording terminal |
CN108347529B (en) * | 2018-01-31 | 2021-02-23 | 维沃移动通信有限公司 | Audio playing method and mobile terminal |
CN108877753B (en) * | 2018-06-15 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | Music synthesis method and system, terminal and computer readable storage medium |
CN109346044B (en) * | 2018-11-23 | 2023-06-23 | 广州酷狗计算机科技有限公司 | Audio processing method, device and storage medium |
CN109686347B (en) * | 2018-11-30 | 2021-04-23 | 北京达佳互联信息技术有限公司 | Sound effect processing method, sound effect processing device, electronic equipment and readable medium |
CN110675848B (en) * | 2019-09-30 | 2023-05-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio processing method, device and storage medium |
CN111798831B (en) * | 2020-06-16 | 2023-11-28 | 武汉理工大学 | Sound particle synthesis method and device |
CN112133277B (en) * | 2020-11-20 | 2021-02-26 | 北京猿力未来科技有限公司 | Sample generation method and device |
- 2021-01-07: CN CN202110018240.4A — patent CN112863530B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112863530A (en) | 2021-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8676361B2 (en) | Acoustical virtual reality engine and advanced techniques for enhancing delivered sound | |
US8874245B2 (en) | Effects transitions in a music and audio playback system | |
US20220286781A1 (en) | Method and apparatus for listening scene construction and storage medium | |
AU2007243586B2 (en) | Audio gain control using specific-loudness-based auditory event detection | |
KR101164937B1 (en) | Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal | |
EP2329661B1 (en) | Binaural filters for monophonic compatibility and loudspeaker compatibility | |
JP2012235310A (en) | Signal processing apparatus and method, program, and data recording medium | |
US20070025566A1 (en) | System and method for processing audio data | |
US10728688B2 (en) | Adaptive audio construction | |
CN110246508B (en) | Signal modulation method, device and storage medium | |
CN112863530B (en) | Sound work generation method and device | |
US10178491B2 (en) | Apparatus and a method for manipulating an input audio signal | |
JP5911852B2 (en) | Variable exponential mean detector and dynamic range controller | |
CN103812462A (en) | Loudness control method and device | |
CN113077771B (en) | Asynchronous chorus sound mixing method and device, storage medium and electronic equipment | |
CN111564158A (en) | Configurable sound changing device | |
CN115835112A (en) | Projector-based adaptive sound effect debugging method, system, device and platform | |
JP3197975B2 (en) | Pitch control method and device | |
CN113905307A (en) | Color slide block | |
CN116994545B (en) | Dynamic original sound adjusting method and device for K song system | |
CN115188394B (en) | Sound mixing method, device, electronic equipment and storage medium | |
EP1317807A2 (en) | System and method for processing audio data | |
CN119296554A (en) | Multi-channel mixing control method, device, equipment and medium | |
CN116057626A (en) | Noise reduction using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||