CN110149548A - Video dubbing method, electronic device and readable storage medium - Google Patents
Video dubbing method, electronic device and readable storage medium
- Publication number
- CN110149548A CN110149548A CN201811122718.2A CN201811122718A CN110149548A CN 110149548 A CN110149548 A CN 110149548A CN 201811122718 A CN201811122718 A CN 201811122718A CN 110149548 A CN110149548 A CN 110149548A
- Authority
- CN
- China
- Prior art keywords
- video
- vocabulary
- subtitle
- file
- user speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4396—Processing of audio elementary streams by muting the audio signal
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4852—End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention discloses a video dubbing method, an electronic device and a computer-readable storage medium. The video dubbing method includes: playing a video file; processing a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to the current subtitle vocabulary, where the audio repository includes at least one personalized speech file and each personalized speech file includes a library vocabulary and a user speech segment corresponding to that library vocabulary; muting the original audio, associated with the video file, that corresponds to the current subtitle vocabulary; and playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary. With the video dubbing method, electronic device and computer-readable storage medium of embodiments of the present invention, the electronic device can play audio dubbed by the user while playing a video, strengthening the interaction between the electronic device and the user during video playback and making playback more entertaining.
Description
Technical field
The present invention relates to the technical field of video processing, and in particular to a video dubbing method, an electronic device and a computer-readable storage medium.
Background technique
Currently, most videos such as TV series and films are obtained by packaging the dubbing of professional voice actors with the multiple frames of shot images. Although a video dubbed by professional voice actors has high viewing value, its interactivity with the user is poor and it is not very entertaining.
Summary of the invention
Embodiments of the present invention provide a video dubbing method, an electronic device and a computer-readable storage medium.
The video dubbing method of an embodiment of the present invention includes: playing a video file; processing a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to the current subtitle vocabulary, where the audio repository includes at least one personalized speech file and each personalized speech file includes a library vocabulary and a user speech segment corresponding to that library vocabulary; muting the original audio, associated with the video file, that corresponds to the current subtitle vocabulary; and playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
The video dubbing method of another embodiment of the present invention includes: reading a video and an audio repository, the video including a video file, a subtitle file and original audio, and the audio repository including library vocabulary and user speech segments corresponding to the library vocabulary; searching the audio repository for library vocabulary matching the subtitle file, and generating, from the user speech segments of the matched library vocabulary, a personalized audio together with synchronization association information between the subtitle file and the personalized audio; associating the video file, the subtitle file and the personalized audio according to the synchronization association information to form a personalized video; and playing the personalized audio when playing the personalized video.
The electronic device of an embodiment of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for executing the above video dubbing method.
The computer-readable storage medium of an embodiment of the present invention includes a computer program used in combination with an electronic device, and the computer program can be executed by a processor to complete the instructions of the above video dubbing method.
With the video dubbing method, electronic device and computer-readable storage medium of embodiments of the present invention, the electronic device can play audio dubbed by the user while playing a video, strengthening the interaction between the electronic device and the user during video playback and making playback more entertaining.
Additional aspects and advantages of the invention will be set forth in part in the following description, will partly become obvious from the description, or will be learned through practice of the invention.
Detailed description of the invention
The above and/or additional aspects and advantages of the invention will become obvious and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 2 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 3 is a structural schematic diagram of an electronic device according to certain embodiments of the present invention.
Fig. 4 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 5 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 6 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 7 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 8 is a module diagram of the identification module of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 9 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 10 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 11 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 12 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 13 is a flow diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 14 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 15 is a module diagram of the volume determining module of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 16 is a flow diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 17 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 18 is a schematic diagram of the connection between an electronic device and a computer-readable storage medium according to certain embodiments of the present invention.
Fig. 19 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 20 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 21 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 22 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 23 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 24 is a module diagram of the identification module of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 25 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 26 is a module diagram of the matching module of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 27 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 28 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 29 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 30 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 31 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 32 is a module diagram of the volume determining module of a video dubbing apparatus according to certain embodiments of the present invention.
Fig. 33 is a flow diagram of a video dubbing method according to certain embodiments of the present invention.
Fig. 34 is a module diagram of a video dubbing apparatus according to certain embodiments of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present invention; they are not to be construed as limiting the invention.
Referring to Fig. 1 and Fig. 3, the present invention provides a video dubbing method for an electronic device 100. The video dubbing method includes:
S12: playing a video file;
S14: processing a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to the current subtitle vocabulary, where the audio repository includes at least one personalized speech file, and each personalized speech file includes a library vocabulary and a user speech segment corresponding to that library vocabulary;
S16: muting the original audio, associated with the video file, that corresponds to the current subtitle vocabulary; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
Referring to Fig. 2 and Fig. 3, the present invention also provides a video dubbing apparatus 10. The video dubbing apparatus 10 can be used in an electronic device 100, and the video dubbing method of the embodiments of the present invention can be implemented by the video dubbing apparatus 10. The video dubbing apparatus 10 includes a first playing module 12, a first processing module 14, a second processing module 16 and a second playing module 18. Step S12 can be implemented by the first playing module 12, step S14 by the first processing module 14, step S16 by the second processing module 16, and step S18 by the second playing module 18.
In other words, the first playing module 12 can be used to play a video file. The first processing module 14 can be used to process a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to the current subtitle vocabulary, where the audio repository includes at least one personalized speech file, and each personalized speech file includes a library vocabulary and a user speech segment corresponding to that library vocabulary. The second processing module 16 can be used to mute the original audio, associated with the video file, that corresponds to the current subtitle vocabulary. The second playing module 18 can be used to play the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
The electronic device 100 may be a mobile phone, a tablet computer, a laptop, a desktop computer, a wearable device (such as a smartwatch, smart bracelet, smart glasses or smart helmet), or the like.
The electronic device 100 includes a display screen 40, a processor 20, a memory 30 and an electroacoustic element 50 (for example, a loudspeaker or earphones). The first playing module 12 can be the display screen 40 of the electronic device 100, used to show the played images. The first processing module 14 and the second processing module 16 can be programs stored in the memory 30 that implement the functions indicated by steps S14 and S16 respectively; the processor 20 executes the programs to complete steps S14 and S16. The second playing module 18 can be the electroacoustic element 50 of the electronic device 100, used to play audio.
A video is usually composed of three parts: a video file, a subtitle file and an audio file. The video file consists of multiple image frames carrying timestamps (i.e. play time points); played at a frame rate higher than the human eye can distinguish, the frames form a smooth moving picture. The audio file generally includes audio and timestamps (i.e. play time points); the audio carries the speech content of the characters in the played images, and its timestamps are matched with the timestamps of the played images so that the video file and the audio file are played simultaneously. The subtitle file includes text information corresponding to the audio and play time points corresponding to the text information; the play time points of the subtitle file can be matched with the timestamps of the audio and of the played images so that the video file, the subtitle file and the audio file are all played simultaneously.
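The three-part composition described above can be sketched with simple data types. This is a hypothetical illustration of the timestamp relationships only, not a container format defined by the patent:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float   # play time point of this image frame, in seconds
    image: bytes       # encoded picture data

@dataclass
class AudioSegment:
    timestamp: float   # matched with the frame timestamps for simultaneous play
    samples: bytes     # speech content of the characters in the played images

@dataclass
class SubtitleCue:
    start: float       # play time period matched with the audio timestamps
    end: float
    text: str          # text information corresponding to the audio

@dataclass
class Video:
    frames: list       # the video file: multiple image frames with timestamps
    audio: list        # the audio file
    subtitles: list    # the subtitle file
```

Synchronizing the three lists by these timestamps is what allows the original audio for one subtitle word to be muted and replaced without disturbing the picture.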
The audio file in a current video (such as a TV series or film) is obtained by recording the dubbing of professional voice actors. Although such a dubbed video has high viewing value, the user can only hear the professional voice actors' dubbing when playing it; the playback mode of the video is thus rather monotonous, its interactivity with the user is poor, and it is not very entertaining.
In the video dubbing method of the embodiments of the present invention, the electronic device 100 collects the user's various voices in daily life and stores them in the memory 30, then recognizes the text information corresponding to the voices, splits the text information into multiple library vocabulary, and splits the voices based on the library vocabulary to form multiple user speech segments, thereby establishing an audio repository in which library vocabulary and user speech segments correspond to each other.
Referring to Fig. 4, when the user inputs an instruction to play a video on the electronic device 100, the electronic device 100 first mutes the original audio in the video. It then extracts the subtitle file from the video and splits the subtitles in the subtitle file to obtain multiple subtitle vocabulary and the play time points corresponding to them. The electronic device 100 then plays the video file, showing the image frames at a certain frame rate. At each play time point during playback, the electronic device 100 searches the audio repository for the personalized speech file, i.e. the user speech segment, corresponding to the subtitle vocabulary to be played at the current play time point (the current subtitle vocabulary); once the user speech segment is found, the electroacoustic element 50 of the electronic device 100 plays it. At the next play time point, the electronic device 100 again looks up the user speech segment corresponding to the subtitle vocabulary to be played at that time point and plays the segment found, and so on in a loop until all the images in the video file have been played.
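The playback loop described above can be sketched as follows. This is a minimal illustration assuming the audio repository is a directory of `<library vocabulary>.mp3` files; `mute`, `play_clip` and `show_frame_at` are placeholder player callbacks, not functions named in the patent:

```python
import os

def play_dubbed_video(subtitle_cues, audio_repository_dir, mute, play_clip, show_frame_at):
    """Minimal playback loop: mute the original audio, then at each word's
    play time point look up and play the user's voice segment.

    subtitle_cues: list of (play_time, word) pairs after splitting the subtitles.
    mute / play_clip / show_frame_at: callbacks supplied by the player (assumed).
    """
    mute()  # silence the original audio track first
    for play_time, word in subtitle_cues:
        show_frame_at(play_time)                      # keep showing the image frames
        clip = os.path.join(audio_repository_dir, word + ".mp3")
        if os.path.exists(clip):                      # personalized speech file found
            play_clip(clip)                           # play the user speech segment
        # if no matching library vocabulary exists, the word is simply skipped
```

The file-name convention is what makes the per-word lookup a single existence check rather than a database query.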
With the video dubbing method of the embodiments of the present invention, the electronic device 100 can play personalized speech files dubbed by the user when playing a video, strengthening the interaction between the electronic device 100 and the user during video playback and making playback more entertaining.
Referring to Fig. 3, Fig. 5 and Fig. 6, in some embodiments the audio repository can be obtained through the following steps. In other words, before step S12 (playing the video file), the video dubbing method further includes:
S111: collecting the voice recorded by the user with the electronic device 100; and
S112: performing speech recognition on the voice to obtain library vocabulary and user speech segments.
Step S112 further comprises:
S1121: performing speech recognition on the voice to obtain text information;
S1122: splitting the text information to obtain multiple library vocabulary;
S1123: splitting the voice according to the multiple library vocabulary to obtain multiple user speech segments corresponding to the multiple library vocabulary; and
S1124: storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that user speech segment.
Referring to Fig. 3, Fig. 7 and Fig. 8, in some embodiments the video dubbing apparatus 10 includes a collection module 111 and an identification module 112. The identification module 112 includes a first recognition unit 1121, a first splitting unit 1122, a second splitting unit 1123 and a storage unit 1124. Step S111 can be implemented by the collection module 111, step S112 by the identification module 112, step S1121 by the first recognition unit 1121, step S1122 by the first splitting unit 1122, step S1123 by the second splitting unit 1123, and step S1124 by the storage unit 1124.
In other words, the collection module 111 can be used to collect the voice recorded by the user with the electronic device 100. The identification module 112 can be used to perform speech recognition on the voice to obtain library vocabulary and user speech segments. The first recognition unit 1121 can be used to perform speech recognition on the voice to obtain text information. The first splitting unit 1122 can be used to split the text information to obtain multiple library vocabulary. The second splitting unit 1123 can be used to split the voice according to the multiple library vocabulary to obtain multiple user speech segments corresponding to them. The storage unit 1124 can be used to store each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that segment.
The collection module 111 can be an acoustoelectric element 60 arranged on the electronic device 100, such as a microphone. The identification module 112 can be a program stored in the memory 30 that implements the function indicated by step S112; the processor 20 executes the program to complete step S112. The first recognition unit 1121, the first splitting unit 1122, the second splitting unit 1123 and the storage unit 1124 can be subprograms stored in the memory 30 under the program corresponding to the identification module 112; the processor 20 executes the subprograms to implement steps S1121 to S1124. The storage unit 1124 can be the memory 30 of the electronic device 100.
The condition that triggers the electronic device 100 to collect the user's various voices in daily life can be user input: the user turns on the recording function of the electronic device 100 and then speaks intensively to improve the recording of many kinds of vocabulary. Alternatively, the collection can be triggered automatically by the electronic device 100. For example, when the electronic device 100 makes or receives a call (dialing and answering, voice calls in social software, video calls, etc.), it turns on the recording function to record the user's voice during the call. Alternatively, the user sets the opening period of the recording function according to their own situation; for example, if the user sets the opening period to 19:00-21:00 daily, the electronic device 100 turns the recording function on at 19:00 and off at 21:00 every day. Having the user set an opening period avoids the higher energy consumption the electronic device 100 would incur if the recording function stayed on continuously. Alternatively, the electronic device 100 turns the recording function on after every first predetermined period of time and off after a second predetermined period of time; in this way the recording function does not stay on continuously, and the energy consumption of the electronic device 100 remains low.
Referring to Fig. 9, after the acoustoelectric element 60 on the electronic device 100 collects the user's voice, the processor 20 first recognizes the voice to obtain text information and then controls the memory 30 to store the recognized text information. The process of forming text information by speech recognition is specifically as follows. First, the processor 20 cuts off the silence at both the beginning and the end of the voice to reduce interference with subsequent processing. Then, the processor 20 divides the voice into multiple frames using a moving window function. Next, the processor 20 performs acoustic feature extraction on each frame of voice, matches the acoustic features of each frame against an acoustic model to determine the state of each frame, combines the states of the multiple frames into phonemes, and combines the phonemes into words, finally realizing the conversion of voice into text information.
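The moving-window framing step in this pipeline can be sketched as follows. The 400-sample frame and 160-sample hop (25 ms frames with a 10 ms hop at 16 kHz) are conventional illustrative values, not figures stated in the patent:

```python
def frame_speech(samples, frame_len=400, hop=160):
    """Split a speech signal into overlapping frames with a moving window,
    as in the recognition pipeline described above. Each frame would then
    undergo acoustic feature extraction and be matched against an acoustic
    model to determine its state."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames
```

Overlapping frames (hop smaller than frame length) are used so that acoustic features vary smoothly from frame to frame.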
After the processor 20 converts the voice into text information, it can further split the text information into multiple library vocabulary using a word segmentation technique such as forward maximum matching, reverse maximum matching, minimum segmentation or bidirectional maximum matching. Then, according to the correspondence between the multiple library vocabulary and the voice, the processor 20 splits the voice into multiple user speech segments corresponding one-to-one to the library vocabulary. The user speech segments are finally stored in the directory of the audio repository in the memory 30, with the file name of each user speech segment being the library vocabulary corresponding to that segment.
For example, the processor 20 in the electronic device 100 first controls the acoustoelectric element 60 to collect the voice "The weather is very good today" uttered by the user. The processor 20 then converts this voice by speech recognition into the text information "The weather is very good today". Based on the word segmentation technique, the processor 20 splits this text information into multiple library vocabulary, such as the three library vocabulary "today", "weather" and "very good". The processor 20 then splits the voice according to the library vocabulary to obtain multiple user speech segments: the user speech segment "today" corresponding to the library vocabulary "today", the segment "weather" corresponding to "weather", and the segment "very good" corresponding to "very good". Finally, the memory 30 stores the three user speech segments "today", "weather" and "very good" in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that segment: the user speech segment "today" is stored as "today.mp3", "weather" as "weather.mp3", and "very good" as "very good.mp3". Using the library vocabulary corresponding to a user speech segment as its file name makes it convenient to search for user speech segments directly by file name in subsequent steps.
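The storage step of this example can be sketched as below. It assumes the recognizer supplies a word-to-time alignment, slicing is simulated with byte offsets for illustration, and the function and parameter names are hypothetical:

```python
import os

def build_audio_repository(words, word_times, speech, repo_dir):
    """Store each user speech segment under '<library vocabulary>.mp3' so that
    later steps can search for segments directly by file name.

    words: library vocabulary produced by word segmentation.
    word_times: {vocabulary: (start, end)} alignment from the recognizer (assumed).
    speech: the full recording; here sliced by byte offsets as a stand-in.
    """
    os.makedirs(repo_dir, exist_ok=True)
    for word in words:
        start, end = word_times[word]       # where this word occurs in the recording
        segment = speech[start:end]         # the user speech segment for this word
        with open(os.path.join(repo_dir, word + ".mp3"), "wb") as f:
            f.write(segment)
```

Repeated collection runs simply add more `<vocabulary>.mp3` files, which is how the repository grows richer over time.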
In this way, by collecting, recognizing and splitting the user's voice many times, the electronic device 100 enriches the user speech segments in the audio repository, and abundant user speech segments help improve the completeness of the video dubbing.
Referring to Fig. 10, in some embodiments step S14 (processing the subtitle file associated with the video file to read, from the audio repository, the personalized speech file corresponding to the current subtitle vocabulary) includes:
S141: extracting the current subtitle from the subtitle file;
S142: splitting the current subtitle to obtain multiple current subtitle vocabulary and the play time points corresponding to them; and
S143: searching the audio repository for the library vocabulary matching the current subtitle vocabulary to obtain the user speech segments of the corresponding library vocabulary.
Step S18 (playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary) includes:
S181: playing the user speech segment at the play time point.
Referring to Fig. 3 and Fig. 11, in some embodiments the first processing module 14 includes an extraction unit 141, a split unit 142 and a searching unit 143. Step S141 can be implemented by the extraction unit 141, step S142 by the split unit 142, step S143 by the searching unit 143, and step S181 by the second playing module 18.
In other words, the extraction unit 141 can be used to extract the current subtitle from the subtitle file. The split unit 142 can be used to split the current subtitle to obtain multiple current subtitle vocabulary and the play time points corresponding to them. The searching unit 143 can be used to search the audio repository for the library vocabulary matching the current subtitle vocabulary to obtain the user speech segments of the corresponding library vocabulary. The second playing module 18 can be further used to play the user speech segment at the play time point.
The extraction unit 141, the split unit 142 and the searching unit 143 can be programs stored in the memory 30 that implement the functions indicated by steps S141, S142 and S143 respectively; the processor 20 executes the programs to complete steps S141, S142 and S143.
Specifically, the subtitle file in each video usually includes text information corresponding to the audio and play time points corresponding to the text information. During video playback, the timestamps of the audio and the play time points of the subtitle vocabulary are synchronized against a reference clock so that the audio file and the subtitle file are played simultaneously. To play the user speech segments in the audio repository as personalized speech files, the processor 20 first extracts multiple current subtitles from the subtitle file; a current subtitle includes the specific content and the play time period of the subtitle. The processor 20 then splits each current subtitle to obtain multiple current subtitle vocabulary and multiple play time points matched one-to-one with them.
Taking a subtitle file in SRT format as an example, the current subtitle corresponding to a certain line in "McDull Story", such as "Trouble you, fish balls and rough noodles", is specifically: "00:00:00,000 --> 00:00:04,400 Trouble you, fish balls and rough noodles". Here "00:00:00,000 --> 00:00:04,400" is the play time period, and "Trouble you, fish balls and rough noodles" is the specific subtitle content corresponding to the play time period "00:00:00,000 --> 00:00:04,400". The processor 20 splits this current subtitle based on the word segmentation technique to obtain multiple current subtitle vocabulary and the play time points matched one-to-one with them, respectively: "trouble - 00:00:00,000 --> 00:00:01,000", "you - 00:00:01,000 --> 00:00:02,000", "fish balls - 00:00:02,000 --> 00:00:03,000", "rough noodles - 00:00:03,000 --> 00:00:04,400".
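The splitting of one SRT cue into word-level play time points can be sketched as follows. Dividing the cue's duration evenly over the words is an illustrative choice only, since the patent does not state how the per-word times are assigned (its own example gives the last word a longer share):

```python
def split_srt_cue(start_ms, end_ms, words):
    """Distribute an SRT cue's play time period over its words, yielding
    (word_start_ms, word_end_ms, word) triples, ready for the per-word
    lookup in the audio repository."""
    step = (end_ms - start_ms) / len(words)   # even split: an assumed policy
    return [
        (int(start_ms + i * step), int(start_ms + (i + 1) * step), w)
        for i, w in enumerate(words)
    ]
```

A real implementation could instead weight each word's share by its length or by a forced alignment against the original audio.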
During video file playback, the processor 20 searches the audio repository for the user speech segment matching each current subtitle vocabulary. For example, at 00:00:00,000 it looks up the user speech segment "trouble.mp3" corresponding to the current subtitle vocabulary "trouble" and plays it; at 00:00:01,000 it looks up and plays "you.mp3" for the vocabulary "you"; at 00:00:02,000 it looks up and plays "fish balls.mp3" for the vocabulary "fish balls"; and at 00:00:03,000 it looks up and plays "rough noodles.mp3" for the vocabulary "rough noodles".
In this way, can play the voice dubbed by user when video playing, realize that the personalization of user is dubbed, user experience
More preferably.
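The lookup step can be sketched as below, assuming the audio repository is simply a directory of `<word>.mp3` files as the naming convention in the text suggests; the directory layout and function name are assumptions.

```python
import os
import tempfile

def lookup_segment(repo_dir, word):
    """Return the path of the user speech segment recorded for `word`,
    or None when that word is missing from the audio repository."""
    path = os.path.join(repo_dir, word + ".mp3")
    return path if os.path.exists(path) else None

# Minimal demonstration with a throwaway repository
repo = tempfile.mkdtemp()
open(os.path.join(repo, "trouble.mp3"), "wb").close()
print(lookup_segment(repo, "trouble") is not None)  # segment found
print(lookup_segment(repo, "noodles"))              # None: keep original audio
```

A missing word would leave the original audio un-muted for that time point, though the patent does not spell out the fallback.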
In some embodiments, the subtitle file may contain hard subtitles or soft subtitles. Hard subtitles, also called "embedded subtitles", are rendered into the frames of the video file like a watermark; the subtitle images are compressed into the same data as the frames and cannot be separated. Soft subtitles, also called "external subtitles", keep the subtitle file and the video file as two independent pieces of data that are then encapsulated into one video. The SRT subtitle file above is a soft subtitle. With soft subtitles, each current subtitle and its play time can be extracted directly. With hard subtitles, they cannot. In that case, processor 20 extracts the current subtitle by recognizing the text in the played images of the video file. It can be understood that with hard subtitles the played image directly contains the current subtitle, i.e., the subtitle is embedded in the image; the region the subtitle occupies can be located in the played image and the text in that region recognized to extract the current subtitle. Further, since every played image carries a timestamp, processor 20 can compute the play time of the current subtitle from the timestamps of the one or more played images in which it appears. In this way, the current subtitle and its play time can be extracted even when the subtitle file is a hard subtitle.
The video dubbing method of the embodiments thus handles both soft and hard subtitles: whichever kind the video carries, personalized dubbing is realized and the user experience improves.
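Recovering a hard subtitle's play time from frame timestamps can be sketched as follows. Text recognition on the subtitle region is assumed done elsewhere; here, consecutive frames showing the same recognized text are merged into one span, whose first and last timestamps give the subtitle's play time. All names are illustrative.

```python
def cue_spans(frames):
    """frames: list of (timestamp_seconds, recognized_text) per decoded frame,
    the text coming from recognition of the subtitle region of each image.
    Consecutive frames with identical text are merged into (text, start, end)."""
    spans = []
    for t, text in frames:
        if spans and spans[-1][0] == text:
            spans[-1][2] = t          # extend the current span to this frame
        else:
            spans.append([text, t, t])  # a new subtitle appeared
    return [tuple(s) for s in spans]

frames = [(0.0, "trouble you"), (0.5, "trouble you"),
          (1.0, "fish balls"), (1.5, "fish balls")]
print(cue_spans(frames))
```

The resulting spans play the role that the SRT play times play for soft subtitles.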
Referring to Figure 12 and Figure 13, in some embodiments the video dubbing method further includes, after step S18 of playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary:
S19: determining, according to the mouth shape of the person in the video file, the playback volume of the user speech segment corresponding to that mouth shape.
Step S19 further comprises:
S191: selecting, from the multiple played images and according to the play time point, the currently played image associated with the user speech segment;
S192: recognizing the mouth shape of the person in the currently played image;
S193: computing the actual aspect ratio of the mouth shape from its width and height;
S194: computing a volume amplification factor from the actual aspect ratio and a default aspect ratio; and
S195: determining, from the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the person in the currently played image.
Referring to Fig. 3, Figure 14 and Figure 15, in some embodiments the video dubbing apparatus 10 further includes a volume determination module 19. Volume determination module 19 includes a selection unit 191, a second recognition unit 192, a first computing unit 193, a second computing unit 194 and a volume determination unit 195. Step S19 can be realized by volume determination module 19; step S191 by selection unit 191; step S192 by the second recognition unit 192; step S193 by the first computing unit 193; step S194 by the second computing unit 194; and step S195 by volume determination unit 195.
In other words, volume determination module 19 can be used to determine, from the mouth shape of the person in the video file, the playback volume of the corresponding user speech segment. Selection unit 191 can be used to select, from the multiple played images according to the play time point, the currently played image associated with the user speech segment. The second recognition unit 192 can be used to recognize the mouth shape of the person in the currently played image. The first computing unit 193 can be used to compute the actual aspect ratio of the mouth shape from its width and height. The second computing unit 194 can be used to compute the volume amplification factor from the actual aspect ratio and the default aspect ratio. Volume determination unit 195 can be used to determine, from the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the person in the currently played image.
Here, volume determination module 19 can be a program stored in memory 30 that implements the function of step S19; processor 20 executes the program to complete step S19. Selection unit 191, the second recognition unit 192, the first computing unit 193, the second computing unit 194 and volume determination unit 195 can be programs stored in memory 30 that respectively implement the functions of steps S191, S192, S193, S194 and S195; processor 20 executes the programs to complete steps S191 to S195.
Specifically, dubbing usually has to follow the emotional rise and fall of the role, and in dubbing that rise and fall is usually embodied as changes in audio volume. Therefore, when the personalized speech files dubbed by the user are played, the playback volume of the user speech segment in each personalized speech file also needs to vary accordingly to match the role's emotion. The playback volume each user speech segment needs can be determined by recognizing the aspect ratio of the person's mouth shape in the currently played images associated with that segment. Specifically, each user speech segment has a play time point, and every played image also carries one. For the play time point of each user speech segment, processor 20 first finds, among the multiple played images, the one frame or multiple frames matching that play time point; these are the currently played images associated with the user speech segment at that time point. Processor 20 then recognizes the person in each currently played image with a face recognition algorithm and further recognizes the person's mouth shape. Processor 20 then computes the actual aspect ratio of the recognized mouth shape from its width and height (i.e., the ratio of width to height). When multiple frames are played, processor 20 recognizes multiple mouth shapes; it can take the median, mean or maximum of their widths as the final width and correspondingly the median, mean or maximum of their heights as the final height (when the width takes the median, the height also takes the median; when the width takes the mean, the height takes the mean; when the width takes the maximum, the height takes the maximum). Alternatively, processor 20 first computes the actual aspect ratio of the mouth shape in every frame and then takes the median, mean or maximum of those ratios as the final actual aspect ratio. Processor 20 then computes the ratio of the actual aspect ratio (actual width over actual height) to the default aspect ratio; this ratio is the volume amplification factor. Processor 20 can thereby compute multiple volume amplification factors, each obtained from the mouth shape of the person in the one frame or multiple currently played images associated with a user speech segment; from that factor processor 20 computes the playback volume of the associated user speech segment, i.e., playback volume = default playback volume × volume amplification factor.
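The steps S193-S195 can be sketched as below. The median is used to summarize the ratios across frames (the text equally allows mean or maximum); function names and the sample numbers are illustrative only.

```python
def amplification_factor(mouth_ratios, default_ratio, summarize=None):
    """Reduce the mouth aspect ratios (width / height) from the one or more
    frames associated with a segment to a single value (median by default),
    then divide by the default aspect ratio to obtain the amplification factor."""
    if summarize is None:
        summarize = lambda xs: sorted(xs)[len(xs) // 2]  # median-like pick
    return summarize(mouth_ratios) / default_ratio

def playback_volume(default_volume, factor):
    # playback volume = default playback volume x volume amplification factor
    return default_volume * factor

f = amplification_factor([2.0, 3.0, 4.0], default_ratio=2.0)  # summary 3.0 -> factor 1.5
print(f)
```

A wide-open mouth (large ratio relative to the default) thus raises the segment's volume above the default, and a nearly closed one lowers it.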
In addition, one currently played image may contain multiple persons, in which case processing that image yields the actual aspect ratios of multiple mouth shapes. Processor 20 first rejects the aspect ratios of the mouths of persons who are not speaking in the currently played image: for example, a detection threshold is set, and when a mouth's actual aspect ratio falls below it, the corresponding person is considered not to be speaking and that mouth's aspect ratio is rejected. If, after rejecting the aspect ratios of non-speaking persons, only one actual aspect ratio remains, the volume amplification factor is computed directly from it; if multiple actual aspect ratios remain, their median, mean or maximum is used to compute the volume amplification factor. This improves the accuracy of the computed volume amplification factor.
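The multi-person filtering can be sketched as follows; the threshold value and the median summary are placeholders for whatever the implementation chooses.

```python
def effective_ratio(mouth_ratios, open_threshold):
    """Drop mouths whose aspect ratio is below the 'speaking' detection
    threshold, then summarize the rest (median here; mean or maximum are
    equally valid per the text). Returns None when nobody is speaking."""
    speaking = [r for r in mouth_ratios if r >= open_threshold]
    if not speaking:
        return None
    if len(speaking) == 1:
        return speaking[0]
    return sorted(speaking)[len(speaking) // 2]

print(effective_ratio([0.1, 0.4, 0.6, 0.8], open_threshold=0.3))
```

A `None` result would mean the frame contributes no amplification factor for that segment.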
Referring to Figure 16, in some embodiments the video dubbing method further includes, before step S12:
S113: determining the target dubbing role according to the user's input.
Step S14, of processing the subtitle file associated with the video file to read from the audio repository the personalized speech file corresponding to the current subtitle vocabulary, further includes:
S144: processing the original audio to separate the voice information of the target dubbing role from that of the non-target dubbing roles.
Step S141, of extracting the current subtitle from the subtitle file, includes:
S1411: extracting from the subtitle file the target subtitles corresponding to the voice information of the target dubbing role;
S1412: extracting the current subtitle from the target subtitles.
Referring to Fig. 3 and Figure 17, in some embodiments the video dubbing apparatus 10 further includes a role determination module 113, and the first processing module 14 further includes a division unit 144. Step S113 can be realized by role determination module 113; step S144 by division unit 144; steps S1411 and S1412 by extraction unit 141.
In other words, role determination module 113 can be used to determine the target dubbing role according to the user's input. Division unit 144 can be used to process the original audio to separate the voice information of the target dubbing role from that of the non-target roles. Extraction unit 141 can be used to extract from the subtitle file the target subtitles corresponding to the voice information of the target dubbing role and to extract the current subtitle from the target subtitles.
Here, role determination module 113 and division unit 144 can be programs stored in memory 30 that respectively implement the functions of steps S113 and S144; processor 20 executes the programs to complete steps S113 and S144. Extraction unit 141 can likewise be a program stored in memory 30 that implements the functions of steps S1411 and S1412; processor 20 executes the program to complete steps S1411 and S1412.
It can be understood that in some cases the user only wants to dub one or a few roles in the video. The user then first sets the target dubbing role, and processor 20 identifies the voice information of the target dubbing role in the original audio through voiceprint recognition, classifying the voice information other than the target role's as that of the non-target dubbing roles. Processor 20 then screens the subtitle file for the target subtitles corresponding to the target role's voice information and divides them into multiple current subtitles according to play time. Processor 20 then splits each current subtitle into multiple current subtitle vocabulary items and the play time points corresponding to them, and looks up in the audio repository the user speech segment matching each item as the personalized speech file. When the video plays, the target dubbing role's voice is the user speech segments, while the non-target dubbing roles keep the original audio. In this way the user can selectively dub one or several roles in the video, which adds to the fun of video dubbing and further improves the user experience.
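Assuming the voiceprint step has already labelled each subtitle cue with a speaker (the labelling itself is out of scope), screening for the target role's subtitles reduces to a filter. The tuple layout and names here are assumptions for illustration.

```python
def target_cues(diarized_cues, target_speaker):
    """diarized_cues: list of (speaker_id, cue_text, start, end), where the
    speaker_id comes from voiceprint recognition of the original audio.
    Only the target role's cues are dubbed; the others keep the original audio."""
    return [c for c in diarized_cues if c[0] == target_speaker]

cues = [("mcdull", "trouble you", 0.0, 4.4),
        ("vendor", "we have none", 4.4, 6.0),
        ("mcdull", "fish balls then", 6.0, 8.0)]
print(target_cues(cues, "mcdull"))
```

Each returned cue would then go through the usual split-and-lookup pipeline, while the filtered-out cues are left un-muted.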
In the video dubbing method described in any embodiment above, the video dubbed by the user can be stored in memory 30 automatically by electronic device 100 or manually by the user. The stored video may contain both the original audio and the personalized audio composed of the multiple personalized speech files, or the user may choose to keep only one of the two. When the video contains both the original audio and the personalized audio, later playback requires no dubbing action during the playing process: the original or the personalized audio is played directly according to the user's selection. If the user selects the original audio, the personalized audio is muted; if the user selects the personalized audio, the original audio is muted; if the user makes no selection, the personalized audio is played by default.
Referring again to Fig. 3, the present invention also provides an electronic device 100. Electronic device 100 includes one or more processors 20, a memory 30 and one or more programs, the one or more programs being stored in memory 30 and configured to be executed by the one or more processors 20. The programs include instructions for executing the video dubbing method described in any embodiment above.
For example, with reference to Fig. 1, the program includes instructions for executing the following steps:
S12: playing the video file;
S14: processing the subtitle file associated with the video file to read from the audio repository the personalized speech file corresponding to the current subtitle vocabulary, the audio repository including at least one personalized speech file, each personalized speech file including a library vocabulary item and the user speech segment corresponding to it;
S16: muting the portion of the original audio associated with the video file that corresponds to the current subtitle vocabulary; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
For another example, with reference to Fig. 5, the program further includes instructions for executing the following steps:
S1121: recognizing the speech to obtain text information;
S1122: splitting the text information to obtain multiple library vocabulary items;
S1123: splitting the speech according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to them; and
S1124: storing each user speech segment in the audio repository, the file name of each user speech segment being its corresponding library vocabulary item.
Referring to Figure 18, the present invention also provides a computer-readable storage medium 200. Computer-readable storage medium 200 includes a computer program used in combination with electronic device 100. The computer program can be executed by processor 20 to complete the video dubbing method described in any embodiment above.
For example, with reference to Fig. 1, the computer program can be executed by processor 20 to complete the following steps:
S12: playing the video file;
S14: processing the subtitle file associated with the video file to read from the audio repository the personalized speech file corresponding to the current subtitle vocabulary, the audio repository including at least one personalized speech file, each personalized speech file including a library vocabulary item and the user speech segment corresponding to it;
S16: muting the portion of the original audio associated with the video file that corresponds to the current subtitle vocabulary; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
For another example, with reference to Fig. 5, the computer program can also be executed by processor 20 to complete the following steps:
S1121: recognizing the speech to obtain text information;
S1122: splitting the text information to obtain multiple library vocabulary items;
S1123: splitting the speech according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to them; and
S1124: storing each user speech segment in the audio repository, the file name of each user speech segment being its corresponding library vocabulary item.
The video dubbing method described in any embodiment above dubs the video with the user speech segments while the user is playing the video. In some embodiments, the dubbing with user speech segments can instead be completed silently in the background: the user issues a video dubbing instruction on electronic device 100, and processor 20 performs the dubbing operation, but during this dubbing process electronic device 100 does not play the video for the user to watch. When the dubbing finishes, processor 20 generates the video dubbed by the user (i.e., the personalized video below) and prompts the user through display screen 40 or electroacoustic element 50 that the dubbing is complete. When the user then taps play on the personalized video, display screen 40 of electronic device 100 plays the video file and the subtitle file, while electroacoustic element 50 of electronic device 100 plays the audio dubbed by the user (i.e., the personalized audio below).
To this end, referring to Fig. 3 and Figure 19, the present invention also provides a video dubbing method usable on electronic device 100. The video dubbing method includes:
S23: reading the video and the audio repository, the video including a video file, a subtitle file and original audio, the audio repository including library vocabulary items and the user speech segments corresponding to them;
S24: searching the audio repository for the library vocabulary items matching the subtitle file, and generating from the corresponding user speech segments the personalized audio together with the synchronization association information between the subtitle file and the personalized audio;
S25: associating the video file, the subtitle file and the personalized audio according to the synchronization association information to form the personalized video; and
S26: playing the personalized audio when the personalized video plays.
Referring to Fig. 3 and Figure 20, the present invention provides a video dubbing apparatus 20 for electronic device 100. The video dubbing method of the embodiments can be realized by video dubbing apparatus 20, which includes a read module 23, a matching module 24, an association module 25 and a playing module 26. Step S23 can be realized by read module 23; step S24 by matching module 24; step S25 by association module 25; and step S26 by playing module 26.
In other words, read module 23 can be used to read the video and the audio repository, the video including the video file, the subtitle file and the original audio, the audio repository including library vocabulary items and the corresponding user speech segments. Matching module 24 can be used to search the audio repository for the library vocabulary items matching the subtitle file and to generate, from the corresponding user speech segments, the personalized audio and the synchronization association information between the subtitle file and the personalized audio. Association module 25 can be used to associate the video file, the subtitle file and the personalized audio according to the synchronization association information to form the personalized video. Playing module 26 can be used to play the personalized audio when the personalized video plays.
Here, read module 23, matching module 24 and association module 25 can be programs stored in memory 30 that respectively implement the functions of steps S23, S24 and S25; the processor executes the programs to complete steps S23 to S25. Playing module 26 can be the electroacoustic element 50 of electronic device 100, used to play the personalized audio.
In the video dubbing method of the embodiments, electronic device 100 collects the user's various utterances in daily life and stores them in memory 30, then recognizes the text information corresponding to the speech; the text information is then split to form multiple library vocabulary items, and the speech is split according to the library vocabulary items to form multiple user speech segments, establishing an audio repository in which library vocabulary items correspond to user speech segments. When the user inputs a video dubbing instruction on electronic device 100, electronic device 100 recognizes the text of the subtitle file in the video and finds in the audio repository the user speech segments corresponding to that text, so that a new personalized audio is formed from the multiple user speech segments; the video file, the subtitle file and the personalized audio can then form the personalized video, i.e., the video dubbed by the user.
Here, the synchronization association information between the subtitle file and the personalized audio is the timestamp information carried in the personalized audio. Associating the video file, the subtitle file and the personalized audio according to the synchronization association information to form the personalized video is an encapsulation process that packs the three into the personalized video file. During the encapsulation, electronic device 100 can interleave the video file and the personalized audio by means of the synchronization association information, i.e., the timestamp information of the personalized audio. The personalized video can be encapsulated in different formats, for example TS, MKV or MOV. Different formats have different file structures, and the user can choose the format of the personalized video.
The video dubbing method of the embodiments can dub a video from the user speech segments to obtain a personalized video and play that personalized video dubbed by the user, strengthening the interaction between electronic device 100 and the user during playback and adding to the fun of playing the video file.
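In practice the encapsulation step is commonly delegated to a muxing tool. As a sketch under that assumption, the command below (built but not run) would mux the original video stream, a personalized audio track and a soft subtitle track into one container with ffmpeg; the file names are placeholders, and the patent itself does not name any tool.

```python
def mux_command(video, personalized_audio, subtitles, out_path):
    """Build an ffmpeg command muxing video, personalized audio and a soft
    subtitle track into one container, copying streams without re-encoding."""
    return [
        "ffmpeg", "-i", video, "-i", personalized_audio, "-i", subtitles,
        "-map", "0:v", "-map", "1:a", "-map", "2:s",
        "-c", "copy", out_path,
    ]

print(mux_command("movie.mp4", "personalized.m4a", "movie.srt", "personalized.mkv"))
```

The container chosen via `out_path` (MKV here) determines the file structure, matching the format choice mentioned above.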
Referring to Figure 21 and Figure 22, in some embodiments the audio repository can be obtained through the following steps; that is, before step S23 of reading the video and the audio repository, the video dubbing method of the embodiments further includes:
S21: collecting the speech the user records with electronic device 100; and
S22: recognizing the speech to obtain multiple library vocabulary items and multiple user speech segments.
Step S22 further comprises:
S221: recognizing the speech to obtain text information;
S222: splitting the text information to obtain multiple library vocabulary items;
S223: splitting the speech according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to them; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being its corresponding library vocabulary item.
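Steps S221-S224 can be sketched as below. Real audio cutting is represented only by (start, end) ranges; the per-word time offsets are assumed to come from the recognizer, and all names are illustrative.

```python
def build_repository(words, word_starts, speech_len):
    """Given the recognized words and their start offsets (seconds) inside
    the recording, cut the recording into per-word segments and name each
    file after its word, per steps S223-S224. Returns {file_name: (start, end)}."""
    repo = {}
    for i, (word, start) in enumerate(zip(words, word_starts)):
        end = word_starts[i + 1] if i + 1 < len(words) else speech_len
        repo[word + ".mp3"] = (start, end)
    return repo

print(build_repository(["trouble", "you"], [0.0, 1.0], speech_len=2.5))
```

Naming each file after its library vocabulary item is what later makes the subtitle-word lookup a plain file-name match.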
Referring to Figure 23 and Figure 24, in some embodiments video dubbing apparatus 20 further includes a collection module 21 and a recognition module 22. Step S21 can be realized by collection module 21, and step S22 by recognition module 22. Recognition module 22 includes a first recognition unit 221, a first splitting unit 222, a second splitting unit 223 and a storage unit 224. Step S221 can be realized by the first recognition unit 221; step S222 by the first splitting unit 222; step S223 by the second splitting unit 223; and step S224 by storage unit 224.
In other words, collection module 21 can be used to collect the speech the user records with electronic device 100. Recognition module 22 can be used to recognize the speech to obtain multiple library vocabulary items and multiple user speech segments. The first recognition unit 221 can be used to recognize the speech to obtain text information. The first splitting unit 222 can be used to split the text information to obtain multiple library vocabulary items. The second splitting unit 223 can be used to split the speech according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to them. Storage unit 224 can be used to store each user speech segment in the audio repository, the file name of each user speech segment being its corresponding library vocabulary item.
Here, collection module 21 can be an acoustoelectric element arranged on electronic device 100, such as a microphone. Recognition module 22 can be a program stored in memory 30 that implements the function of step S22; processor 20 executes the program to complete step S22. The first recognition unit 221, the first splitting unit 222, the second splitting unit 223 and storage unit 224 can be subprograms stored in memory 30 under the program corresponding to recognition module 22; processor 20 executes the subprograms to realize steps S221 to S224. Storage unit 224 can be the memory 30 in electronic device 100.
The way electronic device 100 collects the user's various utterances in daily life is the same as the collection in the aforementioned video dubbing method that dubs during video playback, and is not repeated here.
The way processor 20 recognizes the collected speech to obtain library vocabulary items and user speech segments is likewise the same as in the aforementioned video dubbing method that dubs during video playback, and is also not repeated here.
In this way, by repeatedly collecting, recognizing and splitting the user's speech, electronic device 100 enriches the user speech segments in the audio repository, and a richer set of user speech segments helps improve the completeness of the dubbing.
Referring to Figure 25, in some embodiments step S24, of searching the audio repository for the library vocabulary items matching the subtitle file and generating from the corresponding user speech segments the personalized audio and the synchronization association information between the subtitle file and the personalized audio, includes:
S241: extracting the multiple subtitle fragments in the subtitle file;
S242: splitting each subtitle fragment to obtain multiple subtitle vocabulary items and the multiple play time points corresponding to them;
S243: searching the audio repository for the multiple user speech segments matching the multiple subtitle vocabulary items; and
S244: combining, in the order of the play time points of the subtitle vocabulary items, the multiple user speech segments corresponding to them to form the personalized audio.
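Step S244's ordering-and-combining can be sketched as below; actual concatenation of the audio data is out of scope, so the "personalized audio" is represented just by the ordered list of segment files. The tuple layout is an assumption.

```python
def assemble_personalized_audio(matches):
    """matches: list of (subtitle_word, play_time_seconds, segment_file)
    produced by the audio-repository lookup of step S243. Sorting by play
    time gives the playback sequence of the personalized audio."""
    return [seg for _, t, seg in sorted(matches, key=lambda m: m[1])]

matches = [("fish balls", 2.0, "fish balls.mp3"),
           ("trouble",    0.0, "trouble.mp3"),
           ("you",        1.0, "you.mp3")]
print(assemble_personalized_audio(matches))
```

With multiple subtitle fragments, their matches would simply be pooled into one list before sorting, yielding the complete personalized audio sequence.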
Referring to Figure 26, in some embodiments matching module 24 includes an extraction unit 241, a splitting unit 242, a matching unit 243 and a combination unit 244. Step S241 can be realized by extraction unit 241; step S242 by splitting unit 242; step S243 by matching unit 243; and step S244 by combination unit 244. In other words, extraction unit 241 can be used to extract the multiple subtitle fragments in the subtitle file. Splitting unit 242 can be used to split each subtitle fragment to obtain multiple subtitle vocabulary items and the multiple play time points corresponding to them. Matching unit 243 can be used to search the audio repository for the multiple user speech segments matching the multiple subtitle vocabulary items. Combination unit 244 can be used to combine, in the order of the play time points of the subtitle vocabulary items, the corresponding user speech segments to form the personalized audio.
Here, extraction unit 241, splitting unit 242, matching unit 243 and combination unit 244 can be programs stored in memory 30 that respectively implement the functions of steps S241, S242, S243 and S244; processor 20 executes the programs to complete steps S241 to S244.
Specifically, to form the personalized audio from the user speech segments in the audio repository, the processor 20 first extracts the subtitle fragments in the subtitle file together with the play time points corresponding to them. Taking an SRT-format subtitle file as an example, a line of dialogue from the film "McDull Story" such as "Trouble you for fish balls and coarse noodles" appears in the subtitle file in the form: "00:00:00,000 --> 00:00:04,400 Trouble you for fish balls and coarse noodles". Here, "00:00:00,000 --> 00:00:04,400" is the play time point, and "Trouble you for fish balls and coarse noodles" is the subtitle fragment corresponding to that play time point. In this way, the subtitle fragments and their corresponding play time points can be extracted from the subtitle file.
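The extraction of step S241 can be sketched as a small parser for SRT cues. The regular expression, function name, and sample text below are illustrative assumptions, not part of the patented method:

```python
import re

# Timing line of an SRT cue, e.g. "00:00:00,000 --> 00:00:04,400".
SRT_CUE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")

def extract_fragments(srt_text):
    """Return (start, end, fragment_text) tuples, one per SRT cue."""
    fragments = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = SRT_CUE.search(line)
            if match:
                # Everything after the timing line is the subtitle fragment.
                text = " ".join(lines[i + 1:]).strip()
                fragments.append((match.group(1), match.group(2), text))
                break
    return fragments

srt = """1
00:00:00,000 --> 00:00:04,400
Trouble you for fish balls and coarse noodles
"""
print(extract_fragments(srt))
# → [('00:00:00,000', '00:00:04,400', 'Trouble you for fish balls and coarse noodles')]
```

Each returned tuple pairs a subtitle fragment with its play time point, which is exactly the association the later steps rely on.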
Next, the processor 20 splits each subtitle fragment using a word segmentation technique to obtain multiple subtitle vocabularies and the play time points corresponding to them. Continuing the "McDull Story" example, after extracting the subtitle fragment "Trouble you for fish balls and coarse noodles", the processor 20 splits it into the following subtitle vocabularies, each matched one-to-one with a play time point: "trouble — 00:00:00,000 --> 00:00:01,000", "you — 00:00:01,000 --> 00:00:02,000", "fish balls — 00:00:02,000 --> 00:00:03,000", "coarse noodles — 00:00:03,000 --> 00:00:04,400".
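The split of step S242 can be sketched as follows. The patent does not specify how a cue's time span is divided among the segmented words (its example assigns roughly one second per word, with the remainder on the last word), so the even division used here is a simplifying assumption, as is the pre-tokenized input:

```python
def split_fragment(start_ms, end_ms, words):
    """Distribute a subtitle fragment's time span across its segmented words.

    `words` is assumed to come from an external word-segmentation step; the
    even division of the span is a simplification of the patent's example.
    """
    span = (end_ms - start_ms) / len(words)
    result = []
    for i, word in enumerate(words):
        word_start = start_ms + round(i * span)
        word_end = start_ms + round((i + 1) * span)
        result.append((word, word_start, word_end))
    return result

# The sample cue 00:00:00,000 --> 00:00:04,400 split into four vocabularies:
for item in split_fragment(0, 4400, ["trouble", "you", "fish balls", "coarse noodles"]):
    print(item)
```

The output pairs each vocabulary with a sub-interval of the original cue, giving the per-word play time points used for matching and assembly.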
Then, the processor 20 searches the audio repository for the user speech segment matching each subtitle vocabulary: for example, it finds the segment "trouble.mp3" corresponding to the subtitle vocabulary "trouble", the segment "you.mp3" corresponding to "you", the segment "fish balls.mp3" corresponding to "fish balls", and the segment "coarse noodles.mp3" corresponding to "coarse noodles". The processor 20 then combines these user speech segments in the chronological order of their play time points to obtain the personalized audio for "Trouble you for fish balls and coarse noodles". When there are multiple subtitle fragments, the processor 20 combines the user speech segments of all the fragments in play time order to form the complete personalized audio.
In this way, a personalized audio dubbed by the user can be formed.
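Steps S243 and S244 — matching each vocabulary to a clip named after it and assembling the clips in play-time order — might look like the following sketch. Actual waveform concatenation would require an audio library, so the example only produces the ordered clip list; the function and file names are hypothetical:

```python
import os
import tempfile

def assemble_personalized_audio(word_times, audio_repo_dir):
    """For each (vocabulary, start_ms) pair, look up the clip whose file
    name is the vocabulary itself, then return the clips ordered by play
    time point (steps S243/S244)."""
    clips = []
    for word, start_ms in sorted(word_times, key=lambda wt: wt[1]):
        path = os.path.join(audio_repo_dir, word + ".mp3")
        if os.path.exists(path):  # clip previously recorded by the user
            clips.append(path)
        # A vocabulary with no matching clip could fall back to the original audio.
    return clips

# Demo with a throwaway repository holding two user-recorded clips.
repo = tempfile.mkdtemp()
for word in ("trouble", "you"):
    open(os.path.join(repo, word + ".mp3"), "wb").close()
ordered = assemble_personalized_audio([("you", 1100), ("trouble", 0)], repo)
print([os.path.basename(c) for c in ordered])
# → ['trouble.mp3', 'you.mp3']
```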
In the video dubbing method of the embodiments of the present invention, the subtitle file may likewise be either hard subtitles or soft subtitles; no restriction is imposed here.
Referring to Figure 27, in some embodiments the video dubbing method of the embodiments of the present invention further includes, after step S26 of playing the personalized audio while the personalized video is playing:
S27: muting the original audio while the personalized video is playing.
Referring to Figure 28, in some embodiments the video dubbing apparatus 20 further includes an audio control module 27. Step S27 may be implemented by the audio control module 27. In other words, the audio control module 27 may be configured to mute the original audio while the personalized video is playing. The audio control module 27 may be stored in the memory 30 as a program implementing the function indicated by step S27, and the processor 20 executes the program to complete step S27.
Specifically, the processor 20 may package the video file, the subtitle file, and the personalized audio directly into the personalized video, or may package the video file, the subtitle file, the personalized audio, and the original audio together into the personalized video. When the personalized video is played and contains both the personalized audio and the original audio, two playback modes are available: (1) if the user makes no audio selection, the electronic device 100 plays the personalized audio by default, directly muting the original audio; (2) the personalized video is played with the audio type the user selects — if the user selects the original audio, the electronic device 100 mutes the personalized audio and plays the original audio; if the user selects the personalized audio, the electronic device 100 mutes the original audio and plays the personalized audio.
In this way, multiple playback modes are provided for the user, enhancing the user's enjoyment.
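The two playback modes above amount to a small track selector. This is an illustrative sketch with hypothetical names, not part of the claimed method:

```python
def select_tracks(has_personalized_audio, user_choice=None):
    """Return (track_to_play, track_to_mute) for the two playback modes:
    with no selection, the personalized audio plays by default and the
    original audio is muted; otherwise the user's selection wins."""
    if not has_personalized_audio:
        return ("original", None)
    choice = user_choice or "personalized"  # mode (1): default selection
    if choice == "original":                # mode (2): user picks original
        return ("original", "personalized")
    return ("personalized", "original")     # mode (2): user picks personalized

print(select_tracks(True))              # → ('personalized', 'original')
print(select_tracks(True, "original"))  # → ('original', 'personalized')
```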
Referring to Figures 29 and 30 together, in some embodiments the video dubbing method of the embodiments of the present invention further includes:
S28: determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape.
Further, step S28 includes:
S281: selecting, from the multiple frames of play images according to the play time point of the user speech segment, the play image associated with the user speech segment;
S282: recognizing the mouth shape of the character in the play image;
S283: calculating the actual aspect ratio of the mouth shape from its width and height;
S284: calculating a volume amplification factor from the actual aspect ratio and a preset aspect ratio; and
S285: determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
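Steps S283 to S285 can be sketched as below. The patent defers the exact formula to the earlier embodiment, so the linear relation between the preset and actual aspect ratios used here is an assumption, as are all the names:

```python
def playback_volume(mouth_width, mouth_height, preset_ratio, base_volume):
    """Compute the mouth shape's actual width-to-height ratio (S283),
    derive a volume amplification factor against a preset ratio (S284),
    and scale the segment's base volume (S285). A wider-open mouth
    (greater height, hence smaller width-to-height ratio) is made louder;
    the linear relation is an assumed stand-in for the actual formula."""
    actual_ratio = mouth_width / mouth_height
    amplification = preset_ratio / actual_ratio
    return base_volume * amplification

print(playback_volume(40, 20, 2.0, 50))  # ratio equals preset → 50.0
print(playback_volume(40, 40, 2.0, 50))  # mouth open twice as far → 100.0
```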
Referring also to Figures 31 and 32, in some embodiments the video dubbing apparatus 20 further includes a volume determination module 28. The volume determination module 28 includes an association unit 281, a second recognition unit 282, a first calculation unit 283, a second calculation unit 284, and a volume determination unit 285. Step S28 may be implemented by the volume determination module 28, and steps S281 to S285 may be implemented by the association unit 281, the second recognition unit 282, the first calculation unit 283, the second calculation unit 284, and the volume determination unit 285, respectively.
In other words, the volume determination module 28 may be configured to determine, from the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape. The association unit 281 may be configured to select, from the multiple frames of play images according to the play time point of a user speech segment, the play image associated with that segment. The second recognition unit 282 may be configured to recognize the mouth shape of the character in the play image. The first calculation unit 283 may be configured to calculate the actual aspect ratio of the mouth shape from its width and height. The second calculation unit 284 may be configured to calculate the volume amplification factor from the actual aspect ratio and the preset aspect ratio. The volume determination unit 285 may be configured to determine, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
The volume determination module 28 may be stored in the memory 30 as a program implementing the function indicated by step S28. The association unit 281, the second recognition unit 282, the first calculation unit 283, the second calculation unit 284, and the volume determination unit 285 may be stored in the memory 30 as programs implementing the functions indicated by steps S281 to S285, respectively; the processor 20 executes these programs to complete steps S281 to S285.
Specifically, the processor 20 first uses the play time point of each user speech segment to find the one or more frames of play images displayed at that time point; these frames are the play images associated with that user speech segment. The processor 20 then recognizes the mouth shape of the character in the play image, calculates the actual aspect ratio of the mouth shape from its width and height, calculates the volume amplification factor from the actual aspect ratio and the preset aspect ratio, and finally determines the playback volume of each user speech segment from the volume amplification factor. This calculation is the same as the one performed in the previously described video dubbing method in which dubbing is carried out during video playback, and is not repeated here.
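The frame lookup of step S281 amounts to mapping a segment's play interval onto frame indices. A minimal sketch, assuming zero-based frame numbering and a known frame rate:

```python
def frames_for_segment(segment_start_ms, segment_end_ms, frame_rate):
    """Indices of the play images whose display times fall inside a user
    speech segment's play interval. `frame_rate` is in frames per second;
    zero-based numbering is an assumption."""
    first = segment_start_ms * frame_rate // 1000
    last = segment_end_ms * frame_rate // 1000
    return list(range(first, last + 1))

print(frames_for_segment(1000, 2000, 25))  # frames 25..50 of a 25 fps video
```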
Referring to Figure 33, in some embodiments the video dubbing method further includes, before step S23 of reading the video and the audio repository:
S29: determining a target dubbing role according to the user's input.
Step S24 — searching the audio repository for the library vocabularies matching the subtitle file and generating, from the user speech segments corresponding to those vocabularies, the personalized audio and the synchronization association information between the subtitle file and the personalized audio — further includes:
S245: processing the original audio to separate the voice information of the target dubbing role from the voice information of the non-target dubbing roles.
Step S241 of extracting the multiple subtitle fragments in the subtitle file includes:
S2411: extracting, from the subtitle file, the target subtitles corresponding to the voice information of the target dubbing role; and
S2412: extracting the subtitle fragments in the target subtitles.
Referring to Figure 34, in some embodiments the video dubbing apparatus further includes a role determination module 29, and the matching module 24 further includes a division unit 245. Step S29 may be implemented by the role determination module 29, step S245 by the division unit 245, and steps S2411 and S2412 by the extraction unit 241.
In other words, the role determination module 29 may be configured to determine the target dubbing role according to the user's input. The division unit 245 may be configured to process the original audio to separate the voice information of the target dubbing role from that of the non-target dubbing roles. The extraction unit 241 may be configured to extract from the subtitle file the target subtitles corresponding to the voice information of the target dubbing role, and to extract the subtitle fragments in the target subtitles.
Specifically, when the user only wishes to dub one or several roles in the video, the user may first set the target dubbing role. The processor 20 identifies the voice information of the target dubbing role in the original audio by voiceprint recognition, and classifies the remainder of the original audio as the voice information of the non-target dubbing roles. The processor 20 then screens the subtitle file for the target subtitles corresponding to the voice information of the target dubbing role, and finds in the audio repository the multiple user speech segments matching the multiple subtitle fragments of the target subtitles; this process is the same as in the previously described video dubbing method in which dubbing is carried out during video playback, and is not repeated here. In this way, the user's dubbed audio for the target dubbing role is obtained. The processor 20 then merges the user's dubbed audio for the target dubbing role with the original audio of the non-target dubbing roles to obtain the personalized audio. When the personalized video is played, the voice of the target dubbing role is played as the user speech segments, while the voices of the non-target dubbing roles are played as the original audio. The user can thus selectively dub one or several roles in the video, which reinforces the fun of video dubbing and further improves the user experience.
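The division of step S245 and the screening of step S2411 can be sketched as follows, assuming the voiceprint model's output is already available as a mapping from play time points to speaker labels (the names and data shapes are hypothetical):

```python
def split_by_role(fragments, target_role, diarization):
    """`diarization` maps each fragment's play time point to a speaker
    label, assumed to come from an external voiceprint-recognition model.
    Fragments voiced by the target dubbing role become target subtitles
    to be re-dubbed; the rest keep the original audio."""
    target, others = [], []
    for start_ms, text in fragments:
        if diarization.get(start_ms) == target_role:
            target.append((start_ms, text))
        else:
            others.append((start_ms, text))
    return target, others

fragments = [(0, "Trouble you for fish balls"), (5000, "Fish balls are sold out")]
speakers = {0: "McDull", 5000: "cook"}
target, others = split_by_role(fragments, "McDull", speakers)
print(target)  # → [(0, 'Trouble you for fish balls')]
```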
Referring again to Figure 3, the present invention also provides an electronic device 100. The electronic device 100 includes one or more processors 20, a memory 30, and one or more programs. The one or more programs are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs include instructions for executing the video dubbing method described in any of the above embodiments.
For example, with reference to Figure 19, the programs include instructions for executing the following steps:
S23: reading a video and an audio repository, the video including a video file, a subtitle file, and original audio, and the audio repository including library vocabularies and user speech segments corresponding to the library vocabularies;
S24: searching the audio repository for the library vocabularies matching the subtitle file, and generating, from the user speech segments corresponding to those vocabularies, a personalized audio together with synchronization association information between the subtitle file and the personalized audio;
S25: associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video; and
S26: playing the personalized audio when the personalized video is played.
As another example, with reference to Figure 22, the programs include instructions for executing the following steps:
S221: recognizing the speech to obtain text information;
S222: disassembling the text information to obtain multiple library vocabularies;
S223: disassembling the speech according to the multiple library vocabularies to obtain multiple user speech segments corresponding to the multiple library vocabularies; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that segment.
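The storage convention of step S224 — each user speech segment saved under a file named after its library vocabulary — can be sketched as below. The recognition and splitting of steps S221 to S223 are assumed to have already produced the (vocabulary, clip) pairs; all names are illustrative:

```python
import os
import tempfile

def build_audio_repository(word_clips, repo_dir):
    """Store each user speech segment under a file named after its library
    vocabulary, which is what later subtitle-vocabulary lookups rely on.
    `word_clips` stands in for the output of speech recognition plus
    splitting, as (vocabulary, clip_bytes) pairs."""
    os.makedirs(repo_dir, exist_ok=True)
    for word, clip in word_clips:
        with open(os.path.join(repo_dir, word + ".mp3"), "wb") as f:
            f.write(clip)
    return sorted(os.listdir(repo_dir))

repo = tempfile.mkdtemp()
print(build_audio_repository([("you", b""), ("trouble", b"")], repo))
# → ['trouble.mp3', 'you.mp3']
```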
Referring again to Figure 18, the present invention also provides a computer-readable storage medium. The computer-readable storage medium includes a computer program used in combination with the electronic device 100. The computer program can be executed by the processor 20 to complete the video dubbing method described in any of the above embodiments.
For example, with reference to Figure 19, the computer program can be executed by the processor 20 to complete the following steps:
S23: reading a video and an audio repository, the video including a video file, a subtitle file, and original audio, and the audio repository including library vocabularies and user speech segments corresponding to the library vocabularies;
S24: searching the audio repository for the library vocabularies matching the subtitle file, and generating, from the user speech segments corresponding to those vocabularies, a personalized audio together with synchronization association information between the subtitle file and the personalized audio;
S25: associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video; and
S26: playing the personalized audio when the personalized video is played.
As another example, with reference to Figure 22, the computer program can be executed by the processor 20 to complete the following steps:
S221: recognizing the speech to obtain text information;
S222: disassembling the text information to obtain multiple library vocabularies;
S223: disassembling the speech according to the multiple library vocabularies to obtain multiple user speech segments corresponding to the multiple library vocabularies; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that segment.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the features of the different embodiments or examples described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. A feature defined with "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless specifically defined otherwise.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the invention; those skilled in the art may make changes, modifications, replacements, and variations to the above embodiments within the scope of the invention.
Claims (15)
1. A video dubbing method, characterized in that the video dubbing method comprises:
playing a video file;
processing a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to a current subtitle vocabulary, the audio repository comprising at least one personalized speech file, and the personalized speech file comprising a library vocabulary and a user speech segment corresponding to the library vocabulary;
muting the original audio that is associated with the video file and corresponds to the current subtitle vocabulary; and
playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary.
2. The video dubbing method according to claim 1, characterized in that the step of processing the subtitle file associated with the video file to read the personalized speech file corresponding to the current subtitle vocabulary from the audio repository comprises:
extracting the current subtitle in the subtitle file;
splitting the current subtitle to obtain multiple current subtitle vocabularies and play time points corresponding to the current subtitle vocabularies; and
searching the audio repository for the library vocabulary matching the current subtitle vocabulary to obtain the user speech segment of the corresponding library vocabulary;
and the step of playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary comprises:
playing the user speech segment at the play time point.
3. The video dubbing method according to claim 1, characterized in that the video dubbing method is applied to an electronic device, and the audio repository is obtained by the following steps:
collecting speech entered by a user with the electronic device; and
recognizing the speech to obtain the library vocabulary and the user speech segment.
4. The video dubbing method according to claim 3, characterized in that the step of recognizing the speech to obtain the library vocabulary and the user speech segment comprises:
recognizing the speech to obtain text information;
disassembling the text information to obtain multiple library vocabularies;
disassembling the speech according to the multiple library vocabularies to obtain multiple user speech segments corresponding to the multiple library vocabularies; and
storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that user speech segment.
5. The video dubbing method according to claim 2, characterized in that the video dubbing method further comprises, after the step of playing the user speech segment in the personalized speech file corresponding to the current subtitle vocabulary:
determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape.
6. The video dubbing method according to claim 5, characterized in that the video file comprises multiple frames of play images, and the step of determining, according to the mouth shape of the character in the video file, the playback volume of the user speech segment corresponding to that mouth shape comprises:
selecting, from the multiple frames of play images according to the play time point, the currently played image associated with the user speech segment;
recognizing the mouth shape of the character in the currently played image;
calculating the actual aspect ratio of the mouth shape from the width and height of the mouth shape;
calculating a volume amplification factor from the actual aspect ratio and a preset aspect ratio; and
determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the currently played image.
7. A video dubbing method, characterized in that the video dubbing method comprises:
reading a video and an audio repository, the video comprising a video file, a subtitle file, and original audio, and the audio repository comprising library vocabularies and user speech segments corresponding to the library vocabularies;
searching the audio repository for the library vocabularies matching the subtitle file, and generating, from the user speech segments corresponding to the library vocabularies, a personalized audio and synchronization association information between the subtitle file and the personalized audio;
associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video; and
playing the personalized audio when playing the personalized video.
8. The video dubbing method according to claim 7, characterized in that the video dubbing method is applied to an electronic device, and the audio repository is obtained by the following steps:
collecting speech entered by a user with the electronic device; and
recognizing the speech to obtain the library vocabulary and the user speech segment.
9. The video dubbing method according to claim 8, characterized in that the step of recognizing the speech to obtain the library vocabulary and the user speech segment comprises:
recognizing the speech to obtain text information;
disassembling the text information to obtain multiple library vocabularies;
disassembling the speech according to the multiple library vocabularies to obtain multiple user speech segments corresponding to the multiple library vocabularies; and
storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary corresponding to that user speech segment.
10. The video dubbing method according to claim 8, characterized in that the step of searching the audio repository for the library vocabularies matching the subtitle file and generating, from the user speech segments corresponding to the library vocabularies, the personalized audio and the synchronization association information between the subtitle file and the personalized audio comprises:
extracting multiple subtitle fragments in the subtitle file;
splitting each subtitle fragment to obtain multiple subtitle vocabularies and play time points corresponding to the multiple subtitle vocabularies;
searching the audio repository for multiple user speech segments matching the multiple subtitle vocabularies; and
combining, in the order of the play time points of the multiple subtitle vocabularies, the multiple user speech segments corresponding to the multiple subtitle vocabularies to form the personalized audio.
11. The video dubbing method according to claim 7, characterized in that, during playback of the personalized video, the video dubbing method further comprises:
muting the original audio when playing the personalized video.
12. The video dubbing method according to claim 10, characterized in that the video dubbing method further comprises:
determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape.
13. The video dubbing method according to claim 12, characterized in that the video file comprises multiple frames of play images, and the step of determining, according to the mouth shape of the character in the video file, the playback volume of the user speech segment corresponding to that mouth shape comprises:
selecting, from the multiple frames of play images according to the play time point of the user speech segment, the play image associated with the user speech segment;
recognizing the mouth shape of the character in the play image;
calculating the actual aspect ratio of the mouth shape from the width and height of the mouth shape;
calculating a volume amplification factor from the actual aspect ratio and a preset aspect ratio; and
determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
14. An electronic device, characterized by comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for executing the video dubbing method according to any one of claims 1-13.
15. A computer-readable storage medium, characterized by comprising a computer program used in combination with an electronic device, the computer program being executable by a processor to complete the video dubbing method according to any one of claims 1-13.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811122718.2A CN110149548B (en) | 2018-09-26 | 2018-09-26 | Video dubbing method, electronic device and readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811122718.2A CN110149548B (en) | 2018-09-26 | 2018-09-26 | Video dubbing method, electronic device and readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110149548A true CN110149548A (en) | 2019-08-20 |
| CN110149548B CN110149548B (en) | 2022-06-21 |
Family
ID=67589301
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811122718.2A Active CN110149548B (en) | 2018-09-26 | 2018-09-26 | Video dubbing method, electronic device and readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110149548B (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110534131A (en) * | 2019-08-30 | 2019-12-03 | 广州华多网络科技有限公司 | A kind of audio frequency playing method and system |
| CN110691204A (en) * | 2019-09-09 | 2020-01-14 | 苏州臻迪智能科技有限公司 | A kind of audio and video processing method, device, electronic equipment and storage medium |
| CN110769167A (en) * | 2019-10-30 | 2020-02-07 | 合肥名阳信息技术有限公司 | Method for video dubbing based on text-to-speech technology |
| CN111601174A (en) * | 2020-04-26 | 2020-08-28 | 维沃移动通信有限公司 | Method and device for adding subtitles |
| CN112261435A (en) * | 2020-11-06 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Social interaction method, device, system, equipment and storage medium |
| CN112837401A (en) * | 2021-01-27 | 2021-05-25 | 网易(杭州)网络有限公司 | Information processing method and device, computer equipment and storage medium |
| CN113420627A (en) * | 2021-06-15 | 2021-09-21 | 读书郎教育科技有限公司 | System and method capable of generating English dubbing materials |
| CN113825005A (en) * | 2021-09-30 | 2021-12-21 | 北京跳悦智能科技有限公司 | Face video and audio synchronization method and system based on joint training |
| CN114765703A (en) * | 2021-01-13 | 2022-07-19 | 北京中关村科金技术有限公司 | Method and device for dyeing subtitles corresponding to TTS (text to speech) and storage medium |
| CN115171645A (en) * | 2022-06-30 | 2022-10-11 | 北京有竹居网络技术有限公司 | Dubbing method and device, electronic equipment and storage medium |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH06162166A (en) * | 1992-10-20 | 1994-06-10 | Sony Corp | Image creation device |
| JP2003047030A (en) * | 2001-07-31 | 2003-02-14 | Shibasoku:Kk | Lip sync signal generation apparatus |
| US6766299B1 (en) * | 1999-12-20 | 2004-07-20 | Thrillionaire Productions, Inc. | Speech-controlled animation system |
| CN101930747A (en) * | 2010-07-30 | 2010-12-29 | 四川微迪数字技术有限公司 | Method and device for converting voice into mouth shape image |
| JP2011077883A (en) * | 2009-09-30 | 2011-04-14 | Fujifilm Corp | Image file producing method, program for the method, recording medium of the program, and image file producing apparatus |
| CN102054287A (en) * | 2009-11-09 | 2011-05-11 | 腾讯科技(深圳)有限公司 | Facial animation video generating method and device |
| CN104732593A (en) * | 2015-03-27 | 2015-06-24 | 厦门幻世网络科技有限公司 | Three-dimensional animation editing method based on mobile terminal |
| CN104967789A (en) * | 2015-06-16 | 2015-10-07 | 福建省泉州市气象局 | Automatic processing method and system for city window weather dubbing |
| CN106060424A (en) * | 2016-06-14 | 2016-10-26 | 徐文波 | Video dubbing method and device |
| CN107396177A (en) * | 2017-08-28 | 2017-11-24 | 北京小米移动软件有限公司 | Video broadcasting method, device and storage medium |
| CN107451564A (en) * | 2017-07-31 | 2017-12-08 | 上海爱优威软件开发有限公司 | A kind of human face action control method and system |
| US20180253881A1 (en) * | 2017-03-03 | 2018-09-06 | The Governing Council Of The University Of Toronto | System and method for animated lip synchronization |
Legal events: 2018-09-26 — application CN201811122718.2A filed in China; granted as CN110149548B (status: active).
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH06162166A (en) * | 1992-10-20 | 1994-06-10 | Sony Corp | Image creation device |
| US6766299B1 (en) * | 1999-12-20 | 2004-07-20 | Thrillionaire Productions, Inc. | Speech-controlled animation system |
| JP2003047030A (en) * | 2001-07-31 | 2003-02-14 | Shibasoku:Kk | Lip sync signal generation apparatus |
| JP2011077883A (en) * | 2009-09-30 | 2011-04-14 | Fujifilm Corp | Image file producing method, program for the method, recording medium of the program, and image file producing apparatus |
| CN102054287A (en) * | 2009-11-09 | 2011-05-11 | 腾讯科技(深圳)有限公司 | Facial animation video generating method and device |
| CN101930747A (en) * | 2010-07-30 | 2010-12-29 | 四川微迪数字技术有限公司 | Method and device for converting voice into mouth shape image |
| CN104732593A (en) * | 2015-03-27 | 2015-06-24 | 厦门幻世网络科技有限公司 | Three-dimensional animation editing method based on mobile terminal |
| CN104967789A (en) * | 2015-06-16 | 2015-10-07 | 福建省泉州市气象局 | Automatic processing method and system for city window weather dubbing |
| CN106060424A (en) * | 2016-06-14 | 2016-10-26 | 徐文波 | Video dubbing method and device |
| US20180253881A1 (en) * | 2017-03-03 | 2018-09-06 | The Governing Council Of The University Of Toronto | System and method for animated lip synchronization |
| CN107451564A (en) * | 2017-07-31 | 2017-12-08 | 上海爱优威软件开发有限公司 | A facial action control method and system |
| CN107396177A (en) * | 2017-08-28 | 2017-11-24 | 北京小米移动软件有限公司 | Video broadcasting method, device and storage medium |
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110534131A (en) * | 2019-08-30 | 2019-12-03 | 广州华多网络科技有限公司 | An audio playing method and system |
| CN110691204A (en) * | 2019-09-09 | 2020-01-14 | 苏州臻迪智能科技有限公司 | An audio and video processing method and device, electronic device and storage medium |
| CN110691204B (en) * | 2019-09-09 | 2021-04-02 | 苏州臻迪智能科技有限公司 | Audio and video processing method and device, electronic equipment and storage medium |
| CN110769167A (en) * | 2019-10-30 | 2020-02-07 | 合肥名阳信息技术有限公司 | Method for video dubbing based on text-to-speech technology |
| CN111601174A (en) * | 2020-04-26 | 2020-08-28 | 维沃移动通信有限公司 | Method and device for adding subtitles |
| CN112261435A (en) * | 2020-11-06 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Social interaction method, device, system, equipment and storage medium |
| CN114765703A (en) * | 2021-01-13 | 2022-07-19 | 北京中关村科金技术有限公司 | Method, device and storage medium for coloring subtitles corresponding to TTS speech |
| CN114765703B (en) * | 2021-01-13 | 2023-07-07 | 北京中关村科金技术有限公司 | Method, device and storage medium for coloring subtitles corresponding to TTS speech |
| CN112837401A (en) * | 2021-01-27 | 2021-05-25 | 网易(杭州)网络有限公司 | Information processing method and device, computer equipment and storage medium |
| CN112837401B (en) * | 2021-01-27 | 2024-04-09 | 网易(杭州)网络有限公司 | Information processing method, device, computer equipment and storage medium |
| CN113420627A (en) * | 2021-06-15 | 2021-09-21 | 读书郎教育科技有限公司 | A system and method for generating English dubbing materials |
| CN113825005A (en) * | 2021-09-30 | 2021-12-21 | 北京跳悦智能科技有限公司 | Face video and audio synchronization method and system based on joint training |
| CN113825005B (en) * | 2021-09-30 | 2024-05-24 | 北京跳悦智能科技有限公司 | Face video and audio synchronization method and system based on joint training |
| CN115171645A (en) * | 2022-06-30 | 2022-10-11 | 北京有竹居网络技术有限公司 | Dubbing method and device, electronic equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110149548B (en) | 2022-06-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110149548A (en) | Video dubbing method, electronic device and readable storage medium | |
| CN110634483B (en) | Human-computer interaction method, device, electronic device and storage medium | |
| CN107193841B (en) | Method and device for accelerating playing, transmitting and storing of media file | |
| CN104133851B (en) | Audio similarity detection method and device, and electronic device | |
| CN108231059A (en) | Processing method and apparatus, and device for processing | |
| CN107705783A (en) | A speech synthesis method and device | |
| US20100298959A1 (en) | Speech reproducing method, speech reproducing device, and computer program | |
| US12019676B2 (en) | Method and system for presenting a multimedia stream | |
| JP2011217197A (en) | Electronic apparatus, reproduction control system, reproduction control method, and program thereof | |
| JP2011239141A (en) | Information processing method, information processor, scenery metadata extraction device, lack complementary information generating device and program | |
| CN114911448B (en) | Data processing method, device, equipment and medium | |
| WO2014100893A1 (en) | System and method for the automated customization of audio and video media | |
| CN113821188A (en) | Method, device, electronic device and storage medium for adjusting audio playback speed | |
| US10089898B2 (en) | Information processing device, control method therefor, and computer program | |
| CN110442867A (en) | Image processing method, device, terminal and computer storage medium | |
| CN110324702B (en) | Information pushing method and device in video playing process | |
| CN110992984B (en) | Audio processing method and device and storage medium | |
| KR102797767B1 (en) | Playback control of scene descriptions | |
| JP2003037826A (en) | Substitute image display and tv phone apparatus | |
| CN113538628A (en) | Expression package generation method and device, electronic equipment and computer readable storage medium | |
| CN114339391A (en) | Video data processing method, video data processing device, computer equipment and storage medium | |
| KR101920653B1 (en) | Method and program for educating language by making comparison sound | |
| CN112562687A (en) | Audio and video processing method and device, recording pen and storage medium | |
| CN100538823C (en) | Language auxiliary expression system and method | |
| CN112492400A (en) | Interaction method, device, equipment, communication method and shooting method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||