
CN110149548A - Video dubbing method, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number
CN110149548A
CN110149548A (application CN201811122718.2A; granted as CN110149548B)
Authority
CN
China
Prior art keywords
video
vocabulary
subtitle
file
user speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811122718.2A
Other languages
Chinese (zh)
Other versions
CN110149548B (English)
Inventor
刘玉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN201811122718.2A
Publication of CN110149548A
Application granted
Publication of CN110149548B
Legal status: Active
Anticipated expiration

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N 21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N 21/439 — Processing of audio elementary streams
    • H04N 21/4396 — Processing of audio elementary streams by muting the audio signal
    • H04N 21/47 — End-user applications
    • H04N 21/485 — End-user interface for client configuration
    • H04N 21/4852 — End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • H04N 5/00 — Details of television systems
    • H04N 5/222 — Studio circuitry; studio devices; studio equipment
    • H04N 5/262 — Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video dubbing method, an electronic device, and a computer-readable storage medium. The video dubbing method includes: playing a video file; processing a subtitle file associated with the video file to read, from an audio library, a personalized speech file corresponding to the current subtitle word, where the audio library includes at least one personalized speech file and each personalized speech file includes a library word and a user speech segment corresponding to that library word; muting the original audio that is associated with the video file and corresponds to the current subtitle word; and playing the user speech segment in the personalized speech file corresponding to the current subtitle word. With the video dubbing method, electronic device, and computer-readable storage medium of embodiments of the present invention, the electronic device can play audio dubbed by the user while a video is playing, strengthening the interaction between the electronic device and the user during video playback and making playback more entertaining.

Description

Video dubbing method, electronic device, and computer-readable storage medium
Technical field
The present invention relates to the technical field of video processing, and in particular to a video dubbing method, an electronic device, and a computer-readable storage medium.
Background technique
At present, most videos such as TV series and films are produced by packaging dubbing recorded by professional voice actors together with multiple frames of captured images. Although a video dubbed by professional voice actors has high viewing value, its interactivity with the user is poor and it offers little entertainment.
Summary of the invention
Embodiments of the present invention provide a video dubbing method, an electronic device, and a computer-readable storage medium.
The video dubbing method of embodiments of the present invention includes: playing a video file; processing a subtitle file associated with the video file to read, from an audio library, a personalized speech file corresponding to the current subtitle word, the audio library including at least one personalized speech file, each personalized speech file including a library word and a user speech segment corresponding to that library word; muting the original audio that is associated with the video file and corresponds to the current subtitle word; and playing the user speech segment in the personalized speech file corresponding to the current subtitle word.
The video dubbing method of embodiments of the present invention includes: reading a video and an audio library, the video including a video file, a subtitle file, and original audio, and the audio library including library words and user speech segments corresponding to the library words; searching the audio library for library words matching the subtitle file, and generating, from the user speech segments corresponding to those library words, a personalized audio track together with synchronization association information between the subtitle file and the personalized audio; associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video; and playing the personalized audio when the personalized video is played.
The electronic device of embodiments of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the video dubbing method described above.
The computer-readable storage medium of embodiments of the present invention includes a computer program for use in combination with an electronic device, the computer program being executable by a processor to carry out the instructions of the video dubbing method described above.
With the video dubbing method, electronic device, and computer-readable storage medium of embodiments of the present invention, the electronic device can play audio dubbed by the user while a video is playing, strengthening the interaction between the electronic device and the user during video playback and making playback more entertaining.
Additional aspects and advantages of the invention will be set forth in part in the following description, will in part become apparent from that description, or may be learned through practice of the invention.
Detailed description of the invention
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 2 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to some embodiments of the present invention.
Fig. 4 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 5 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 6 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 7 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 8 is a block diagram of an identification module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 9 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 10 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 11 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 12 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 13 is a schematic flowchart of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 14 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 15 is a block diagram of a volume determining module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 16 is a schematic flowchart of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 17 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 18 is a schematic diagram of the connection between an electronic device and a computer-readable storage medium according to some embodiments of the present invention.
Fig. 19 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 20 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 21 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 22 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 23 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 24 is a block diagram of an identification module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 25 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 26 is a block diagram of a matching module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 27 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 28 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 29 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 30 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 31 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 32 is a block diagram of a volume determining module of a video dubbing apparatus according to some embodiments of the present invention.
Fig. 33 is a schematic flowchart of a video dubbing method according to some embodiments of the present invention.
Fig. 34 is a block diagram of a video dubbing apparatus according to some embodiments of the present invention.
Specific embodiment
Embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, where identical or similar reference numerals throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
Referring to Fig. 1 and Fig. 3, the present invention provides a video dubbing method for an electronic device 100. The video dubbing method includes:
S12: playing a video file;
S14: processing a subtitle file associated with the video file to read, from an audio library, a personalized speech file corresponding to the current subtitle word, the audio library including at least one personalized speech file, each personalized speech file including a library word and a user speech segment corresponding to the library word;
S16: muting the original audio associated with the video file and corresponding to the current subtitle word; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle word.
Referring to Fig. 2 and Fig. 3, the present invention also provides a video dubbing apparatus 10. The video dubbing apparatus 10 can be used in the electronic device 100, and the video dubbing method of embodiments of the present invention can be implemented by it. The video dubbing apparatus 10 includes a first playing module 12, a first processing module 14, a second processing module 16, and a second playing module 18. Step S12 can be implemented by the first playing module 12, step S14 by the first processing module 14, step S16 by the second processing module 16, and step S18 by the second playing module 18.
In other words, the first playing module 12 can be used to play a video file. The first processing module 14 can be used to process a subtitle file associated with the video file to read a personalized speech file corresponding to the current subtitle word from an audio library, where the audio library includes at least one personalized speech file and each personalized speech file includes a library word and a user speech segment corresponding to the library word. The second processing module 16 can be used to mute the original audio associated with the video file and corresponding to the current subtitle word. The second playing module 18 can be used to play the user speech segment in the personalized speech file corresponding to the current subtitle word.
The electronic device 100 may be a mobile phone, a tablet computer, a laptop, a desktop computer, a wearable device (such as a smartwatch, a smart band, smart glasses, or a smart helmet), or the like.
The electronic device 100 includes a display screen 40, a processor 20, a memory 30, and an electroacoustic element 50 (for example, a loudspeaker or an earphone). The first playing module 12 may be the display screen 40 of the electronic device 100, used to show the played images. The first processing module 14 and the second processing module 16 may be programs stored in the memory 30 that implement the functions indicated by steps S14 and S16 respectively; the processor 20 executes these programs to complete steps S14 and S16. The second playing module 18 may be the electroacoustic element 50 of the electronic device 100, used to play audio.
A video is usually composed of three parts: a video file, a subtitle file, and an audio file. The video file consists of multiple frames of playing images, each carrying a timestamp (i.e. a play time point); played back at a frame rate above what the human eye can resolve, these frames form a smooth moving picture. The audio file generally includes audio and timestamps (play time points); the audio carries the speech of the characters in the playing images, and its timestamps are matched against the timestamps of the playing images so that the video file and the audio file play simultaneously. The subtitle file includes text corresponding to the audio and play time points corresponding to that text; its play time points can be matched against the timestamps of the audio and of the playing images so that the video file, subtitle file, and audio file all play simultaneously.
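The three-part structure described above can be modelled roughly as follows. This is a minimal sketch: the class and field names are illustrative assumptions, not taken from the patent, and real media would carry encoded frame and sample data.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp_ms: int        # play time point of this playing image
    image: bytes

@dataclass
class SubtitleEntry:
    start_ms: int            # play time range for this text
    end_ms: int
    text: str

@dataclass
class Video:
    frames: list[Frame]      # played at a frame rate above what the eye resolves
    subtitles: list[SubtitleEntry]

def items_at(video: Video, t_ms: int):
    """Synchronisation against one reference clock: pick the frame and the
    subtitle entries whose timestamps cover the given play time point."""
    frame = max((f for f in video.frames if f.timestamp_ms <= t_ms),
                key=lambda f: f.timestamp_ms)
    subs = [s for s in video.subtitles if s.start_ms <= t_ms <= s.end_ms]
    return frame, subs
```

Matching everything by timestamp against a single clock is what lets the three files play simultaneously.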
The audio files in current videos (such as TV series and films) are all obtained by recording the dubbing of professional voice actors. Although such professionally dubbed videos have high viewing value, the user can only ever hear the professional dubbing during playback. The playback mode is therefore rather uniform, its interactivity with the user is poor, and it offers little entertainment.
In the video dubbing method of embodiments of the present invention, the electronic device 100 collects the user's various everyday utterances and stores them in the memory 30, then recognizes the text corresponding to each utterance. It then splits the text into multiple library words, splits the speech into multiple user speech segments based on those library words, and thereby builds an audio library in which library words correspond to user speech segments.
Referring to Fig. 4, when the user inputs an instruction on the electronic device 100 to play a video, the electronic device 100 first mutes the original audio in the video. It then extracts the subtitle file from the video and splits the subtitles into multiple subtitle words with their corresponding play time points. Next, the electronic device 100 plays the video file, showing the frames at a certain frame rate. At each play time point during playback, the electronic device 100 looks up in the audio library the personalized speech file, i.e. the user speech segment, corresponding to the subtitle word to be played at the current time point; once it is found, the electroacoustic element 50 of the electronic device 100 plays it. At the next play time point, the electronic device 100 again looks up the user speech segment corresponding to the next subtitle word and plays it, and so on in a loop until all the playing images in the video file have been shown.
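The per-time-point loop just described can be sketched as follows. This is a simplified illustration under assumed names: `play_segment` and `mute_original` stand in for the electroacoustic element and audio pipeline, and real playback would go through a media framework.

```python
import os

def play_dubbed(subtitle_words, audio_library_dir, play_segment, mute_original):
    """For each (word, play_time_ms) pair: mute the original audio at that
    time point and, if the audio library holds a matching user speech
    segment (a file named '<word>.mp3'), play it. Returns the words that
    were actually dubbed."""
    played = []
    for word, t_ms in subtitle_words:
        path = os.path.join(audio_library_dir, word + ".mp3")
        if os.path.exists(path):
            mute_original(t_ms)        # S16: silence the original audio
            play_segment(path, t_ms)   # S18: play the user's voice segment
            played.append(word)
    return played
```

Looking segments up by file name works because each segment is stored under its library word, as described in the audio-library steps below.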
With the video dubbing method of embodiments of the present invention, the electronic device 100 can play personalized speech files dubbed by the user during video playback, strengthening the interaction between the electronic device 100 and the user and making playback more entertaining.
Referring to Fig. 3, Fig. 5, and Fig. 6, in some embodiments the audio library can be built through the following steps. In other words, before step S12 (playing the video file), the video dubbing method further includes:
S111: collecting the speech the user records with the electronic device 100; and
S112: performing speech recognition on the speech to obtain library words and user speech segments.
Step S112 further includes:
S1121: performing speech recognition on the speech to obtain text;
S1122: splitting the text to obtain multiple library words;
S1123: splitting the speech according to the multiple library words to obtain multiple user speech segments corresponding to those library words; and
S1124: storing each user speech segment in the audio library, the file name of each user speech segment being its corresponding library word.
Referring to Fig. 3, Fig. 7, and Fig. 8, in some embodiments the video dubbing apparatus 10 includes a collection module 111 and an identification module 112. The identification module 112 includes a first recognition unit 1121, a first splitting unit 1122, a second splitting unit 1123, and a storage unit 1124. Step S111 can be implemented by the collection module 111, step S112 by the identification module 112, step S1121 by the first recognition unit 1121, step S1122 by the first splitting unit 1122, step S1123 by the second splitting unit 1123, and step S1124 by the storage unit 1124.
In other words, the collection module 111 can be used to collect the speech the user records with the electronic device 100. The identification module 112 can be used to perform speech recognition on the speech to obtain library words and user speech segments. The first recognition unit 1121 can be used to recognize the speech to obtain text. The first splitting unit 1122 can be used to split the text to obtain multiple library words. The second splitting unit 1123 can be used to split the speech according to the multiple library words to obtain multiple user speech segments corresponding to those library words. The storage unit 1124 can be used to store each user speech segment in the audio library, the file name of each user speech segment being its corresponding library word.
The collection module 111 may be an acoustoelectric element 60 arranged on the electronic device 100, such as a microphone. The identification module 112 may be a program stored in the memory 30 that implements the function indicated by step S112; the processor 20 executes the program to complete step S112. The first recognition unit 1121, first splitting unit 1122, second splitting unit 1123, and storage unit 1124 may be subprograms stored in the memory 30 under the program corresponding to the identification module 112; the processor 20 executes the subprograms to implement steps S1121 to S1124. The storage unit 1124 may be the memory 30 in the electronic device 100.
The trigger for the electronic device 100 to collect the user's various everyday utterances may be user input, i.e. the user switches on the recording function of the electronic device 100 and then speaks deliberately to improve the coverage of many kinds of words. Alternatively, the collection may be triggered automatically by the electronic device 100. For example, when the electronic device 100 makes or receives a call (an ordinary phone call, a voice call in a social application, a video call, and so on), it switches on the recording function to record the user's speech during the call. Or the user sets the time window in which the recording function is on according to their own situation; for example, if the user sets the window to 19:00-21:00 every day, the electronic device 100 switches recording on at 19:00 and off at 21:00 each day. Letting the user set a recording window avoids the higher energy consumption that the electronic device 100 would incur if recording stayed on continuously. Alternatively, the electronic device 100 switches recording on every first predetermined period and off again after a second predetermined period, so that recording is not continuously on and the energy consumption of the electronic device 100 stays low.
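The time-window trigger above amounts to a simple clock check. A sketch, using the example's 19:00-21:00 window as the assumed default and also handling a window that crosses midnight:

```python
from datetime import time

def recording_enabled(now: time,
                      start: time = time(19, 0),
                      end: time = time(21, 0)) -> bool:
    """True when the wall-clock time falls inside the user-configured
    recording window, so the microphone is only kept open inside it."""
    if start <= end:
        return start <= now <= end
    # window crossing midnight, e.g. 22:00-06:00
    return now >= start or now <= end
```

The periodic on/off variant would replace the clock check with a timer that toggles after the first and second predetermined periods.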
Referring to Fig. 9, after the acoustoelectric element 60 on the electronic device 100 has collected the user's speech, the processor 20 first recognizes the speech to obtain text, and then controls the memory 30 to store the recognized text. The speech recognition process is as follows: first, the processor 20 cuts off the silence at both ends of the speech to reduce interference with subsequent processing. Then the processor 20 divides the speech into frames using a moving window function. Next, the processor 20 extracts acoustic features from each frame of speech and matches them against an acoustic model to determine the state of each frame; based on the state combination across the frames, states are combined into phonemes and phonemes into words, finally achieving the conversion of speech into text.
After the processor 20 has converted the speech into text, it can further split the text into multiple library words using a word segmentation technique such as forward maximum matching, reverse maximum matching, minimum segmentation, or bidirectional maximum matching. The processor 20 then splits the speech, according to the correspondence between the multiple library words and the speech, into multiple user speech segments in one-to-one correspondence with the library words. Finally, the user speech segments are stored in the directory of the audio library in the memory 30, the file name of each user speech segment being the library word corresponding to that segment.
For example, the processor 20 in the electronic device 100 first controls the acoustoelectric element 60 to collect the user's utterance "the weather is very good today". The processor 20 then converts this utterance into the text "the weather is very good today" by speech recognition, and splits that text into multiple library words using a segmentation technique, for example the three library words "today", "weather", and "very good". Next, the processor 20 splits the speech according to the library words to obtain the corresponding user speech segments: the library word "today" corresponds to the user speech segment "today", "weather" to the segment "weather", and "very good" to the segment "very good". Finally, the memory 30 stores the three user speech segments "today", "weather", and "very good" in the audio library, the file name of each segment being its corresponding library word: the segment "today" is stored as "today.mp3", "weather" as "weather.mp3", and "very good" as "very good.mp3". Using the corresponding library word as the file name makes it convenient to search for a user speech segment directly by file name in later steps.
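The forward-maximum-matching segmentation named above and the word-named storage scheme of the example can be sketched together. The tiny dictionary here is an illustrative assumption; a production system would use a full segmentation lexicon, and the audio slicing itself (aligning each word to its span of the recording) is omitted.

```python
def forward_max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in dictionary:
                words.append(cand)
                i += size
                break
    return words

def library_filenames(words):
    """Each user speech segment is stored under its library word as name."""
    return [w + ".mp3" for w in words]

lexicon = {"今天", "天气", "很好"}           # "today", "weather", "very good"
words = forward_max_match("今天天气很好", lexicon)
```

Reverse maximum matching works the same way from the end of the string, and bidirectional matching compares the two results.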
In this way, by repeatedly collecting, recognizing, and splitting the user's speech, the electronic device 100 enriches the user speech segments in the audio library, and a rich set of user speech segments helps improve the completeness of the video dubbing.
Referring to Fig. 10, in some embodiments step S14 (processing the subtitle file associated with the video file to read the personalized speech file corresponding to the current subtitle word from the audio library) includes:
S141: extracting the current subtitle from the subtitle file;
S142: splitting the current subtitle to obtain multiple current subtitle words and play time points corresponding to the current subtitle words; and
S143: searching the audio library for library words matching the current subtitle words to obtain the user speech segments corresponding to those library words.
Step S18 (playing the user speech segment in the personalized speech file corresponding to the current subtitle word) includes:
S181: playing the user speech segment at the play time point.
Referring to Fig. 3 and Fig. 11, in some embodiments the first processing module 14 includes an extraction unit 141, a splitting unit 142, and a searching unit 143. Step S141 can be implemented by the extraction unit 141, step S142 by the splitting unit 142, step S143 by the searching unit 143, and step S181 by the second playing module 18.
In other words, the extraction unit 141 can be used to extract the current subtitle from the subtitle file. The splitting unit 142 can be used to split the current subtitle to obtain multiple current subtitle words and the play time points corresponding to them. The searching unit 143 can be used to search the audio library for library words matching the current subtitle words to obtain the user speech segments corresponding to those library words. The second playing module 18 can further be used to play the user speech segment at the play time point.
The extraction unit 141, splitting unit 142, and searching unit 143 may be programs stored in the memory 30 that implement the functions indicated by steps S141, S142, and S143 respectively; the processor 20 executes these programs to complete steps S141, S142, and S143.
Specifically, the subtitle file in each video usually includes text corresponding to the audio and play time points corresponding to that text. During playback, the timestamps of the audio and the play time points of the subtitle words are synchronized against a single reference clock so that the audio file and the subtitle file play simultaneously. To play the user speech segments in the audio library as personalized speech files, the processor 20 first extracts the multiple current subtitles in the subtitle file, each current subtitle including the specific subtitle content and its play time. The processor 20 then splits each current subtitle into multiple current subtitle words and the multiple play time points matched one-to-one to those words.
Take a subtitle file in SRT format as an example. A line of dialogue, such as "may I trouble you for fish balls and coarse noodles" from the film "McDull Story", corresponds to the current subtitle "00:00:00,000 --> 00:00:04,400 may I trouble you for fish balls and coarse noodles". Here "00:00:00,000 --> 00:00:04,400" is the play time range, and "may I trouble you for fish balls and coarse noodles" is the specific subtitle content corresponding to that range. The processor 20 splits the current subtitle into multiple current subtitle words using a segmentation technique, obtaining the words with their one-to-one matched play time points: "trouble - 00:00:00,000 --> 00:00:01,000", "you - 00:00:01,000 --> 00:00:02,000", "fish balls - 00:00:02,000 --> 00:00:03,000", "coarse noodles - 00:00:03,000 --> 00:00:04,400".
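Parsing an SRT time line and allocating a play time point to each segmented word can be sketched as follows. Dividing the cue's time range evenly across the words is an illustrative assumption; the patent does not fix the exact allocation rule.

```python
import re

def parse_srt_time(ts: str) -> int:
    """SRT timestamp '00:00:04,400' -> milliseconds."""
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def split_cue(time_line: str, words: list) -> list:
    """Assign each segmented subtitle word a play time point inside the
    cue's start-end range, dividing the range evenly (an assumption)."""
    start_s, end_s = re.split(r"\s*-->\s*", time_line.strip())
    start, end = parse_srt_time(start_s), parse_srt_time(end_s)
    step = (end - start) // max(len(words), 1)
    return [(w, start + i * step) for i, w in enumerate(words)]
```

Each resulting (word, time point) pair is what the lookup-and-play loop consumes during playback.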
During playback of the video file, the processor 20 searches the audio repository for the user speech segment matching each current subtitle word. For example, at time point 00:00:00,000 it looks up the user speech segment "trouble.mp3" corresponding to the current subtitle word "trouble" and plays it; at 00:00:01,000 it looks up and plays the user speech segment "you.mp3" corresponding to the word "you"; at 00:00:02,000 it looks up and plays the user speech segment "fish ball.mp3" corresponding to the word "fish ball"; and at 00:00:03,000 it looks up and plays the user speech segment "coarse noodles.mp3" corresponding to the word "coarse noodles".
In this way, the voice dubbed by the user is played during video playback, realizing personalized dubbing for the user and improving the user experience.
In some embodiments, the subtitle file may be a hard subtitle or a soft subtitle. A hard subtitle, also called an "embedded subtitle", has its multiple current subtitles compressed into the same data as the frames of the video file, like a watermark, and cannot be separated from them. A soft subtitle, also called an "external subtitle", keeps the subtitle file and the video file as two independent pieces of data that are then encapsulated into one video. The SRT-format subtitle file above is a soft subtitle. When the subtitle file is a soft subtitle, each current subtitle and its play time point can be extracted directly. When the subtitle file is a hard subtitle, however, the current subtitle and its play time point cannot be extracted directly. In that case the processor 20 extracts the current subtitle by recognizing the text in the played images of the video file. It can be appreciated that with a hard subtitle the played image directly contains the information of the current subtitle, i.e., the current subtitle is embedded directly in the played image; the region of the played image occupied by the current subtitle can therefore be located, and the text within that region recognized, to extract the current subtitle. Further, since every played image carries a corresponding timestamp, the processor 20 can compute the play time point of a current subtitle from the timestamps of the one or more played images corresponding to it. In this way, the current subtitle and its play time point can be extracted even when the subtitle file is a hard subtitle.
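For hard subtitles, deriving the play time point from frame timestamps can be sketched as below. The OCR step itself is outside the sketch: the input is assumed to be per-frame recognized text (an assumption, since the text does not name a recognizer), and consecutive frames showing the same text are merged into one cue spanning their timestamps.

```python
def cue_spans(frames):
    """Derive (text, start_ms, end_ms) cues from per-frame OCR results.

    `frames` is a list of (timestamp_ms, recognized_text) pairs, assumed
    sorted by timestamp; the actual text-recognition step is stubbed.
    Consecutive frames with identical text are merged into one cue whose
    play time spans from the first to the last such frame."""
    cues = []
    for ts, text in frames:
        if cues and cues[-1][0] == text:
            cues[-1][2] = ts                     # extend the current cue
        elif text:                               # skip frames with no subtitle
            cues.append([text, ts, ts])
    return [tuple(c) for c in cues]

frames = [(0, "trouble you fishball noodles"),
          (1000, "trouble you fishball noodles"),
          (2000, "trouble you fishball noodles"),
          (3000, ""),
          (4000, "next line")]
print(cue_spans(frames))
# [('trouble you fishball noodles', 0, 2000), ('next line', 4000, 4000)]
```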
The video dubbing method of the embodiments of the present invention thus provides subtitle processing capability for both soft and hard subtitles: whether the subtitle file in the video is a soft subtitle or a hard subtitle, personalized dubbing can be realized and the user experience improved.
Referring to Figure 12 and Figure 13, in some embodiments the video dubbing method further includes, after step S18 of playing the user speech segment in the personalized speech file corresponding to the current subtitle word:
S19: determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape.
Wherein, step S19 further comprises:
S191: selecting, according to the play time point, the currently played image associated with the user speech segment from the multiple frames of played images;
S192: recognizing the mouth shape of the character in the currently played image;
S193: computing the actual aspect ratio of the mouth shape from its width and height;
S194: computing a volume amplification factor from the actual aspect ratio and a default aspect ratio; and
S195: determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the currently played image.
Referring to Fig. 3, Figure 14 and Figure 15, in some embodiments the video dubbing apparatus 10 further includes a volume determination module 19. The volume determination module 19 includes a selection unit 191, a second recognition unit 192, a first computing unit 193, a second computing unit 194, and a volume determination unit 195. Step S19 may be implemented by the volume determination module 19, step S191 by the selection unit 191, step S192 by the second recognition unit 192, step S193 by the first computing unit 193, step S194 by the second computing unit 194, and step S195 by the volume determination unit 195.
In other words, the volume determination module 19 may be used to determine, according to the mouth shape of a character in the video file, the playback volume of the corresponding user speech segment. The selection unit 191 may be used to select, according to the play time point, the currently played image associated with the user speech segment from the multiple frames of played images. The second recognition unit 192 may be used to recognize the mouth shape of the character in the currently played image. The first computing unit 193 may be used to compute the actual aspect ratio of the mouth shape from its width and height. The second computing unit 194 may be used to compute the volume amplification factor from the actual aspect ratio and the default aspect ratio. The volume determination unit 195 may be used to determine, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the currently played image.
The volume determination module 19 may be stored in the memory 30 as a program that implements the function of step S19; the processor 20 executes the program to complete step S19. The selection unit 191, second recognition unit 192, first computing unit 193, second computing unit 194, and volume determination unit 195 may be stored in the memory 30 as programs that implement the functions of steps S191, S192, S193, S194, and S195 respectively; the processor 20 executes these programs to complete steps S191 through S195.
Specifically, dubbing usually needs to follow the emotional rise and fall of a role, which in dubbing is typically embodied as changes in audio volume. Therefore, when playing the personalized speech files dubbed by the user, the playback volume of the user speech segment in each personalized speech file should also vary to match the role's emotional rise and fall. To this end, the playback volume needed for each user speech segment can be determined by recognizing the aspect ratio of the character's mouth shape in the currently played images of the video file associated with that segment. Specifically, each user speech segment has a play time point, and each played image also corresponds to a play time point. For the play time point of each user speech segment, the processor 20 first finds, among the multiple frames of played images, the frame or frames corresponding to that play time point; this frame or these frames are the currently played images associated with the user speech segment at that play time point. The processor 20 then recognizes the character in each currently played image using a face recognition algorithm, and further recognizes the character's mouth shape. Next, the processor 20 computes the actual aspect ratio of the mouth shape (i.e., the ratio of its width to its height) from the recognized width and height. If the played images span multiple frames, the processor 20 recognizes multiple mouth shapes; it may then take the median, average, or maximum of their widths as the final width and correspondingly take the median, average, or maximum of their heights as the final height (that is, when the width takes the median, the height also takes the median; when the width takes the average, the height also takes the average; when the width takes the maximum, the height also takes the maximum). Alternatively, the processor 20 may first compute the actual aspect ratio of the mouth shape in each frame, and then take the median, average, or maximum of the resulting ratios as the final actual aspect ratio. The processor 20 then computes the ratio of the actual aspect ratio (actual width over actual height) to the default aspect ratio; this ratio is the volume amplification factor. In this way, the processor 20 obtains multiple volume amplification factors, each computed from the mouth shape of the character in one or more associated currently played frames, which are in turn associated with a user speech segment. The processor 20 computes the playback volume of the user speech segment associated with those frames from the corresponding volume amplification factor: playback volume = default playback volume × volume amplification factor.
In addition, a currently played image may contain multiple characters, in which case the processor 20 obtains the actual aspect ratios of multiple mouth shapes after processing one frame. The processor 20 first rejects the actual aspect ratios of the mouth shapes of characters in the currently played image who are not speaking. Specifically, for example, a detection threshold may be set: when the actual aspect ratio of a mouth shape is below the detection threshold, the corresponding character is deemed not to be opening the mouth to speak, and that character's mouth-shape aspect ratio is rejected. After rejecting the aspect ratios of the non-speaking characters, if only one actual aspect ratio remains, the volume amplification factor is computed directly from it; if multiple actual aspect ratios remain, the volume amplification factor is computed from their median, average, or maximum. This improves the accuracy of the computed volume amplification factor.
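The volume rule of steps S193–S195, together with the rejection of non-speaking mouths, can be sketched as follows. All numeric values (default aspect ratio, default volume, detection threshold) are illustrative assumptions, and the mouth-shape detection itself is represented by its output, a list of (width, height) boxes.

```python
import statistics

def volume_for_segment(mouth_boxes, default_ratio, default_volume,
                       detect_threshold, reduce="max"):
    """Compute a segment's playback volume from mouth-shape aspect ratios.

    `mouth_boxes` is a list of (width, height) mouth shapes recognized in
    the currently played images (possibly several characters or frames).
    Mouths whose width/height ratio falls below `detect_threshold` are
    rejected as not speaking, as in the text; the remaining ratios are
    reduced by median, mean, or max, then divided by the default ratio
    to give the volume amplification factor."""
    ratios = [w / h for w, h in mouth_boxes if w / h >= detect_threshold]
    if not ratios:
        return default_volume          # nobody is speaking: keep default
    reducer = {"max": max,
               "median": statistics.median,
               "mean": statistics.fmean}[reduce]
    amplification = reducer(ratios) / default_ratio
    return default_volume * amplification

# Two characters in frame; the (10, 30) mouth is rejected as not speaking.
vol = volume_for_segment([(40, 20), (10, 30)],
                         default_ratio=1.0, default_volume=0.5,
                         detect_threshold=0.5)
print(vol)  # 1.0  (ratio 2.0 -> amplification 2.0 -> 0.5 * 2.0)
```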
Referring to Figure 16, in some embodiments the video dubbing method further includes, before step S12:
S113: determining a target dubbing role according to the user's input.
Step S14, of processing the subtitle file associated with the video file to read from the audio repository the personalized speech file corresponding to the current subtitle word, further includes:
S144: processing the original audio to separate the voice information of the target dubbing role from the voice information of non-target dubbing roles.
Step S141, of extracting the current subtitle from the subtitle file, includes:
S1411: extracting, from the subtitle file, the target subtitle corresponding to the voice information of the target dubbing role; and
S1412: extracting the current subtitle from the target subtitle.
Referring to Fig. 3 and Figure 17, in some embodiments the video dubbing apparatus 10 further includes a role determination module 113, and the first processing module 14 further includes a division unit 144. Step S113 may be implemented by the role determination module 113, step S144 by the division unit 144, and steps S1411 and S1412 by the extraction unit 141.
In other words, the role determination module 113 may be used to determine the target dubbing role according to the user's input. The division unit 144 may be used to process the original audio to separate the voice information of the target dubbing role from the voice information of non-target dubbing roles. The extraction unit 141 may be used to extract from the subtitle file the target subtitle corresponding to the voice information of the target dubbing role, and to extract the current subtitle from the target subtitle.
The role determination module 113 and the division unit 144 may be stored in the memory 30 as programs that implement the functions of steps S113 and S144 respectively; the processor 20 executes these programs to complete steps S113 and S144. The extraction unit 141 may also be stored in the memory 30 as a program that implements the functions of steps S1411 and S1412; the processor 20 executes the program to complete steps S1411 and S1412.
It can be appreciated that in some cases the user only wishes to dub one or a few roles in the video. In such cases, the user first sets the target dubbing role; the processor 20 then identifies the voice information of the target dubbing role in the original audio by voiceprint recognition, and classifies the remaining voice information in the original audio as the voice information of non-target dubbing roles. Next, the processor 20 screens from the subtitle file the target subtitle corresponding to the voice information of the target dubbing role, and divides the target subtitle into multiple current subtitles according to the play time points. The processor 20 then splits each current subtitle to obtain multiple current subtitle words and the play time points corresponding to them, and searches the audio repository for the user speech segment matching each current subtitle word as the personalized speech file. When the video is played, the voice of the target dubbing role is the user speech segments, while the voices of the non-target dubbing roles remain the original audio. In this way, the user can selectively dub one or a few roles in the video, which adds to the fun of video dubbing and further improves the user experience.
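The screening of target subtitles can be sketched as an interval-overlap check. The voiceprint-recognition step is stubbed by its assumed output, a list of (start, end, speaker) voice segments; a subtitle cue is attributed to the target role when its time span overlaps one of the target's voice segments. This attribution rule is an assumption for illustration, not a rule stated in the text.

```python
def target_cues(cues, voice_segments, target):
    """Select the subtitle cues belonging to the target dubbing role.

    `cues` are (text, start_ms, end_ms); `voice_segments` are
    (start_ms, end_ms, speaker) intervals as produced by a voiceprint
    recognition / diarization step (stubbed here)."""
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start < b_end and b_start < a_end
    return [c for c in cues
            if any(overlaps(c[1], c[2], s, e)
                   for s, e, spk in voice_segments if spk == target)]

cues = [("trouble you fishball noodles", 0, 4400),
        ("coming right up", 5000, 7000)]
voices = [(0, 4400, "mcdull"), (5000, 7000, "waiter")]
print(target_cues(cues, voices, "mcdull"))
# [('trouble you fishball noodles', 0, 4400)]
```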
In the video dubbing method described in any of the above embodiments, the video dubbed by the user may be stored in the memory 30 automatically by the electronic device 100 or manually by the user. The stored video may contain both the original audio and the personalized audio composed of the multiple personalized speech files, or the user may choose to keep only one of the two audios. When the video contains both the original audio and the personalized audio, no dubbing action needs to be performed during subsequent playback; instead, the original audio or the personalized audio is played directly according to the user's selection. If the user selects the original audio, the personalized audio is muted; if the user selects the personalized audio, the original audio is muted; if the user makes no selection, the personalized audio is played by default.
Referring again to Fig. 3, the present invention also provides an electronic device 100. The electronic device 100 includes one or more processors 20, a memory 30, and one or more programs, wherein the one or more programs are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs include instructions for executing the video dubbing method described in any of the above embodiments.
For example, with reference to Fig. 1, the programs include instructions for executing the following steps:
S12: playing a video file;
S14: processing a subtitle file associated with the video file to read from an audio repository a personalized speech file corresponding to a current subtitle word, the audio repository including at least one personalized speech file, each personalized speech file including a library word and a user speech segment corresponding to the library word;
S16: muting the portion of the original audio, associated with the video file, that corresponds to the current subtitle word; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle word.
For another example, with reference to Fig. 5, the programs further include instructions for executing the following steps:
S1121: performing speech recognition on a voice to obtain text information;
S1122: splitting the text information to obtain multiple library words;
S1123: splitting the voice according to the multiple library words to obtain multiple user speech segments corresponding to the multiple library words; and
S1124: storing each user speech segment in the audio repository, the file name of each user speech segment being the library word corresponding to that user speech segment.
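Steps S1121–S1124 amount to bookkeeping that maps library words to stored clip files. In the sketch below, the speech-recognition and audio-cutting components are represented by the `recognized_text` input and a `save_clip` callback; both names are placeholders for real components, not APIs named in the text.

```python
def build_audio_repository(recognized_text, segment_words, save_clip):
    """Sketch of steps S1121-S1124: split the recognized text into library
    words and store one clip per word, named after the word."""
    repository = {}
    for word in segment_words(recognized_text):
        filename = f"{word}.mp3"         # file name is the library word
        save_clip(word, filename)        # cut & persist this word's audio
        repository[word] = filename
    return repository

saved = []
repo = build_audio_repository("trouble you fishball noodles",
                              segment_words=str.split,
                              save_clip=lambda w, f: saved.append(f))
print(repo["fishball"])  # fishball.mp3
```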
Referring to Figure 18, the present invention also provides a computer-readable storage medium 200. The computer-readable storage medium 200 includes a computer program used in combination with the electronic device 100. The computer program can be executed by the processor 20 to complete the video dubbing method described in any of the above embodiments.
For example, with reference to Fig. 1, the computer program can be executed by the processor 20 to complete the following steps:
S12: playing a video file;
S14: processing a subtitle file associated with the video file to read from an audio repository a personalized speech file corresponding to a current subtitle word, the audio repository including at least one personalized speech file, each personalized speech file including a library word and a user speech segment corresponding to the library word;
S16: muting the portion of the original audio, associated with the video file, that corresponds to the current subtitle word; and
S18: playing the user speech segment in the personalized speech file corresponding to the current subtitle word.
For another example, with reference to Fig. 5, the computer program can also be executed by the processor 20 to complete the following steps:
S1121: performing speech recognition on a voice to obtain text information;
S1122: splitting the text information to obtain multiple library words;
S1123: splitting the voice according to the multiple library words to obtain multiple user speech segments corresponding to the multiple library words; and
S1124: storing each user speech segment in the audio repository, the file name of each user speech segment being the library word corresponding to that user speech segment.
The video dubbing method described in any of the above embodiments dubs the video with the user speech segments while the user is playing the video. In some embodiments, the dubbing with the user speech segments may instead be completed silently in the background. In other words, the user inputs a video dubbing instruction on the electronic device 100, and the processor 20 performs the dubbing operation, but during this dubbing process the electronic device 100 does not play the video for the user to watch. After dubbing is completed, the processor 20 generates the video dubbed by the user (i.e., the personalized video described below) and controls the display screen 40 or the electro-acoustic element 50 to prompt the user that dubbing is complete. When the user then clicks to play the personalized video, the display screen 40 of the electronic device 100 plays the video file and the subtitle file, while the electro-acoustic element 50 of the electronic device 100 plays the audio dubbed by the user (i.e., the personalized audio described below).
To this end, referring to Fig. 3 and Figure 19, the present invention also provides a video dubbing method usable with the electronic device 100. The video dubbing method includes:
S23: reading a video and an audio repository, the video including a video file, a subtitle file, and original audio, the audio repository including library words and user speech segments corresponding to the library words;
S24: searching the audio repository for library words matching the subtitle file, and generating, from the user speech segments of the corresponding library words, a personalized audio and synchronization association information between the subtitle file and the personalized audio;
S25: associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form a personalized video; and
S26: playing the personalized audio when playing the personalized video.
Referring to Fig. 3 and Figure 20, the present invention provides a video dubbing apparatus 20. The video dubbing apparatus 20 is used in the electronic device 100, and the video dubbing method of the embodiments of the present invention may be implemented by the video dubbing apparatus 20. The video dubbing apparatus 20 includes a read module 23, a matching module 24, an association module 25, and a playing module 26. Step S23 may be implemented by the read module 23, step S24 by the matching module 24, step S25 by the association module 25, and step S26 by the playing module 26.
In other words, the read module 23 may be used to read the video and the audio repository, the video including a video file, a subtitle file, and original audio, and the audio repository including library words and user speech segments corresponding to the library words. The matching module 24 may be used to search the audio repository for library words matching the subtitle file, and to generate, from the user speech segments of the corresponding library words, the personalized audio and the synchronization association information between the subtitle file and the personalized audio. The association module 25 may be used to associate the video file, the subtitle file, and the personalized audio according to the synchronization association information to form the personalized video. The playing module 26 may be used to play the personalized audio when the personalized video is played.
The read module 23, matching module 24, and association module 25 may be stored in the memory 30 as programs that implement the functions of steps S23, S24, and S25 respectively; the processor executes these programs to complete steps S23 through S25. The playing module 26 may be the electro-acoustic element 50 of the electronic device 100, used to play the personalized audio.
In the video dubbing method of the embodiments of the present invention, the electronic device 100 collects the user's various voices in daily life and stores them in the memory 30, recognizes the text information corresponding to the voices, splits the text information to form multiple library words, and then splits the voices based on the library words to form multiple user speech segments, thereby establishing an audio repository in which library words correspond to user speech segments. When the user inputs a video dubbing instruction on the electronic device 100, the electronic device 100 recognizes the text of the subtitle file in the video, finds in the audio repository the user speech segments corresponding to that text, and forms a new personalized audio from the multiple user speech segments; the video file, the subtitle file, and the personalized audio can then form a personalized video, i.e., a video dubbed by the user.
The synchronization association information between the subtitle file and the personalized audio is the timestamp information carried in the personalized audio. Associating the video file, the subtitle file, and the personalized audio according to the synchronization association information to form the personalized video is an encapsulation process in which the video file, the subtitle file, and the personalized audio are packaged into the personalized video. During this encapsulation, the electronic device 100 may interleave the video file and the personalized audio according to the synchronization association information, i.e., the timestamp information of the personalized audio. The personalized video may be packaged into different formats, such as the TS format, MKV format, or MOV format; different formats have different file structures, and the format of the personalized video may be selected by the user.
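The timestamp-driven interleaving mentioned above can be sketched as a merge of two timestamp-sorted packet streams. Real containers such as TS, MKV, or MOV involve much more (headers, codec metadata, packetization), so this only illustrates the ordering step; the packet representation is an assumption.

```python
import heapq

def interleave(video_packets, audio_packets):
    """Merge video and audio packets into one timestamp-ordered stream.

    Each packet is (timestamp_ms, kind, payload); both inputs are assumed
    already sorted by timestamp, as decoder output normally is."""
    return list(heapq.merge(video_packets, audio_packets,
                            key=lambda p: p[0]))

video = [(0, "video", "frame0"), (40, "video", "frame1"), (80, "video", "frame2")]
audio = [(0, "audio", "trouble.mp3"), (50, "audio", "you.mp3")]
stream = interleave(video, audio)
print([p[0] for p in stream])  # [0, 0, 40, 50, 80]
```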
The video dubbing method of the embodiments of the present invention can dub a video based on the user speech segments to obtain a personalized video and play the personalized video dubbed by the user, strengthening the interaction between the electronic device 100 and the user during video playback and making video playback more engaging.
Referring to Figure 21 and Figure 22, in some embodiments the audio repository may be obtained by the following steps; that is, the video dubbing method of the embodiments of the present invention further includes, before step S23 of reading the video and the audio repository:
S21: collecting a voice recorded by the user using the electronic device 100; and
S22: performing speech recognition on the voice to obtain multiple library words and multiple user speech segments.
Wherein, step S22 further comprises:
S221: performing speech recognition on the voice to obtain text information;
S222: splitting the text information to obtain multiple library words;
S223: splitting the voice according to the multiple library words to obtain multiple user speech segments corresponding to the multiple library words; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being the library word corresponding to that user speech segment.
Referring to Figure 23 and Figure 24, in some embodiments the video dubbing apparatus 20 further includes a collection module 21 and a recognition module 22. Step S21 may be implemented by the collection module 21, and step S22 by the recognition module 22. The recognition module 22 includes a first recognition unit 221, a first splitting unit 222, a second splitting unit 223, and a storage unit 224. Step S221 may be implemented by the first recognition unit 221, step S222 by the first splitting unit 222, step S223 by the second splitting unit 223, and step S224 by the storage unit 224.
In other words, the collection module 21 may be used to collect the voice recorded by the user using the electronic device 100. The recognition module 22 may be used to perform speech recognition on the voice to obtain multiple library words and multiple user speech segments. The first recognition unit 221 may be used to perform speech recognition on the voice to obtain text information. The first splitting unit 222 may be used to split the text information to obtain multiple library words. The second splitting unit 223 may be used to split the voice according to the multiple library words to obtain multiple user speech segments corresponding to the multiple library words. The storage unit 224 may be used to store each user speech segment in the audio repository, the file name of each user speech segment being the library word corresponding to that user speech segment.
The collection module 21 may be an acoustic-electric element arranged on the electronic device 100, such as a microphone. The recognition module 22 may be stored in the memory 30 as a program that implements the function of step S22; the processor 20 executes the program to complete step S22. The first recognition unit 221, first splitting unit 222, second splitting unit 223, and storage unit 224 may be stored in the memory 30 as subprograms under the program corresponding to the recognition module 22; the processor 20 executes the subprograms to implement steps S221 through S224. The storage unit 224 may be the memory 30 in the electronic device 100.
The manner in which the electronic device 100 collects the user's various voices in daily life is the same as the collection manner in the aforementioned video dubbing method that dubs during video playback, and is not repeated here. Likewise, the manner in which the processor 20 recognizes the collected voice to obtain the library words and the user speech segments is the same as the recognition manner in that method, and is also not repeated here.
In this way, by collecting, recognizing, and splitting the user's voice many times, the electronic device 100 enriches the user speech segments in the audio repository; abundant user speech segments help improve the completeness of the video dubbing.
Referring to Figure 25, in some embodiments step S24, of searching the audio repository for library words matching the subtitle file and generating from the user speech segments of the corresponding library words the personalized audio and the synchronization association information between the subtitle file and the personalized audio, includes:
S241: extracting multiple subtitle fragments from the subtitle file;
S242: splitting each subtitle fragment to obtain multiple subtitle words and multiple play time points corresponding to the multiple subtitle words;
S243: searching the audio repository for multiple user speech segments matching the multiple subtitle words; and
S244: combining the multiple user speech segments corresponding to the multiple subtitle words, in the order of the play time points of the multiple subtitle words, to form the personalized audio.
Referring to Figure 26, in some embodiments the matching module 24 includes an extraction unit 241, a splitting unit 242, a matching unit 243, and a combination unit 244. Step S241 may be implemented by the extraction unit 241, step S242 by the splitting unit 242, step S243 by the matching unit 243, and step S244 by the combination unit 244. In other words, the extraction unit 241 may be used to extract the multiple subtitle fragments from the subtitle file. The splitting unit 242 may be used to split each subtitle fragment to obtain the multiple subtitle words and the play time points corresponding to them. The matching unit 243 may be used to search the audio repository for the multiple user speech segments matching the multiple subtitle words. The combination unit 244 may be used to combine the multiple user speech segments corresponding to the multiple subtitle words, in the order of their play time points, to form the personalized audio.
The extraction unit 241, splitting unit 242, matching unit 243, and combination unit 244 may be stored in the memory 30 as programs that implement the functions of steps S241, S242, S243, and S244 respectively; the processor 20 executes these programs to complete steps S241 through S244.
Specifically, to form the personalized audio from the user speech segments in the audio repository, the processor 20 first extracts the multiple subtitle fragments from the subtitle file together with their corresponding play time points. Taking the SRT subtitle format as an example, the line "trouble you, fish ball coarse noodles" from "McDull Story" corresponds to a subtitle file entry of the form: "00:00:00,000 --> 00:00:04,400 trouble you, fish ball coarse noodles", where "00:00:00,000 --> 00:00:04,400" is the play time point and "trouble you, fish ball coarse noodles" is the subtitle fragment corresponding to that play time point. In this way, the subtitle fragments and their corresponding play time points can be extracted from the subtitle file.
Next, the processor 20 splits each subtitle fragment based on a word segmentation technique to obtain multiple subtitle words and the play time points corresponding to the multiple subtitle words. Continuing with the "My Life as McDull" example, after extracting the subtitle fragment "Please give me fish ball noodles", the processor 20 splits it into the following subtitle words, each matched one-to-one with a play time point: "please - 00:00:00,000 --> 00:00:01,000", "give me - 00:00:01,000 --> 00:00:02,000", "fish ball - 00:00:02,000 --> 00:00:03,000", "noodles - 00:00:03,000 --> 00:00:04,400".
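The splitting of step S242 can be sketched as follows. This sketch assumes the word segmentation itself has already been done (in practice a segmentation library such as jieba would produce the word list for Chinese text) and, as a further assumption, divides the cue's time span evenly across the words; the patent's example uses slightly non-uniform spans.

```python
def split_fragment(words, start_ms, end_ms):
    """Assign each segmented word an equal slice of the cue's time span."""
    n = len(words)
    step = (end_ms - start_ms) / n
    return [(w, round(start_ms + i * step), round(start_ms + (i + 1) * step))
            for i, w in enumerate(words)]

# Cue "Please give me fish ball noodles" spans 0 ms to 4400 ms.
timed = split_fragment(["please", "give me", "fish ball", "noodles"], 0, 4400)
```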
Then, the processor 20 searches the audio repository for the user speech segment matching each subtitle word: for example, it finds the user speech segment "please.mp3" corresponding to the subtitle word "please", the user speech segment "give me.mp3" corresponding to "give me", the user speech segment "fish ball.mp3" corresponding to "fish ball", and the user speech segment "noodles.mp3" corresponding to "noodles". The processor 20 then combines the multiple user speech segments in the chronological order of their play time points to obtain the personalized audio for "Please give me fish ball noodles". When there are multiple subtitle fragments, the processor 20 combines the user speech segments of the multiple subtitle fragments in play-time order to form the complete personalized audio.
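The lookup and combination just described (steps S243 and S244) can be sketched as follows. The sketch models each clip as a filename string; a real implementation would decode and concatenate the audio samples (for example with a library such as pydub), and the segment structure is an assumption for illustration.

```python
def build_personalized_audio(timed_words, repository):
    """Look up each subtitle word in the audio repository and return the
    matching clips ordered by the words' play time points."""
    ordered = sorted(timed_words, key=lambda t: t[1])  # sort by start time
    return [repository[word] for word, _, _ in ordered if word in repository]

repo = {"please": "please.mp3", "give me": "give_me.mp3",
        "fish ball": "fish_ball.mp3", "noodles": "noodles.mp3"}
timeline = [("please", 0, 1000), ("give me", 1000, 2000),
            ("fish ball", 2000, 3000), ("noodles", 3000, 4400)]
playlist = build_personalized_audio(timeline, repo)
```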
In this way, the personalized audio dubbed by the user can be formed.
In the video dubbing method of the embodiments of the present invention, the subtitle file may likewise be hard subtitles or soft subtitles; no limitation is imposed here.
Please refer to Figure 27. In some embodiments, after step S26 of playing the personalized audio while playing the personalized video, the video dubbing method of the embodiments of the present invention further includes:
S27: muting the original audio when playing the personalized video.
Please refer to Figure 28. In some embodiments, the video dubbing apparatus 20 further includes an audio control module 27. Step S27 can be implemented by the audio control module 27. In other words, the audio control module 27 can be used to mute the original audio when the personalized video is played. The audio control module 27 can be stored in the memory 30 as a program that implements the function indicated by step S27, and the processor 20 executes the program to complete step S27.
Specifically, the processor 20 can package the video file, the subtitle file, and the personalized audio directly into the personalized video, or package the video file, the subtitle file, the personalized audio, and the original audio together into the personalized video. When the personalized video is played and contains both the personalized audio and the original audio, there are the following two playback modes: (1) when the user makes no selection, the electronic device 100 plays the personalized audio by default, in which case the electronic device 100 directly mutes the original audio and plays the personalized audio; (2) the personalized video is played with the audio type selected by the user: when the user selects the original audio, the electronic device 100 mutes the personalized audio and plays the original audio; when the user selects the personalized audio, the electronic device 100 mutes the original audio and plays the personalized audio.
In this way, multiple playback modes are provided for the user, improving the user's entertainment experience.
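The two playback modes above amount to a small selection rule: default to the personalized audio, otherwise honor the user's choice and mute the other track. A minimal sketch, with the function name and return shape as assumptions:

```python
def select_track(user_choice=None):
    """Return (track to play, track to mute) for the personalized video.

    With no user choice the personalized audio plays by default (mode 1);
    otherwise the user's selected track plays and the other is muted (mode 2).
    """
    active = user_choice or "personalized"
    muted = "original" if active == "personalized" else "personalized"
    return active, muted
```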
Referring to Figures 29 and 30 together, in some embodiments the video dubbing method of the embodiments of the present invention further includes:
S28: determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to the mouth shape.
Further, step S28 includes:
S281: selecting, from the multiple frames of play images according to the play time point of the user speech segment, the play image associated with the user speech segment;
S282: identifying the mouth shape of the character in the play image;
S283: calculating the actual aspect ratio of the mouth shape according to the width and height of the mouth shape;
S284: calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
S285: determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
Referring to Figures 31 and 32 together, in some embodiments the video dubbing apparatus 20 further includes a volume determining module 28. The volume determining module 28 includes an association unit 281, a second recognition unit 282, a first calculation unit 283, a second calculation unit 284, and a volume determination unit 285. Step S28 can be implemented by the volume determining module 28, step S281 by the association unit 281, step S282 by the second recognition unit 282, step S283 by the first calculation unit 283, step S284 by the second calculation unit 284, and step S285 by the volume determination unit 285.
In other words, the volume determining module 28 can be used to determine, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to that mouth shape. The association unit 281 can be used to select, from the multiple frames of play images according to the play time point of the user speech segment, the play image associated with the user speech segment. The second recognition unit 282 can be used to identify the mouth shape of the character in the play image. The first calculation unit 283 can be used to calculate the actual aspect ratio of the mouth shape according to its width and height. The second calculation unit 284 can be used to calculate the volume amplification factor according to the actual aspect ratio and the preset aspect ratio. The volume determination unit 285 can be used to determine, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
The volume determining module 28 can be stored in the memory 30 as a program that implements the function indicated by step S28, and the processor 20 executes the program to complete step S28. The association unit 281, the second recognition unit 282, the first calculation unit 283, the second calculation unit 284, and the volume determination unit 285 can be stored in the memory 30 as programs that implement the functions indicated by steps S281, S282, S283, S284, and S285 respectively; the processor 20 executes these programs to complete steps S281 to S285.
Specifically, the processor 20 first finds, based on the play time point of each user speech segment, the one or more frames of play images at that play time point; this frame or these frames are the play images associated with the user speech segment at that play time point. The processor 20 then identifies the mouth shape of the character in the play image, calculates the actual aspect ratio of the mouth shape from its width and height, calculates the volume amplification factor from the actual aspect ratio and the preset aspect ratio, and finally determines the playback volume of each user speech segment based on the volume amplification factor. This calculation of the playback volume is consistent with the calculation performed in the previously described video dubbing method in which the dubbing operation is executed during video playback, and is not repeated here.
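Steps S283 to S285 can be sketched as follows. This excerpt does not state the exact amplification formula, so mapping the preset aspect ratio over the actual aspect ratio to a gain (wider-open mouth, i.e. smaller width/height ratio, gives louder playback) is an assumption made for illustration only.

```python
def playback_volume(mouth_w, mouth_h, preset_ratio=2.0, base_volume=1.0):
    """Illustrative mouth-shape-to-volume mapping (formula is assumed).

    S283: actual aspect ratio from the mouth's width and height.
    S284: amplification factor from actual ratio and preset ratio.
    S285: playback volume from the amplification factor.
    """
    actual_ratio = mouth_w / mouth_h       # S283
    gain = preset_ratio / actual_ratio     # S284 (assumed formula)
    return base_volume * gain              # S285
```

With these assumptions, a half-open mouth (width 40, height 20, ratio 2.0 equal to the preset) keeps the base volume, while a fully open mouth (40 by 40) doubles it.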
Please refer to Figure 33. In some embodiments, before step S23 of reading the video and the audio repository, the video dubbing method further includes:
S29: determining a target dubbing role according to the user's input;
Step S24, searching the audio repository for the library vocabulary matching the subtitle file and generating, with the user speech segments corresponding to the library vocabulary, the personalized audio and the synchronization association information between the subtitle file and the personalized audio, further includes:
S245: processing the original audio to separate the voice information of the target dubbing role from the voice information of the non-target dubbing roles;
Step S241 of extracting the multiple subtitle fragments in the subtitle file includes:
S2411: extracting, from the subtitle file, the target subtitles corresponding to the voice information of the target dubbing role; and
S2412: extracting the subtitle fragments in the target subtitles.
Please refer to Figure 34. In some embodiments, the video dubbing apparatus further includes a role determination module 29, and the matching module 24 further includes a division unit 245. Step S29 can be implemented by the role determination module 29, step S245 by the division unit 245, and steps S2411 and S2412 by the extraction unit 241.
In other words, the role determination module 29 can be used to determine the target dubbing role according to the user's input. The division unit 245 can be used to process the original audio to separate the voice information of the target dubbing role from the voice information of the non-target dubbing roles. The extraction unit 241 can be used to extract, from the subtitle file, the target subtitles corresponding to the voice information of the target dubbing role, and to extract the subtitle fragments in the target subtitles.
Specifically, when the user only wants to dub one or several roles in the video, the user can first set the target dubbing role. The processor 20 identifies the voice information of the target dubbing role from the original audio by voiceprint recognition, and classifies the original audio with the target dubbing role's voice information removed as the voice information of the non-target dubbing roles. The processor 20 then screens the subtitle file for the target subtitles corresponding to the voice information of the target dubbing role. Next, the processor 20 finds in the audio repository the multiple user speech segments matching the multiple subtitle fragments in the target subtitles; the implementation process is consistent with that of the previously described video dubbing method in which the dubbing operation is executed during video playback, and is not repeated here. In this way, the user's dubbed audio for the target dubbing role is obtained. The processor 20 then merges the user's dubbed audio of the target dubbing role with the original audio of the non-target dubbing roles to obtain the personalized audio. When the personalized video is played, the voice of the target dubbing role is played as the user speech segments, while the voices of the non-target dubbing roles are played as the original audio. In this way, the user can selectively dub one or several roles in the video, which reinforces the fun of video dubbing and further improves the user experience.
Referring again to Fig. 3, the present invention also provides an electronic device 100. The electronic device 100 includes one or more processors 20, a memory 30, and one or more programs. The one or more programs are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs include instructions for executing the video dubbing method described in any one of the above embodiments.
For example, in conjunction with Figure 19, the programs include instructions for executing the following steps:
S23: reading a video and an audio repository, the video including a video file, a subtitle file, and an original audio, and the audio repository including library vocabulary and user speech segments corresponding to the library vocabulary;
S24: searching the audio repository for the library vocabulary matching the subtitle file, and generating, with the user speech segments corresponding to the library vocabulary, a personalized audio and synchronization association information between the subtitle file and the personalized audio;
S25: forming a personalized video by associating the video file, the subtitle file, and the personalized audio according to the synchronization association information; and
S26: playing the personalized audio when playing the personalized video.
For another example, in conjunction with Figure 22, the programs include instructions for executing the following steps:
S221: performing speech recognition on the voice to obtain text information;
S222: disassembling the text information to obtain multiple library vocabulary items;
S223: disassembling the voice according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to the multiple library vocabulary items; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary item corresponding to that user speech segment.
Referring again to Figure 18, the present invention also provides a computer readable storage medium. The computer readable storage medium includes a computer program used in combination with the electronic device 100. The computer program can be executed by the processor 20 to complete the video dubbing method described in any one of the above embodiments.
For example, in conjunction with Figure 19, the computer program can be executed by the processor 20 to complete the following steps:
S23: reading a video and an audio repository, the video including a video file, a subtitle file, and an original audio, and the audio repository including library vocabulary and user speech segments corresponding to the library vocabulary;
S24: searching the audio repository for the library vocabulary matching the subtitle file, and generating, with the user speech segments corresponding to the library vocabulary, a personalized audio and synchronization association information between the subtitle file and the personalized audio;
S25: forming a personalized video by associating the video file, the subtitle file, and the personalized audio according to the synchronization association information; and
S26: playing the personalized audio when playing the personalized video.
For another example, in conjunction with Figure 22, the computer program can be executed by the processor 20 to complete the following steps:
S221: performing speech recognition on the voice to obtain text information;
S222: disassembling the text information to obtain multiple library vocabulary items;
S223: disassembling the voice according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to the multiple library vocabulary items; and
S224: storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary item corresponding to that user speech segment.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in conjunction with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the features of different embodiments or examples described in this specification.
In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical feature. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, such as two or three, unless otherwise specifically defined.
Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be considered as limiting the invention; those skilled in the art can change, modify, replace, and vary the above embodiments within the scope of the invention.

Claims (15)

1. A video dubbing method, characterized in that the video dubbing method comprises:
playing a video file;
processing a subtitle file associated with the video file to read, from an audio repository, a personalized speech file corresponding to a current subtitle word, the audio repository comprising at least one personalized speech file, the personalized speech file comprising a library vocabulary item and a user speech segment corresponding to the library vocabulary item;
muting an original audio that is associated with the video file and corresponds to the current subtitle word; and
playing the user speech segment in the personalized speech file corresponding to the current subtitle word.
2. The video dubbing method according to claim 1, characterized in that the step of processing the subtitle file associated with the video file to read, from the audio repository, the personalized speech file corresponding to the current subtitle word comprises:
extracting a current subtitle in the subtitle file;
splitting the current subtitle to obtain multiple current subtitle words and play time points corresponding to the current subtitle words; and
searching the audio repository for the library vocabulary item matching the current subtitle word to obtain the user speech segment corresponding to the library vocabulary item;
the step of playing the user speech segment in the personalized speech file corresponding to the current subtitle word comprises:
playing the user speech segment at the play time point.
3. The video dubbing method according to claim 1, characterized in that the video dubbing method is applied to an electronic device, and the audio repository is obtained by the following steps:
collecting a voice recorded by a user with the electronic device; and
performing speech recognition on the voice to obtain the library vocabulary item and the user speech segment.
4. The video dubbing method according to claim 3, characterized in that the step of performing speech recognition on the voice to obtain the library vocabulary item and the user speech segment comprises:
performing speech recognition on the voice to obtain text information;
disassembling the text information to obtain multiple library vocabulary items;
disassembling the voice according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to the multiple library vocabulary items; and
storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary item corresponding to that user speech segment.
5. The video dubbing method according to claim 2, characterized in that after the step of playing the user speech segment in the personalized speech file corresponding to the current subtitle word, the video dubbing method further comprises:
determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to the mouth shape.
6. The video dubbing method according to claim 5, characterized in that the video file comprises multiple frames of play images, and the step of determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to the mouth shape comprises:
selecting, from the multiple frames of play images according to the play time point, a currently played image associated with the user speech segment;
identifying the mouth shape of the character in the currently played image;
calculating an actual aspect ratio of the mouth shape according to the width and height of the mouth shape;
calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the currently played image.
7. A video dubbing method, characterized in that the video dubbing method comprises:
reading a video and an audio repository, the video comprising a video file, a subtitle file, and an original audio, and the audio repository comprising library vocabulary items and user speech segments corresponding to the library vocabulary items;
searching the audio repository for the library vocabulary items matching the subtitle file, and generating, with the user speech segments corresponding to the library vocabulary items, a personalized audio and synchronization association information between the subtitle file and the personalized audio;
forming a personalized video by associating the video file, the subtitle file, and the personalized audio according to the synchronization association information; and
playing the personalized audio when playing the personalized video.
8. The video dubbing method according to claim 7, characterized in that the video dubbing method is applied to an electronic device, and the audio repository is obtained by the following steps:
collecting a voice recorded by a user with the electronic device; and
performing speech recognition on the voice to obtain the library vocabulary items and the user speech segments.
9. The video dubbing method according to claim 8, characterized in that the step of performing speech recognition on the voice to obtain the library vocabulary items and the user speech segments comprises:
performing speech recognition on the voice to obtain text information;
disassembling the text information to obtain multiple library vocabulary items;
disassembling the voice according to the multiple library vocabulary items to obtain multiple user speech segments corresponding to the multiple library vocabulary items; and
storing each user speech segment in the audio repository, the file name of each user speech segment being the library vocabulary item corresponding to that user speech segment.
10. The video dubbing method according to claim 8, characterized in that the step of searching the audio repository for the library vocabulary items matching the subtitle file and generating, with the user speech segments corresponding to the library vocabulary items, the personalized audio and the synchronization association information between the subtitle file and the personalized audio comprises:
extracting multiple subtitle fragments in the subtitle file;
splitting each subtitle fragment to obtain multiple subtitle words and multiple play time points corresponding to the multiple subtitle words;
searching the audio repository for multiple user speech segments matching the multiple subtitle words; and
combining, in the order of the play time points of the multiple subtitle words, the multiple user speech segments corresponding to the multiple subtitle words to form the personalized audio.
11. The video dubbing method according to claim 7, characterized in that during playback of the personalized video, the video dubbing method further comprises:
muting the original audio when playing the personalized video.
12. The video dubbing method according to claim 10, characterized in that the video dubbing method further comprises:
determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to the mouth shape.
13. The video dubbing method according to claim 12, characterized in that the video file comprises multiple frames of play images, and the step of determining, according to the mouth shape of a character in the video file, the playback volume of the user speech segment corresponding to the mouth shape comprises:
selecting, from the multiple frames of play images according to the play time point of the user speech segment, the play image associated with the user speech segment;
identifying the mouth shape of the character in the play image;
calculating an actual aspect ratio of the mouth shape according to the width and height of the mouth shape;
calculating a volume amplification factor according to the actual aspect ratio and a preset aspect ratio; and
determining, according to the volume amplification factor, the playback volume of the user speech segment corresponding to the mouth shape of the character in the play image.
14. An electronic device, characterized by comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for executing the video dubbing method according to any one of claims 1-13.
15. A computer readable storage medium, characterized by comprising a computer program used in combination with an electronic device, the computer program being executable by a processor to complete the video dubbing method according to any one of claims 1-13.
CN201811122718.2A 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium Active CN110149548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811122718.2A CN110149548B (en) 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium

Publications (2)

Publication Number Publication Date
CN110149548A true CN110149548A (en) 2019-08-20
CN110149548B CN110149548B (en) 2022-06-21

Family

ID=67589301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811122718.2A Active CN110149548B (en) 2018-09-26 2018-09-26 Video dubbing method, electronic device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110149548B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN110691204A (en) * 2019-09-09 2020-01-14 苏州臻迪智能科技有限公司 A kind of audio and video processing method, device, electronic equipment and storage medium
CN110769167A (en) * 2019-10-30 2020-02-07 合肥名阳信息技术有限公司 Method for video dubbing based on text-to-speech technology
CN111601174A (en) * 2020-04-26 2020-08-28 维沃移动通信有限公司 Method and device for adding subtitles
CN112261435A (en) * 2020-11-06 2021-01-22 腾讯科技(深圳)有限公司 Social interaction method, device, system, equipment and storage medium
CN112837401A (en) * 2021-01-27 2021-05-25 网易(杭州)网络有限公司 Information processing method and device, computer equipment and storage medium
CN113420627A (en) * 2021-06-15 2021-09-21 读书郎教育科技有限公司 System and method capable of generating English dubbing materials
CN113825005A (en) * 2021-09-30 2021-12-21 北京跳悦智能科技有限公司 Face video and audio synchronization method and system based on joint training
CN114765703A (en) * 2021-01-13 2022-07-19 北京中关村科金技术有限公司 Method and device for dyeing subtitles corresponding to TTS (text to speech) and storage medium
CN115171645A (en) * 2022-06-30 2022-10-11 北京有竹居网络技术有限公司 Dubbing method and device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06162166A (en) * 1992-10-20 1994-06-10 Sony Corp Image creation device
JP2003047030A (en) * 2001-07-31 2003-02-14 Shibasoku:Kk Lip sync signal generation apparatus
US6766299B1 (en) * 1999-12-20 2004-07-20 Thrillionaire Productions, Inc. Speech-controlled animation system
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
JP2011077883A (en) * 2009-09-30 2011-04-14 Fujifilm Corp Image file producing method, program for the method, recording medium of the program, and image file producing apparatus
CN102054287A (en) * 2009-11-09 2011-05-11 腾讯科技(深圳)有限公司 Facial animation video generating method and device
CN104732593A (en) * 2015-03-27 2015-06-24 厦门幻世网络科技有限公司 Three-dimensional animation editing method based on mobile terminal
CN104967789A (en) * 2015-06-16 2015-10-07 福建省泉州市气象局 Automatic processing method and system for city window weather dubbing
CN106060424A (en) * 2016-06-14 2016-10-26 徐文波 Video dubbing method and device
CN107396177A (en) * 2017-08-28 2017-11-24 北京小米移动软件有限公司 Video broadcasting method, device and storage medium
CN107451564A (en) * 2017-07-31 2017-12-08 上海爱优威软件开发有限公司 A kind of human face action control method and system
US20180253881A1 (en) * 2017-03-03 2018-09-06 The Governing Council Of The University Of Toronto System and method for animated lip synchronization

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 Audio playing method and system
CN110691204A (en) * 2019-09-09 2020-01-14 苏州臻迪智能科技有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN110691204B (en) * 2019-09-09 2021-04-02 苏州臻迪智能科技有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN110769167A (en) * 2019-10-30 2020-02-07 合肥名阳信息技术有限公司 Method for video dubbing based on text-to-speech technology
CN111601174A (en) * 2020-04-26 2020-08-28 维沃移动通信有限公司 Method and device for adding subtitles
CN112261435A (en) * 2020-11-06 2021-01-22 腾讯科技(深圳)有限公司 Social interaction method, device, system, equipment and storage medium
CN114765703A (en) * 2021-01-13 2022-07-19 北京中关村科金技术有限公司 Method and device for coloring subtitles corresponding to TTS (text-to-speech) speech, and storage medium
CN114765703B (en) * 2021-01-13 2023-07-07 北京中关村科金技术有限公司 Method and device for coloring subtitles corresponding to TTS speech, and storage medium
CN112837401A (en) * 2021-01-27 2021-05-25 网易(杭州)网络有限公司 Information processing method and device, computer equipment and storage medium
CN112837401B (en) * 2021-01-27 2024-04-09 网易(杭州)网络有限公司 Information processing method, device, computer equipment and storage medium
CN113420627A (en) * 2021-06-15 2021-09-21 读书郎教育科技有限公司 System and method capable of generating English dubbing materials
CN113825005A (en) * 2021-09-30 2021-12-21 北京跳悦智能科技有限公司 Face video and audio synchronization method and system based on joint training
CN113825005B (en) * 2021-09-30 2024-05-24 北京跳悦智能科技有限公司 Face video and audio synchronization method and system based on joint training
CN115171645A (en) * 2022-06-30 2022-10-11 北京有竹居网络技术有限公司 Dubbing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110149548B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN110149548A (en) Video dubbing method, electronic device and readable storage medium
CN110634483B (en) Human-computer interaction method, device, electronic device and storage medium
CN107193841B (en) Method and device for accelerating playing, transmitting and storing of media file
CN104133851B (en) Audio similarity detection method and device, and electronic equipment
CN108231059A (en) Processing method and apparatus, and apparatus for processing
CN107705783A (en) Speech synthesis method and device
US20100298959A1 (en) Speech reproducing method, speech reproducing device, and computer program
US12019676B2 (en) Method and system for presenting a multimedia stream
JP2011217197A (en) Electronic apparatus, reproduction control system, reproduction control method, and program thereof
JP2011239141A (en) Information processing method, information processor, scenery metadata extraction device, lack complementary information generating device and program
CN114911448B (en) Data processing method, device, equipment and medium
WO2014100893A1 (en) System and method for the automated customization of audio and video media
CN113821188A (en) Method, device, electronic device and storage medium for adjusting audio playback speed
US10089898B2 (en) Information processing device, control method therefor, and computer program
CN110442867A (en) Image processing method, device, terminal and computer storage medium
CN110324702B (en) Information pushing method and device in video playing process
CN110992984B (en) Audio processing method and device and storage medium
KR102797767B1 (en) Playback control of scene descriptions
JP2003037826A (en) Substitute image display and TV phone apparatus
CN113538628A (en) Expression package generation method and device, electronic equipment and computer readable storage medium
CN114339391A (en) Video data processing method, video data processing device, computer equipment and storage medium
KR101920653B1 (en) Method and program for educating language by making comparison sound
CN112562687A (en) Audio and video processing method and device, recording pen and storage medium
CN100538823C (en) Language auxiliary expression system and method
CN112492400A (en) Interaction method, device, equipment, communication method and shooting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant