CN104869326B - Image display method and device coordinated with audio

Info

Publication number: CN104869326B
Application number: CN201510279742.7A
Authority: CN
Inventors: 周有凯; 蒋心怡; 徐晓然
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2015-05-27
Filing date: 2015-05-27
Publication date: 2018-09-11
Anticipated expiration: 2035-05-27
Also published as: CN104869326A

Abstract

Embodiments of the present invention provide an image display method coordinated with audio. The method includes: running a dialogue scene; dynamically displaying the mouth shape of a scene character when the dialogue scene is running in a speech period; and statically displaying the mouth shape of the scene character when the dialogue scene is running in a silent period. The speech period and the silent period are obtained by dividing the dialogue scene according to the waveform information of the dialogue scene's audio: the waveform amplitude of the audio is greater than a first amplitude threshold in the speech period and less than a second amplitude threshold in the silent period, and the first amplitude threshold is not less than the second amplitude threshold. By dividing the scene into speech periods and silent periods according to the audio waveform information, the method enables the dialogue audio of the scene character to match the mouth-shape image, thereby providing the user with a more lifelike dialogue display. Embodiments of the present invention also provide an image display device coordinated with audio.

Description

Image display method and device coordinated with audio

Technical field

Embodiments of the present invention relate to the field of image data processing and, more specifically, to an image display method and device coordinated with audio.

Background

This section is intended to provide background or context for the embodiments of the present invention set forth in the claims. The description herein is not admitted to be prior art merely because it is included in this section.

Game applications, animated videos, and computer simulation applications often involve dialogue scenes in which the displayed image must be coordinated with audio. In these dialogue scenes, scene characters take turns speaking. For example, a game application usually includes story dialogue scenes in which the game characters take turns speaking. A dialogue scene therefore needs not only to play the sound of the scene character's dialogue but also to present a mouth shape that matches the dialogue audio; that is, the mouth shape of the scene character must change dynamically while the character is speaking.

To make the mouth shape change dynamically while a scene character is speaking, the prior art presets pictures of the scene character's different mouth shapes for the dialogue scene. When the application runs the dialogue scene, it dynamically switches between these pictures, so that the mouth shape of the scene character in the displayed image changes dynamically and matches the character's dialogue in the scene's audio.

Disclosure of the invention

It should be noted that a scene character in a dialogue scene generally does not speak continuously. In many cases, the character's speech contains pauses; that is, even within a dialogue scene, the character is speaking during some stages and silent during other stages or time gaps. During the stages or gaps in which the character is silent, its mouth shape needs to remain unchanged in order to produce a more lifelike dialogue display. In the prior art, however, the application switches between the pictures of the character's different mouth shapes for the dialogue scene as a whole, so the mouth shape in the displayed image changes dynamically throughout the scene. Even in the gaps in which the character is silent, the mouth shape keeps changing, so the character's dialogue audio and mouth-shape image cannot be matched.

Therefore, in the prior art, during those stages of a dialogue scene in which the scene character is silent, the mouth shape of the character in the displayed image still changes dynamically. As a result, the character's dialogue audio cannot be matched with the mouth-shape image during those stages, which is very annoying.

An improved image display method and device coordinated with audio is therefore highly desirable, so that an image with a dynamically changing mouth shape is displayed during the stages of a dialogue scene in which the scene character is speaking, and an image with an unchanged mouth shape is displayed during the stages in which the character is silent. The character's dialogue audio can then match the mouth-shape image at every stage of the dialogue scene.

In this context, embodiments of the present invention are intended to provide an image display method and device coordinated with audio.

In a first aspect of embodiments of the present invention, an image display method coordinated with audio is provided, including: running a dialogue scene; when the dialogue scene is running in a speech period, dynamically displaying the mouth shape of a scene character; and when the dialogue scene is running in a silent period, statically displaying the mouth shape of the scene character. The speech period and the silent period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein the waveform amplitude of the waveform information is greater than a first amplitude threshold in the speech period and less than a second amplitude threshold in the silent period, and the first amplitude threshold is not less than the second amplitude threshold.

In a second aspect of embodiments of the present invention, an image display device coordinated with audio is provided, including: a running module, configured to run a dialogue scene; a dynamic display module, configured to dynamically display the mouth shape of a scene character when the dialogue scene is running in a speech period; and a static display module, configured to statically display the mouth shape of the scene character when the dialogue scene is running in a silent period. The speech period and the silent period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein the waveform amplitude of the waveform information is greater than a first amplitude threshold in the speech period and less than a second amplitude threshold in the silent period, and the first amplitude threshold is not less than the second amplitude threshold.

According to the image display method and device coordinated with audio of embodiments of the present invention, the dialogue scene is divided according to its audio waveform information: periods in which the waveform amplitude is relatively large are taken as speech periods, and periods in which the waveform amplitude is relatively small are taken as silent periods. When the dialogue scene runs, the mouth shape of the scene character can be displayed dynamically in the speech periods and statically in the silent periods. In a dialogue scene, a large audio waveform amplitude indicates that the scene character is speaking, and a small amplitude indicates that the character is not speaking. Dynamically displaying the mouth shape in the speech periods and statically displaying it in the silent periods therefore ensures that an image with a dynamically changing mouth shape is shown only while the character is speaking, and an image with an unchanged mouth shape is shown while the character is silent. The character's dialogue audio thus matches the mouth-shape image at every stage of the dialogue scene, which yields a more lifelike dialogue display and a better user experience.

Description of the drawings

The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not limitation, in which:

Fig. 1 schematically shows a framework diagram of an exemplary application scenario of embodiments of the present invention;

Fig. 2 schematically shows a flowchart of one embodiment of the image display method coordinated with audio of the present invention;

Fig. 3 schematically shows a flowchart of another embodiment of the image display method coordinated with audio of the present invention;

Fig. 4 schematically shows a flowchart of yet another embodiment of the image display method coordinated with audio of the present invention;

Fig. 5 schematically shows a structural diagram of one embodiment of the image display device coordinated with audio of the present invention.

In the drawings, identical or corresponding reference numerals indicate identical or corresponding parts.

Detailed description of the embodiments

The principle and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are provided only to enable those skilled in the art to better understand and thereby implement the present invention, and not to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure is thorough and complete and fully conveys the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

According to embodiments of the present invention, an image display method and device coordinated with audio are proposed.

As used herein, the term "dialogue scene" denotes a story scene segment of an application that contains dialogue by scene characters. A "dialogue scene" may be implemented by one file or a group of files, and the application may run the "dialogue scene" by calling those files. The application containing the "dialogue scene" may be, for example, a game application or a computer simulation application; the present invention is not limited in this respect. In addition, any number of elements in the drawings is given by way of example and not limitation, and any naming is used only for distinction and carries no limiting meaning.

The principle and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.

Summary of the invention

The inventors have found that a scene character in a dialogue scene generally does not speak continuously. In many cases the character's speech contains pauses; that is, even within a dialogue scene, the character is speaking only during some stages and is silent during the others. In the prior art, however, the application switches between the pictures of the character's different mouth shapes for the entire dialogue scene as a unit, so the mouth shape in the displayed image changes dynamically throughout the scene. Consequently, even in the time gaps in which the character is silent, the mouth shape is still changing, and the character's dialogue audio cannot be matched with the mouth-shape image.

Based on the above findings of the inventors, the basic principle of the present invention is as follows. Because the magnitude of the audio waveform amplitude in a dialogue scene reflects whether the scene character is speaking, the dialogue scene can be divided according to its audio waveform information. A large waveform amplitude indicates that the character is speaking, so periods in which the waveform amplitude is relatively large can be taken as speech periods; when the dialogue scene runs, the mouth shape of the character is displayed dynamically in the speech periods, so that a dynamically changing mouth-shape image is shown while the character is speaking. A small waveform amplitude indicates that the character is not speaking, so periods in which the waveform amplitude is relatively small can be taken as silent periods; when the dialogue scene runs, the mouth shape of the character is displayed statically in the silent periods, so that an unchanged mouth-shape image is shown while the character is not speaking. The character's dialogue audio can therefore match the mouth-shape image at every stage of the dialogue scene, which yields a more lifelike dialogue display and a better user experience.

Having described the basic principle of the present invention, various non-limiting embodiments of the invention are introduced in detail below.

Overview of the application scenario

Referring first to Fig. 1, which is a framework diagram of an exemplary application scenario of embodiments of the present invention. A user can interact with the client 102 on a user device to run a dialogue scene, and the application that runs the dialogue scene may be provided to the client 102 by the application's server 101. Those skilled in the art will understand that the framework shown in Fig. 1 is only one example in which embodiments of the present invention may be implemented. The scope of application of embodiments of the present invention is not limited by any aspect of this framework.

It should be noted that the user device herein may be any user device, whether existing, under development, or developed in the future, on which the client 102 can interact with the server 101 through any form of wired and/or wireless connection (for example, Wi-Fi, LAN, cellular, or coaxial cable), including but not limited to existing, in-development, or future smartphones, non-smart mobile phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, and mainframe computers.

It should also be noted that the server 101 herein is only one example of a device, whether existing, under development, or developed in the future, on which an application system can be deployed. Embodiments of the present invention are not limited in this respect.

Based on the framework shown in Fig. 1, the client 102 can run a dialogue scene. When the dialogue scene is running in a speech period, the client 102 can dynamically display the mouth shape of the scene character. When the dialogue scene is running in a silent period, the client 102 can statically display the mouth shape of the scene character. The speech period and the silent period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein the waveform amplitude of the waveform information is greater than a first amplitude threshold in the speech period and less than a second amplitude threshold in the silent period, and the first amplitude threshold is not less than the second amplitude threshold.

It can be understood that, although the actions of embodiments of the present invention are described here and below as being performed by the client 102 in this application scenario, these actions may also be performed partly by the client 102 and partly by the server 101, or entirely by the server 101. The present invention is not limited with respect to the executing entity, as long as the actions disclosed in the embodiments are performed.

Exemplary method

With reference to the application scenario of Fig. 1, an image display method coordinated with audio according to exemplary embodiments of the present invention is described below with reference to Figs. 2 to 3. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principle of the present invention, and embodiments of the invention are not limited in this respect. Rather, embodiments of the present invention can be applied to any applicable scenario.

Referring to Fig. 2, a flowchart of one embodiment of the image display method coordinated with audio of the present invention is shown. In this embodiment, the method may include, for example, the following steps:

Step 201: run the dialogue scene.

Step 202: when the dialogue scene is running in a speech period, dynamically display the mouth shape of the scene character.

Dynamically displaying the mouth shape of the scene character may specifically mean dynamically switching between multiple pictures of the character's different mouth shapes, so that the mouth shape of the character in the displayed image is in a dynamically changing state.

Step 203: when the dialogue scene is running in a silent period, statically display the mouth shape of the scene character.

Statically displaying the mouth shape of the scene character may, for example, mean continuing to display the same picture of the character's mouth shape, so that the mouth shape of the character in the displayed image stays static and unchanged. Alternatively, it may mean switching between multiple pictures that show the same mouth shape of the character, so that the mouth shape in the displayed image likewise appears static.
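The difference between the two display modes in steps 202 and 203 can be illustrated with a minimal Python sketch. The function name `mouth_frames`, the mode strings, and the choice of the first picture as the held picture are illustrative assumptions and do not come from the patent text.

```python
from itertools import cycle, islice

def mouth_frames(mode, pictures):
    """Yield the mouth-shape picture to show on each display tick.

    mode == "dynamic": keep switching between the different mouth-shape pictures.
    mode == "static":  keep showing one and the same picture
                       (here, by assumption, the first one).
    """
    if mode == "dynamic":
        yield from cycle(pictures)
    else:
        while True:
            yield pictures[0]

# Take the first five ticks of each mode for inspection.
dynamic_ticks = list(islice(mouth_frames("dynamic", ["a", "b", "c"]), 5))
static_ticks = list(islice(mouth_frames("static", ["a", "b", "c"]), 3))
```

In a real application the generator would be advanced once per rendered frame, and the picture names would be replaced by image resources.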

It can be understood that a dialogue scene is made up of speech periods and silent periods, which are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene. Specifically, the waveform amplitude of the waveform information is greater than the first amplitude threshold in a speech period and less than the second amplitude threshold in a silent period. That is, for any moment of the dialogue scene, if the waveform amplitude of the audio at that moment is greater than the first amplitude threshold, the moment belongs to a speech period of the dialogue scene; if the waveform amplitude of the audio at that moment is less than the second amplitude threshold, the moment belongs to a silent period of the dialogue scene. The first amplitude threshold is not less than the second amplitude threshold; that is, when the amplitude thresholds are chosen, the first and second amplitude thresholds may be the same value, or the first amplitude threshold may be greater than the second amplitude threshold. For example, the first amplitude threshold and the second amplitude threshold may both be set to 0.2 decibels.
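The threshold rule above can be sketched in a few lines of Python. The function name `classify_moment` and the return labels are hypothetical. The text does not say how to treat an amplitude that is neither above the first threshold nor below the second, so this sketch simply returns `None` for that case.

```python
def classify_moment(amplitude, first_threshold=0.2, second_threshold=0.2):
    """Classify one moment of the dialogue audio by its waveform amplitude.

    Returns "speech" if amplitude > first_threshold,
            "silence" if amplitude < second_threshold,
            None otherwise (case left open by the text; assumption of this sketch).
    The default of 0.2 mirrors the example value given in the text.
    """
    if first_threshold < second_threshold:
        raise ValueError("first threshold must not be less than second threshold")
    if amplitude > first_threshold:
        return "speech"
    if amplitude < second_threshold:
        return "silence"
    return None
```

With two distinct thresholds, for example `classify_moment(0.3, 0.4, 0.2)`, the amplitude falls in the gap between them and the sketch returns `None`; a caller would then need its own policy, such as keeping the previous classification.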

It should be noted that, in this embodiment, the speech periods and silent periods of the dialogue scene may be divided in several different ways and, depending on how they are divided, may be identified in several different ways while the dialogue scene is running.

For example, in some implementations of this embodiment, the speech periods and silent periods may be identified and divided in real time while the dialogue scene is running. Specifically, while the dialogue scene is running, the current waveform information of the dialogue scene's audio can be obtained in real time, and whether the current moment belongs to a speech period or a silent period is determined from the waveform amplitude of the current waveform information. If the waveform amplitude of the current waveform information is greater than the first amplitude threshold, the current moment is determined to belong to a speech period and the mouth shape of the scene character can be displayed dynamically; if the waveform amplitude of the current waveform information is less than the second amplitude threshold, the current moment is determined to belong to a silent period and the mouth shape of the scene character can be displayed statically.
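A minimal Python sketch of this real-time variant follows. It assumes that the current amplitude arrives as a stream of numbers, one per display update, and it keeps the previous display mode whenever the amplitude lies between the two thresholds. Both the function name `realtime_display_modes` and that in-between policy are assumptions of this sketch, not statements from the patent.

```python
def realtime_display_modes(amplitudes, first_threshold, second_threshold,
                           initial="static"):
    """For each current amplitude, yield the display mode to use right now.

    amplitude > first_threshold  -> "dynamic" (speech period)
    amplitude < second_threshold -> "static"  (silent period)
    otherwise                    -> keep the previous mode (sketch assumption)
    """
    mode = initial
    for amplitude in amplitudes:
        if amplitude > first_threshold:
            mode = "dynamic"
        elif amplitude < second_threshold:
            mode = "static"
        yield mode

# Example stream: quiet, loud, in-between, quiet again.
modes = list(realtime_display_modes([0.1, 0.5, 0.3, 0.1], 0.4, 0.2))
```

Because it is a generator, the same function could be fed from a live audio meter instead of a list.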

As another example, in other implementations of this embodiment, the speech periods and silent periods may be obtained by dividing the dialogue scene in advance, before the dialogue scene runs, and the periods divided in advance may be recorded before the dialogue scene runs, so that the speech periods and silent periods can be identified from the record while the dialogue scene is running. Specifically, in such an implementation, the speech periods and silent periods of the dialogue scene may be recorded in advance in time period information configured for the dialogue scene. Step 202 may then specifically be: in response to determining, from the time period information while the dialogue scene is running, that the dialogue scene is currently running in a speech period, dynamically display the mouth shape of the scene character. Step 203 may specifically be: in response to determining, from the time period information while the dialogue scene is running, that the dialogue scene is currently running in a silent period, statically display the mouth shape of the scene character. More specifically, before the dialogue scene runs, the waveform information of the audio of the entire dialogue scene can be obtained in advance; the dialogue scene is divided into speech periods and silent periods according to how the waveform amplitude at each moment compares with the first and second amplitude thresholds; and the speech periods and silent periods are then recorded as the time period information of the dialogue scene. While the dialogue scene is running, the time period information can be consulted in real time to determine whether the current moment belongs to a speech period or a silent period: if the time period information indicates that the current moment belongs to a speech period, step 202 is entered; if it indicates that the current moment belongs to a silent period, step 203 is entered.
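The pre-division and run-time lookup described above might look like the following Python sketch. It assumes amplitudes sampled at a fixed step given in milliseconds and a single threshold (first threshold equal to second threshold), and it treats an amplitude exactly equal to the threshold as silence. The names `build_time_periods` and `period_at`, the tuple layout, and the equality rule are all illustrative assumptions.

```python
from bisect import bisect_right

def build_time_periods(amplitudes, step_ms, threshold):
    """Divide the whole scene, ahead of time, into (start_ms, end_ms, label) periods."""
    periods = []
    for i, amplitude in enumerate(amplitudes):
        label = "speech" if amplitude > threshold else "silence"
        start = i * step_ms
        if periods and periods[-1][2] == label:
            # Same label as the previous sample: extend the current period.
            periods[-1] = (periods[-1][0], start + step_ms, label)
        else:
            periods.append((start, start + step_ms, label))
    return periods

def period_at(periods, t_ms):
    """Look up, at run time, which kind of period the moment t_ms falls in."""
    starts = [p[0] for p in periods]
    index = max(bisect_right(starts, t_ms) - 1, 0)
    return periods[index][2]

periods = build_time_periods([0.1, 0.5, 0.6, 0.1], step_ms=100, threshold=0.2)
```

Here `periods` plays the role of the recorded time period information; while the scene runs, `period_at(periods, current_time_ms)` decides between step 202 and step 203 without touching the audio again.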

As yet another example, in still other implementations of this embodiment, the speech periods and silent periods may be obtained by dividing the dialogue scene in advance, and a video image file may be generated for the dialogue scene before it runs, according to the periods divided in advance. In the video image file, the mouth shape of the scene character is displayed dynamically in the speech periods and statically in the silent periods, so that the display of the character's mouth shape can be controlled by the video image file while the dialogue scene is running. Specifically, in such an implementation, both the dynamic display and the static display of the mouth shape of the scene character may be realized by playing, while the dialogue scene is running, the video image file configured in advance for the dialogue scene. In the images of the video image file that fall within a speech period, the mouth shape of the scene character changes dynamically; in the images that fall within a silent period, the mouth shape of the scene character remains static. More specifically, before the dialogue scene runs, the waveform information of the audio of the entire dialogue scene can be obtained in advance; the dialogue scene is divided into speech periods and silent periods according to how the waveform amplitude at each moment compares with the first and second amplitude thresholds; and a video image file is then generated for the dialogue scene that displays the character's mouth shape dynamically in the speech periods and statically in the silent periods. While the dialogue scene is running, it is only necessary to play this video image file for the mouth shape of the scene character to be displayed dynamically in the speech periods and statically in the silent periods, without having to identify in real time whether the current moment belongs to a speech period or a silent period.
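As a rough illustration of generating the mouth-shape track of such a video image file, the Python sketch below turns pre-divided periods into a flat list of per-frame pictures. Real video encoding is out of scope here. The function name `render_mouth_track`, the `(start_ms, end_ms, label)` tuple layout, and the use of a single rest picture for silence are assumptions of this sketch.

```python
def render_mouth_track(periods, frame_ms, speech_pictures, rest_picture):
    """Pre-render one mouth-shape picture per video frame.

    periods:         list of (start_ms, end_ms, label), label "speech" or "silence"
    frame_ms:        duration of one video frame in milliseconds
    speech_pictures: pictures cycled through during speech periods
    rest_picture:    the single picture held during silent periods
    """
    frames = []
    for start, end, label in periods:
        frame_count = (end - start) // frame_ms
        for k in range(frame_count):
            if label == "speech":
                frames.append(speech_pictures[k % len(speech_pictures)])
            else:
                frames.append(rest_picture)
    return frames

track = render_mouth_track(
    [(0, 100, "silence"), (100, 300, "speech")],
    frame_ms=50,
    speech_pictures=["open", "half"],
    rest_picture="closed",
)
```

Playing `track` frame by frame then needs no amplitude checks at run time, which is the point of this pre-generated variant.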

It can be understood that, in the above implementations of this embodiment, some steps are performed while the dialogue scene is running, and some steps are performed in advance, before the dialogue scene runs. The steps performed while the dialogue scene is running may be performed by the application that is installed on the terminal device and runs the dialogue scene; that is, the application performs these steps while running the dialogue scene. The steps performed in advance, before the dialogue scene runs, may in some implementations be performed when the application installed on the terminal device is updated; in that case, these steps may be performed by the installed application itself while it updates on the terminal device. In other implementations, the steps performed in advance may be performed during development of the application, before the application is installed on the terminal device; in that case, these steps may be performed by the equipment that the developers use to write the application.

It should be noted that, in a dialogue scene, on the one hand, a scene character may make brief pauses while it keeps speaking, such as pauses between sentences or between certain phrases or words. At these brief pauses the waveform amplitude of the audio waveform information may be less than the second amplitude threshold, which would produce several stretches of static mouth-shape display while the character keeps speaking, so that the character's dialogue audio and mouth-shape image briefly fail to match. On the other hand, there may be brief noises while the character keeps silent. At these brief noises the waveform amplitude of the audio waveform information may be greater than the first amplitude threshold, which would produce several stretches of dynamic mouth-shape display while the character keeps silent, so that the character's dialogue audio and mouth-shape image again briefly fail to match. To avoid both kinds of brief mismatch, some implementations of this embodiment may preset a minimum time interval such that no speech period or silent period of the dialogue scene is shorter than the preset minimum time interval. The minimum time interval may, for example, be set to 0.1 second.

For implementations with a preset minimum time interval, before the dialogue scene runs, the dialogue scene may, for example, first be divided into speech time periods and silent time periods according to how the waveform amplitude of the audio waveform information compares with the first amplitude threshold and the second amplitude threshold. Each speech time period and each silent time period is then analyzed. For a speech time period whose preceding and following periods are both silent time periods, if the speech time period is shorter than the minimum time interval, it is merged with the two adjacent silent time periods into one silent time period. For a silent time period whose preceding and following periods are both speech time periods, if the silent time period is shorter than the minimum time interval, it is merged with the two adjacent speech time periods into one speech time period. In this way, the speech time periods and silent time periods finally obtained are not shorter than the minimum time interval.
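The merging rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the `Segment` tuple layout, the label strings `"speech"` and `"silent"`, and the function name `merge_short_segments` are assumptions introduced here. The sketch only absorbs short periods that have a neighbour on both sides, as the text describes; how a short period at the very start or end of the scene is handled is not stated in the source and is left untouched.

```python
from typing import List, Tuple

# Hypothetical layout: (start_seconds, end_seconds, label), label is "speech" or "silent".
Segment = Tuple[float, float, str]


def merge_short_segments(segments: List[Segment], min_interval: float = 0.1) -> List[Segment]:
    """Absorb each interior period shorter than min_interval into its two neighbours."""
    result = list(segments)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(result) - 1):
            start, end, label = result[i]
            prev_start, _, prev_label = result[i - 1]
            _, next_end, next_label = result[i + 1]
            # A short period flanked by two periods of the opposite kind is merged
            # with both neighbours into one period of the neighbours' kind.
            if end - start < min_interval and prev_label == next_label and label != prev_label:
                result[i - 1:i + 2] = [(prev_start, next_end, prev_label)]
                changed = True
                break  # the list changed, so restart the scan
    return result
```

With the example value of 0.1 seconds, a 0.05-second pause between two speech periods disappears into one longer speech period, and a 0.04-second noise burst between two silent periods disappears into one longer silent period.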

It can be understood that, because neither the speech time periods nor the silent time periods are shorter than the minimum time interval, a short silent time period between speech time periods is absorbed into a speech time period. A short pause while the scene character keeps speaking therefore does not cause its mouth shape to be displayed statically, so the mouth shape image remains dynamically displayed throughout the speech, which avoids a short-term mismatch between the scene character's dialogue audio and its mouth shape image. Similarly, a short speech time period between silent time periods is absorbed into a silent time period. Short noise while the scene character stays silent therefore does not cause its mouth shape to be displayed dynamically, so the mouth shape image remains statically displayed throughout the silence, which likewise avoids a short-term mismatch.

It should be noted that, to present a more lifelike dialogue display effect to the user, and considering that the mouth shape differs for the different syllables uttered by a scene character, in some implementations of this embodiment the speech time periods may be further divided so that each speech time period corresponds to a different syllable uttered by the scene character. In this way, while the dialogue scene is running, the mouth shape of the scene character can be dynamically displayed in each speech time period using the specific mouth shape that matches the corresponding syllable. Specifically, taking the case in which the scene character utters two different syllables as an example, the aforementioned step 202 may, for example, include: when the dialogue scene runs in a first speech time period, dynamically displaying the mouth shape of the scene character using a first mouth shape; and when the dialogue scene runs in a second speech time period, dynamically displaying the mouth shape of the scene character using a second mouth shape. The first speech time period and the second speech time period are obtained by dividing the speech time period according to the speech syllables of the audio corresponding to the speech time period, where the speech syllable in the first speech time period is a first pronunciation syllable and the speech syllable in the second speech time period is a second pronunciation syllable. The first pronunciation syllable and the second pronunciation syllable are different syllables, and the first mouth shape and the second mouth shape differ in shape.

In a specific implementation, for implementations that present different mouth shapes for different syllables, a corresponding mouth shape image may, for example, be configured in advance for each pronunciation syllable. When the dialogue scene is divided into speech time periods and silent time periods, the pronunciation syllables of the scene character can be identified for each speech time period from the waveform information of its audio, and the speech time period can be further divided by pronunciation syllable, so that each resulting speech time period corresponds to a different pronunciation syllable. When the dialogue scene runs in a given speech time period, the mouth shape of the scene character can then be dynamically displayed using the mouth shape image of the pronunciation syllable corresponding to that period.

More specifically, one possible implementation may, for example, be as follows. While the dialogue scene is running, the current waveform information of the audio of the dialogue scene is obtained in real time, and whether the current moment belongs to a silent time period or to the speech time period of a particular pronunciation syllable is determined from the current waveform information. If the current moment belongs to a silent time period, the mouth shape of the scene character can be statically displayed. If the current moment belongs to the speech time period of a pronunciation syllable, the mouth shape image of that pronunciation syllable can be called to dynamically display the mouth shape of the scene character.

Another possible implementation may, for example, be as follows. Before the dialogue scene runs, the waveform information of the audio of the dialogue scene is obtained in advance, and the dialogue scene is divided, according to the waveform information at each moment, into silent time periods and different speech time periods that correspond to different pronunciation syllables. The speech time periods of the pronunciation syllables and the silent time periods of the dialogue scene are then recorded as the time period information of the dialogue scene. While the dialogue scene is running, the time period information is called to determine in real time whether the current moment belongs to a silent time period or to the speech time period of a particular pronunciation syllable. If the current moment belongs to a silent time period, the mouth shape of the scene character can be statically displayed. If the current moment belongs to the speech time period of a pronunciation syllable, the mouth shape image of that pronunciation syllable can be called to dynamically display the mouth shape of the scene character.

Yet another possible implementation may, for example, be as follows. Before the dialogue scene runs, the waveform information of the audio of the dialogue scene is obtained in advance, and the dialogue scene is divided, according to the waveform information at each moment, into silent time periods and different speech time periods that correspond to different pronunciation syllables. A video image file is then generated for the dialogue scene according to the silent time periods and the speech time periods of the pronunciation syllables, so that in the video image the mouth shape of the scene character is statically displayed in the silent time periods and is dynamically displayed, in the speech time period of each pronunciation syllable, using the mouth shape image corresponding to that syllable. When the dialogue scene runs, only this video image file needs to be played, and there is no need to identify in real time whether the current moment belongs to a silent time period or to the speech time period of a particular pronunciation syllable.
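The second implementation above (looking up pre-recorded time period information at run time) might be sketched as follows. The sketch is illustrative only: the record layout, the syllable names, the image file names, and the function `mouth_display_at` are all hypothetical and do not come from the patent. A binary search stands in for whatever lookup the application actually uses.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical record layout: (start_seconds, syllable), sorted by start time.
# A syllable of None marks the start of a silent time period.
TimePeriodInfo = List[Tuple[float, Optional[str]]]

# Hypothetical mouth shape images configured in advance, one per pronunciation syllable.
MOUTH_IMAGES = {"a": "mouth_a.png", "o": "mouth_o.png"}
DEFAULT_TALKING_IMAGE = "mouth_talking.png"  # assumed fallback for unconfigured syllables
CLOSED_MOUTH_IMAGE = "mouth_closed.png"      # assumed image for static display


def mouth_display_at(info: TimePeriodInfo, t: float) -> Tuple[str, str]:
    """Return (display mode, image name) for moment t of the dialogue scene."""
    starts = [start for start, _ in info]
    idx = bisect_right(starts, t) - 1  # last period that starts at or before t
    if idx < 0 or info[idx][1] is None:
        return ("static", CLOSED_MOUTH_IMAGE)
    syllable = info[idx][1]
    return ("dynamic", MOUTH_IMAGES.get(syllable, DEFAULT_TALKING_IMAGE))
```

A rendering loop would call `mouth_display_at` once per frame with the current playback time and draw the returned image; a moment before the first recorded period is treated as silent here, which is an assumption of the sketch.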

With the technical solution of this embodiment, a larger audio waveform amplitude in the dialogue scene indicates that the scene character is speaking, and a smaller audio waveform amplitude indicates that the scene character is not speaking. Therefore, by dynamically displaying the mouth shape of the scene character in speech time periods and statically displaying it in silent time periods, the dialogue scene shows an image in which the mouth shape changes dynamically only while the scene character is speaking, and shows an image in which the mouth shape remains unchanged while the scene character is silent. The dialogue audio of the scene character thus matches its mouth shape image at every stage of the dialogue scene, giving a more lifelike dialogue display effect.

To enable those skilled in the art to understand more clearly how embodiments of the present invention are implemented in specific application scenarios, embodiments of the present invention are described below using two application scenarios as specific examples.
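As a rough illustration of the amplitude-based division summarized above, the following Python sketch labels a sequence of amplitude samples. It is a sketch under stated assumptions, not the method as claimed: the function name, the fixed `sample_period`, and the label strings are hypothetical. In particular, the source does not say how a moment whose amplitude lies between the two thresholds is assigned; this sketch assumes it keeps the label of the preceding moment, and assumes the scene starts silent.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # hypothetical (start_seconds, end_seconds, label)


def split_by_amplitude(amplitudes: Sequence[float], sample_period: float,
                       first_threshold: float, second_threshold: float) -> List[Segment]:
    """Divide a scene into "speech" and "silent" periods from amplitude samples."""
    if first_threshold < second_threshold:
        raise ValueError("first threshold must not be less than second threshold")
    segments: List[Segment] = []
    label = "silent"   # assumption: the scene starts silent
    seg_start = 0
    for i, amp in enumerate(amplitudes):
        if amp > first_threshold:
            new_label = "speech"
        elif amp < second_threshold:
            new_label = "silent"
        else:
            new_label = label  # assumption: in-between samples keep the previous label
        if new_label != label:
            if i > seg_start:
                segments.append((seg_start * sample_period, i * sample_period, label))
            seg_start, label = i, new_label
    if len(amplitudes) > seg_start:
        segments.append((seg_start * sample_period, len(amplitudes) * sample_period, label))
    return segments
```

Using two thresholds with the first not less than the second gives a dead band between them, so an amplitude that hovers near a single cut-off does not flip the label back and forth on every sample.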

In application scenario example one, the speech time periods and silent time periods of the dialogue scene are divided in advance by a first device and recorded in time period information, and when the dialogue scene runs, a second device determines the display mode of the scene character's mouth shape according to the time period information. The first device may, for example, be a terminal device on which a technician writes the application program, a server device that provides the application program, or a terminal device on which a user has installed the application client. The second device may, for example, be the terminal device on which the user has installed the application client. Specifically, for the implementation of application scenario example one, refer to Fig. 3, which shows a flowchart of another embodiment of the audio-coordinated image display method according to the present invention. This embodiment may, for example, include the following steps:

Step 301: in response to a time period division instruction for a dialogue scene in the application program, the first device obtains the audio waveform information of the dialogue scene.

Specifically, after obtaining the audio of the dialogue scene, the first device may obtain the waveform information by parsing the audio.

Step 302: the first device divides the dialogue scene into speech time periods and silent time periods according to the audio waveform information.

Specifically, for any moment in the dialogue scene, if the waveform amplitude of the waveform information is greater than the first amplitude threshold, the moment may be assigned to a speech time period; if the waveform amplitude is less than the second amplitude threshold, the moment may be assigned to a silent time period. In addition, a speech time period whose waveform amplitude is greater than the first amplitude threshold may be further divided according to the pronunciation syllables corresponding to the waveform information, to obtain speech time periods that each correspond to a different pronunciation syllable. Furthermore, for a speech time period whose preceding and following periods are both silent time periods, if the speech time period is shorter than the minimum time interval, it may be merged with the two adjacent silent time periods into one silent time period. Likewise, for a silent time period whose preceding and following periods are both speech time periods, if the silent time period is shorter than the minimum time interval, it may be merged with the two adjacent speech time periods into one speech time period. In this way, the speech time periods and silent time periods finally obtained are not shorter than the minimum time interval.

Step 303: the first device records the speech time periods and silent time periods of the dialogue scene as the time period information of the dialogue scene, and saves the time period information in the application program in association with the dialogue scene.

Step 304: in response to a trigger instruction to run the dialogue scene, the second device calls the time period information of the dialogue scene.

Step 305: while the dialogue scene is running, the second device determines in real time, according to the time period information of the dialogue scene, whether the current moment belongs to a speech time period or a silent time period.

Step 306: in response to the current moment belonging to a speech time period, the second device dynamically displays the mouth shape of the scene character.

Specifically, if the time period information indicates which pronunciation syllable the speech time period belongs to, the second device may dynamically display the mouth shape of the scene character using the mouth shape image configured in advance for that pronunciation syllable.

Step 307: in response to the current moment belonging to a silent time period, the second device statically displays the mouth shape of the scene character.

With the technical solution of this embodiment, the dialogue scene shows an image in which the mouth shape of the scene character changes dynamically only while the scene character is speaking, and shows an image in which the mouth shape remains unchanged while the scene character is silent. The dialogue audio of the scene character thus matches its mouth shape image at every stage of the dialogue scene, giving a more lifelike dialogue display effect.
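The hand-off between the two devices in steps 303 to 307 can be sketched as follows. This is a minimal sketch under assumptions: the source does not specify a storage format, so JSON is used purely for illustration, and the scene identifier, function names, and label strings are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # hypothetical (start_seconds, end_seconds, label)


def save_time_period_info(table: Dict[str, List[Segment]]) -> str:
    """First device: serialize the scene-to-time-period-information correspondence."""
    return json.dumps({scene: [list(seg) for seg in segs] for scene, segs in table.items()})


def load_time_period_info(payload: str, scene_id: str) -> List[Segment]:
    """Second device: call the time period information saved for one dialogue scene."""
    return [(float(s), float(e), str(label)) for s, e, label in json.loads(payload)[scene_id]]


def display_mode(info: List[Segment], t: float) -> str:
    """Second device: decide the display mode for the current moment t."""
    for start, end, label in info:
        if start <= t < end:
            return "dynamic" if label == "speech" else "static"
    return "static"  # assumption: outside all recorded periods the mouth stays still
```

Because the division is done once in advance, the second device only performs a cheap lookup per frame and never needs to analyze the audio waveform itself.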

In application scenario example two, when dividing the dialogue scene into time periods in advance, the first device generates, according to the divided time periods, a video image file in which the display mode of the scene character's mouth shape is already configured for each time period. When the dialogue scene runs, the second device only needs to play the video image file and does not need to identify speech time periods and silent time periods again. The first device may, for example, be a terminal device on which a technician writes the application program, a server device that provides the application program, or a terminal device on which a user has installed the application client. The second device may, for example, be the terminal device on which the user has installed the application client. Specifically, for the implementation of application scenario example two, refer to Fig. 4, which shows a flowchart of yet another embodiment of the audio-coordinated image display method according to the present invention. This embodiment may, for example, include the following steps:

Step 401: in response to a time period division instruction for a dialogue scene in the application program, the first device obtains the audio waveform information of the dialogue scene.

Step 402: the first device divides the dialogue scene into speech time periods and silent time periods according to the audio waveform information.

Step 403: the first device generates a video image file of the dialogue scene according to the speech time periods and silent time periods of the dialogue scene, and saves the video image file in the application program in association with the dialogue scene.

In the video image file, the mouth shape of the scene character is statically displayed in the silent time periods and dynamically displayed in the speech time periods. Further, if the speech time periods are further divided into speech time periods that each correspond to a different pronunciation syllable, then in the video image file the mouth shape of the scene character is dynamically displayed, in the speech time period of each pronunciation syllable, using the mouth shape image configured for that pronunciation syllable.

Step 404: in response to a trigger instruction to run the dialogue scene, the second device calls the video image file of the dialogue scene and plays it.
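Step 403 can be pictured as pre-computing which mouth image each video frame shows. The sketch below builds only such a per-frame plan and does not encode an actual video file; the frame rate, the image names, and the function `build_frame_plan` are hypothetical, and cycling through a short list of talking frames is just one assumed way to realize "dynamic display".

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # hypothetical (start_seconds, end_seconds, label)


def build_frame_plan(segments: Sequence[Segment], fps: int,
                     talking_frames: Sequence[str], still_frame: str) -> List[str]:
    """Choose one mouth image per video frame from the pre-divided time periods."""
    if not segments:
        return []
    total_frames = int(round(segments[-1][1] * fps))
    plan: List[str] = []
    for n in range(total_frames):
        t = n / fps
        label = next((lab for start, end, lab in segments if start <= t < end), "silent")
        if label == "speech":
            # Dynamic display: cycle through the talking frames.
            plan.append(talking_frames[n % len(talking_frames)])
        else:
            # Static display: repeat the same still frame.
            plan.append(still_frame)
    return plan
```

The second device in step 404 then simply plays the frames in order, with no need to consult the time periods at run time.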

Exemplary device

Having described the method of the exemplary embodiments of the present invention, an audio-coordinated image display device of the exemplary embodiments of the present invention is described next with reference to Fig. 5.

Referring to Fig. 5, a structural diagram of an embodiment of the audio-coordinated image display device according to the present invention is shown. In this embodiment, the device may, for example, include:

a running module 501, configured to run a dialogue scene;

a dynamic display module 502, configured to dynamically display the mouth shape of a scene character when the dialogue scene runs in a speech time period;

a static display module 503, configured to statically display the mouth shape of the scene character when the dialogue scene runs in a silent time period;

wherein the speech time period and the silent time period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, the waveform amplitude of the waveform information is greater than the first amplitude threshold in the speech time period and less than the second amplitude threshold in the silent time period, and the first amplitude threshold is not less than the second amplitude threshold.

Optionally, in some implementations of this embodiment, the speech time periods and the silent time periods of the dialogue scene may, for example, be recorded in time period information configured in advance for the dialogue scene;

the dynamic display module 502 is specifically configured to dynamically display the mouth shape of the scene character in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a speech time period;

the static display module 503 is specifically configured to statically display the mouth shape of the scene character in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a silent time period.

Optionally, in other implementations of this embodiment, the dynamic display of the mouth shape of the scene character and the static display of the mouth shape of the scene character may, for example, both be implemented by playing, while the dialogue scene is running, a video image file configured in advance for the dialogue scene. In the images of the video image file within a speech time period, the mouth shape of the scene character changes dynamically; in the images of the video image file within a silent time period, the mouth shape of the scene character remains static.

Optionally, in still other implementations of this embodiment, the dynamic display module 502 may, for example, include:

a first dynamic display sub-module, configured to dynamically display the mouth shape of the scene character using a first mouth shape when the dialogue scene runs in a first speech time period;

a second dynamic display sub-module, configured to dynamically display the mouth shape of the scene character using a second mouth shape when the dialogue scene runs in a second speech time period;

wherein the first speech time period and the second speech time period are obtained by dividing the speech time period according to the speech syllables of the audio in the speech time period, the speech syllable in the first speech time period is a first pronunciation syllable, and the speech syllable in the second speech time period is a second pronunciation syllable;

wherein the first mouth shape and the second mouth shape differ in shape.

Optionally, in yet other implementations of this embodiment, the speech time periods and the silent time periods of the dialogue scene may, for example, be not shorter than a preset minimum time interval.

It should be noted that, although several modules or sub-modules of the audio-coordinated image display device are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided and embodied by multiple modules.

In addition, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the specific embodiments disclosed, and that the division into aspects does not mean that features in these aspects cannot be combined to advantage; this division is made only for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. An audio-coordinated image display method, comprising:

in response to a trigger instruction to run a dialogue scene, running the dialogue scene;

calling time period information of the dialogue scene according to a correspondence between the dialogue scene and the time period information;

determining in real time, according to the time period information of the dialogue scene, whether the current moment belongs to a speech time period or a silent time period, wherein the speech time period and the silent time period are recorded in the time period information configured in advance for the dialogue scene;

when the dialogue scene runs in a speech time period, dynamically displaying the mouth shape of a scene character, which specifically comprises: in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a speech time period, dynamically displaying the mouth shape of the scene character;

wherein dynamically displaying the mouth shape of the scene character when the dialogue scene runs in a speech time period comprises:

when the dialogue scene runs in a first speech time period, dynamically displaying the mouth shape of the scene character using a first mouth shape;

when the dialogue scene runs in a second speech time period, dynamically displaying the mouth shape of the scene character using a second mouth shape;

wherein the first speech time period and the second speech time period are obtained by dividing the speech time period according to the speech syllables of the audio corresponding to the speech time period, the speech syllable in the first speech time period being a first pronunciation syllable and the speech syllable in the second speech time period being a second pronunciation syllable;

wherein the first mouth shape and the second mouth shape differ in shape;

when the dialogue scene runs in a silent time period, statically displaying the mouth shape of the scene character, which specifically comprises: in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a silent time period, statically displaying the mouth shape of the scene character;

wherein the speech time period and the silent time period are obtained by dividing the dialogue scene according to waveform information of the audio corresponding to the dialogue scene, the waveform amplitude of the waveform information being greater than a first amplitude threshold in the speech time period and less than a second amplitude threshold in the silent time period, and the first amplitude threshold being not less than the second amplitude threshold.

2. The method according to claim 1, wherein the speech time periods and the silent time periods of the dialogue scene are not shorter than a preset minimum time interval.

3. An audio-coordinated image display device, comprising:

a running module, configured to run a dialogue scene;

a calling module, configured to call time period information of the dialogue scene;

a determining module, configured to determine in real time, according to the time period information of the dialogue scene, whether the current moment belongs to a speech time period or a silent time period, wherein the speech time period and the silent time period are recorded in the time period information configured in advance for the dialogue scene;

a dynamic display module, configured to dynamically display the mouth shape of a scene character when the dialogue scene runs in a speech time period, and specifically configured to dynamically display the mouth shape of the scene character in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a speech time period;

wherein the dynamic display module comprises:

a first dynamic display sub-module, configured to dynamically display the mouth shape of the scene character using a first mouth shape when the dialogue scene runs in a first speech time period;

a second dynamic display sub-module, configured to dynamically display the mouth shape of the scene character using a second mouth shape when the dialogue scene runs in a second speech time period;

wherein the first speech time period and the second speech time period are obtained by dividing the speech time period according to the speech syllables of the audio in the speech time period, the speech syllable in the first speech time period being a first pronunciation syllable and the speech syllable in the second speech time period being a second pronunciation syllable;

a static display module, configured to statically display the mouth shape of the scene character when the dialogue scene runs in a silent time period, and specifically configured to statically display the mouth shape of the scene character in response to determining, according to the time period information while the dialogue scene is running, that the dialogue scene is currently running in a silent time period.

4. The device according to claim 3, wherein the speech time periods and the silent time periods of the dialogue scene are not shorter than a preset minimum time interval.