CN105611215A - Video call method and device - Google Patents
- Publication number: CN105611215A
- Application number: CN201511022238.5A
- Authority: CN (China)
- Prior art keywords: current video, user, video frame, virtual image, avatar
- Legal status: Pending (an assumption by Google, not a legal conclusion; no legal analysis has been performed)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
Abstract
The invention discloses a video call method and device, belonging to the field of video. The method comprises the steps of: obtaining a virtual image (avatar); obtaining a current video frame; matching the virtual image with the current video frame to obtain a matched current video frame; sending the matched current video frame to multiple video call devices; receiving multiple matched current video frames from those devices; and displaying the received matched current video frames. By displaying, during multi-user instant video interaction, current video frames that have been matched with a virtual image, the method adds a display mode to the traditional multi-user instant video display, satisfies users' individual needs during the interaction, increases the interactivity among the participants, and improves the interaction experience.
Description
Technical Field
The present invention relates to the field of video, and in particular, to a video call method and apparatus.
Background
Because video calls offer real-time performance and a highly interactive experience, more and more users choose instant video to meet the need for multi-person conversational interaction.
However, in the existing multi-user instant video interaction technology, the video pictures of the users participating in a session are typically displayed on each party's video call device in rectangular windows. This single display mode cannot meet users' diverse needs during multi-user instant video, and it offers no way to improve the experience by adding display modes. As a result, when instant video is realized with the existing technology, the user experience is poor, and the interaction experience during the session is especially poor.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a video call method and apparatus. The technical solution is as follows:
In a first aspect, a video call method is provided, where the method includes:
acquiring a virtual image;
acquiring a current video frame;
matching the virtual image with the current video frame to obtain a matched current video frame;
sending the matched current video frame to a plurality of video call devices;
receiving a plurality of matched current video frames from the plurality of video call devices; and
displaying the received plurality of matched current video frames.
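The six first-aspect steps can be sketched end to end. Everything below is an illustrative assumption (the class name, dict-based frames, in-process "peers" standing in for networked devices); the patent does not prescribe an implementation:

```python
class VideoCallDevice:
    """Sender-side compositing, as in the first aspect: each device
    overlays its own avatar onto the captured frame before sending."""

    def __init__(self, avatar):
        self.avatar = avatar      # step 1: acquire a virtual image
        self.peers = []           # the other devices in the session
        self.received = []        # matched frames arriving from peers

    def capture_frame(self):
        # step 2: acquire the current video frame (stubbed as a dict)
        return {"pixels": "raw-frame"}

    def match(self, frame):
        # step 3: combine avatar and frame into a matched frame
        return {"pixels": frame["pixels"], "avatar": self.avatar}

    def send(self, matched_frame):
        # step 4: send the matched frame to every other call device
        for peer in self.peers:
            peer.receive(matched_frame)

    def receive(self, matched_frame):
        # step 5: collect matched frames received from peer devices
        self.received.append(matched_frame)

    def display(self):
        # step 6: render all received matched frames (returned here)
        return list(self.received)
```

Note that in this aspect the avatar data travels inside every transmitted frame; the second aspect below trades this for identifier exchange.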
With reference to the first aspect, in a first possible implementation manner, the matching the avatar with the current video frame to obtain a matched current video frame includes:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
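The second matching strategy (cut and/or shrink the frame to fit the avatar's display region) can be sketched on frame dimensions alone. The function name and the (width, height) representation are assumptions; a real implementation would operate on pixel buffers:

```python
def crop_and_shrink(frame_size, slot_w, slot_h):
    """Cut the current video frame to the avatar slot's aspect ratio,
    then shrink (never enlarge) it to fit inside the slot."""
    w, h = frame_size
    target_ratio = slot_w / slot_h
    if w / h > target_ratio:
        w = round(h * target_ratio)   # frame too wide: cut the sides
    else:
        h = round(w / target_ratio)   # frame too tall: cut top/bottom
    # Shrink the cropped frame to the slot size, preserving the ratio
    scale = min(slot_w / w, slot_h / h, 1.0)
    return round(w * scale), round(h * scale)
```

The first strategy (face recognition plus compositing) would replace the geometric crop with a detected face region, which is beyond this sketch.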
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner, the displaying the received multiple matched current video frames includes:
displaying the received plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and displaying the received plurality of matched current video frames according to the first user instruction.
With reference to the first aspect to any one of the second possible implementation manners of the first aspect, in a third possible implementation manner, the method further includes:
acquiring a second user instruction input by a user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key; and
displaying the received plurality of matched current video frames according to the second user instruction.
With reference to the first aspect to any one of the third possible implementation manners of the first aspect, in a fourth possible implementation manner, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event includes the user closing a camera and/or exiting the multi-person session.
With reference to the first aspect to any one of the fourth possible implementation manners of the first aspect, in a fifth possible implementation manner, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices;
displaying the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
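The default-event handling above amounts to a mapping from received events to display special effects. The event and effect names below are invented for illustration; the patent only names closing the camera and exiting the multi-person session:

```python
# Hypothetical mapping from default events to display special effects
EVENT_EFFECTS = {
    "camera_closed": "show_avatar_only",   # peer closed their camera
    "session_exited": "fade_out_window",   # peer left the session
}

def effects_for(default_events):
    """Map each default event received from peer devices to the special
    effect used when displaying the matched current video frames;
    unrecognized events fall back to the normal display."""
    return [EVENT_EFFECTS.get(event, "display_normally")
            for event in default_events]
```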
With reference to any one of the first aspect to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner, the method further includes:
acquiring a third user instruction triggered by a user, wherein the third user instruction comprises the user clicking the virtual image and/or triggering a second key; and
sending the current video frame to the plurality of video call devices.
With reference to any one of the first aspect to the sixth possible implementation manner of the first aspect, in a seventh possible implementation manner, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting identifiers of the actions and/or expressions of the avatar to a plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of that avatar; and
displaying the actions and/or expressions of the at least one avatar.
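The identifier exchange above keeps per-update traffic small: only a short identifier crosses the network, and each device resolves it against an identical local table. A minimal sketch; the table contents and function names are invented for illustration:

```python
# Hypothetical shared table mapping identifiers to avatar expressions
EXPRESSION_TABLE = {1: "smile", 2: "wave", 3: "laugh"}

def expression_to_id(name):
    """Sender side: look up the identifier to transmit for a local
    avatar action or expression."""
    for ident, expr in EXPRESSION_TABLE.items():
        if expr == name:
            return ident
    raise KeyError(name)

def id_to_expression(ident):
    """Receiver side: resolve a received identifier to the action or
    expression to display on the corresponding avatar."""
    return EXPRESSION_TABLE[ident]
```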
In a second aspect, a video call method is provided, the method including:
acquiring an identifier of a virtual image;
acquiring a current video frame;
transmitting the identifier of the avatar and the current video frame to a plurality of video telephony devices;
receiving current video frames and identifiers of corresponding avatars from the plurality of video call devices, respectively;
acquiring a corresponding virtual image according to the identifier of the virtual image;
matching the virtual image with the received current video frame to obtain a plurality of matched current video frames; and
displaying the plurality of matched current video frames.
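The second aspect differs from the first mainly in where the compositing happens: the sender transmits only the raw frame plus a short avatar identifier, and each receiver matches locally. A minimal sketch under assumed names and dict layouts:

```python
def second_aspect_payload(frame, avatar_id):
    """Second aspect: transmit the raw current video frame plus only a
    short avatar identifier, instead of a pre-composited frame."""
    return {"frame": frame, "avatar_id": avatar_id}

def receiver_match(payload, local_avatar_store):
    """Receiver side: resolve the identifier to the corresponding
    avatar held locally, then match it with the received frame."""
    avatar = local_avatar_store[payload["avatar_id"]]
    return {"frame": payload["frame"], "avatar": avatar}
```

The design choice here is a bandwidth/compute trade-off: identifier exchange avoids sending avatar pixels with every frame, at the cost of each receiver performing the matching itself.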
With reference to the second aspect, in a first possible implementation manner, the matching the avatar with the received current video frame to obtain a plurality of matched current video frames includes:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner, the displaying the plurality of matched current video frames includes:
displaying the plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and displaying the plurality of matched current video frames according to the first user instruction.
With reference to the second aspect to any one of the second possible implementation manners of the second aspect, in a third possible implementation manner, the method further includes:
acquiring a second user instruction input by a user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key; and
displaying the plurality of matched current video frames according to the second user instruction.
With reference to the second aspect to any one of the third possible implementation manners of the second aspect, in a fourth possible implementation manner, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event includes the user closing a camera and/or exiting the multi-person session.
With reference to the second aspect to any one of the fourth possible implementation manners of the second aspect, in a fifth possible implementation manner, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
displaying the plurality of matched current video frames according to the special effect corresponding to the at least one default event.
With reference to the second aspect to any one of the fifth possible implementation manners of the second aspect, in a sixth possible implementation manner, the method further includes:
acquiring a third user instruction triggered by a user, wherein the third user instruction comprises the user clicking the virtual image and/or triggering a second key; and
sending the current video frame to the plurality of video call devices.
With reference to the second aspect to any one of the sixth possible implementation manners of the second aspect, in a seventh possible implementation manner, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting identifiers of the actions and/or expressions of the avatar to a plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of that avatar; and
displaying the actions and/or expressions of the at least one avatar.
In a third aspect, a video call device is provided, the device comprising:
the virtual image acquisition module is used for acquiring a virtual image;
the current video frame acquisition module is used for acquiring a current video frame;
the matching module is used for matching the virtual image with the current video frame to obtain a matched current video frame;
the sending module is used for sending the matched current video frame to a plurality of video call devices;
a receiving module, configured to receive a plurality of matched current video frames from the plurality of video call devices; and
the display module is used for displaying the received plurality of matched current video frames.
With reference to the third aspect, in a first possible implementation manner, the matching module is specifically configured to:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
With reference to the third aspect or the first possible implementation manner of the third aspect, in a second possible implementation manner, the display module is specifically configured to:
displaying the received plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and displaying the received plurality of matched current video frames according to the first user instruction.
With reference to any one of the second possible implementation manners of the third aspect to the third aspect, in a third possible implementation manner,
the device further comprises a user instruction acquisition module, used for acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key;
the display module is further used for displaying the received plurality of matched current video frames according to the second user instruction.
With reference to any one of the third aspect to the third possible implementation manner of the third aspect, in a fourth possible implementation manner,
the sending module is further configured to send a default event to the plurality of video call devices when the user triggers the default event, where the default event includes that the user closes a camera and/or exits a multi-person session.
With reference to any one of the fourth possible implementation manners of the third aspect to the third aspect, in a fifth possible implementation manner,
the receiving module is further configured to receive at least one default event from at least one of the plurality of video telephony devices;
the display module is further configured to display the received plurality of matched current video frames according to a special effect corresponding to the at least one default event.
With reference to any one of the third aspect to the fifth possible implementation manner of the third aspect, in a sixth possible implementation manner,
the user instruction acquisition module is further used for acquiring a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the virtual image and/or triggering a second key;
the sending module is further configured to send the current video frame to the plurality of video call devices.
With reference to any one of the sixth possible implementation manners of the third aspect to the third aspect, in a seventh possible implementation manner,
the device further comprises an avatar action and/or expression identifier acquisition module, used for acquiring identifiers of the actions and/or expressions of the avatar;
the sending module is further used for sending the identifiers of the actions and/or expressions of the virtual images to a plurality of video call devices;
the receiving module is further used for receiving identifiers of actions and/or expressions of the virtual image from at least one video call device in the plurality of video call devices;
the device also comprises an avatar action and/or expression acquisition module, which is used for acquiring the corresponding action and/or expression of at least one avatar according to the received identifier of the action and/or expression of at least one avatar; and
the display module is further used for displaying the action and/or expression of the at least one virtual image.
In a fourth aspect, there is provided a video call apparatus, the apparatus comprising:
the identifier acquisition module of the virtual image is used for acquiring the identifier of the virtual image;
the current video frame acquisition module is used for acquiring a current video frame;
a sending module for sending the identifier of the avatar and the current video frame to a plurality of video call devices;
a receiving module, configured to receive current video frames and identifiers of corresponding avatars from the plurality of video call devices, respectively;
the virtual image acquisition module is used for acquiring the corresponding virtual image according to the identifier of the virtual image;
the matching module is used for matching the virtual image with the received current video frame to obtain a plurality of matched current video frames; and
the display module is used for displaying the plurality of matched current video frames.
With reference to the fourth aspect, in a first possible implementation manner, the matching module is specifically configured to:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
With reference to the fourth aspect or the first possible implementation manner of the fourth aspect, in a second possible implementation manner, the display module is specifically configured to:
displaying the plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and displaying the plurality of matched current video frames according to the first user instruction.
With reference to any one of the second possible implementation manners of the fourth aspect to the fourth aspect, in a third possible implementation manner,
the device further comprises a user instruction acquisition module, used for acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key; and
the display module is further configured to display the plurality of matched current video frames according to the second user instruction.
With reference to any one of the third possible implementation manners of the fourth aspect to the fourth aspect, in a fourth possible implementation manner,
the sending module is further configured to send a default event to the plurality of video call devices when the user triggers the default event, where the default event includes that the user closes a camera and/or exits a multi-person session.
With reference to any one of the fourth possible implementation manners of the fourth aspect to the fourth aspect, in a fifth possible implementation manner,
the receiving module is further configured to receive at least one default event from at least one of the plurality of video call devices; and
the display module is further configured to display the plurality of matched current video frames according to a special effect corresponding to the at least one default event.
With reference to any one of the fifth possible implementation manner of the fourth aspect to the fourth aspect, in a sixth possible implementation manner,
the user instruction acquisition module is further used for acquiring a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the virtual image and/or triggering a second key; and
the sending module is further configured to send the current video frame to the plurality of video call devices.
With reference to any one of the sixth possible implementation manner of the fourth aspect to the fourth aspect, in a seventh possible implementation manner,
the device further comprises an avatar action and/or expression identifier acquisition module, used for acquiring identifiers of the actions and/or expressions of the avatar;
the sending module is further used for sending the identifiers of the actions and/or expressions of the virtual images to a plurality of video call devices;
the receiving module is further used for receiving identifiers of actions and/or expressions of the virtual image from at least one video call device in the plurality of video call devices;
the device also comprises an avatar action and/or expression acquisition module, which is used for acquiring the corresponding action and/or expression of at least one avatar according to the received identifier of the action and/or expression of at least one avatar; and
the display module is further used for displaying the action and/or expression of the at least one virtual image.
In a fifth aspect, a video call device is provided, where the device includes a camera, a touch display screen, a sending/receiving module, a memory, and a processor connected to the camera, the touch display screen, the sending/receiving module, and the memory, where the memory is used to store a set of program codes, and the processor calls the program codes stored in the memory to perform the following operations:
acquiring a virtual image;
controlling the camera to acquire a current video frame;
matching the virtual image with the current video frame to obtain a matched current video frame;
controlling the sending/receiving module to send the matched current video frame to a plurality of video call devices;
controlling the sending/receiving module to receive a plurality of matched current video frames from the plurality of video call devices; and
and controlling the touch display screen to display the received multiple matched current video frames.
With reference to the fifth aspect, in a first possible implementation manner, the processor calls the program code stored in the memory to specifically perform the following operations:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
With reference to the fifth aspect or the first possible implementation manner of the fifth aspect, in a second possible implementation manner, the processor calls the program code stored in the memory to specifically perform the following operations:
displaying the received plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and controlling the touch display screen to display the received plurality of matched current video frames according to the first user instruction.
With reference to any one of the second possible implementation manner of the fifth aspect to the fifth aspect, in a third possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
acquiring a second user instruction input by a user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key; and
controlling the touch display screen to display the received plurality of matched current video frames according to the second user instruction.
With reference to any one of the third possible implementation manners of the fifth aspect to the fifth aspect, in a fourth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
if the user triggers a default event, controlling the sending/receiving module to send the default event to the plurality of video call devices, wherein the default event includes the user closing a camera and/or exiting the multi-person session.
With reference to any one of the fourth possible implementation manners of the fifth aspect to the fifth aspect, in a fifth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
controlling the sending/receiving module to receive at least one default event from at least one of the plurality of video call devices; and
controlling the touch display screen to display the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
With reference to any one of the fifth possible implementation manners of the fifth aspect to the fifth aspect, in a sixth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
acquiring a third user instruction triggered by a user, wherein the third user instruction comprises the user clicking the virtual image and/or triggering a second key; and
controlling the sending/receiving module to send the current video frame to the plurality of video call devices.
With reference to any one of the sixth possible implementation manners of the fifth aspect to the fifth aspect, in a seventh possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
obtaining identifiers of actions and/or expressions of the avatar;
controlling the transmitting/receiving module to transmit the identifiers of the motions and/or expressions of the avatar to a plurality of video call devices;
controlling the transmitting/receiving module to receive an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of that avatar; and
controlling the touch display screen to display the action and/or expression of the at least one avatar.
In a sixth aspect, a video call device is provided, where the device includes a camera, a touch display screen, a sending/receiving module, a memory, and a processor connected to the camera, the touch display screen, the sending/receiving module, and the memory, where the memory is used to store a set of program codes, and the processor calls the program codes stored in the memory to perform the following operations:
acquiring an identifier of a virtual image;
controlling the camera to acquire a current video frame;
controlling the transmitting/receiving module to transmit the identifier of the avatar and the current video frame to a plurality of video call devices;
controlling the transmitting/receiving module to receive current video frames and identifiers of corresponding avatars from the plurality of video call devices, respectively;
acquiring a corresponding virtual image according to the identifier of the virtual image;
matching the virtual image with the received current video frame to obtain a plurality of matched current video frames; and
and controlling the touch display screen to display the plurality of matched current video frames.
With reference to the sixth aspect, in a first possible implementation manner, the processor calls the program code stored in the memory to specifically perform the following operations:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain a matched current video frame; or
cropping and/or shrinking the current video frame according to the virtual image, and matching the result with the virtual image to obtain the matched current video frame.
With reference to the sixth aspect or the first possible implementation manner of the sixth aspect, in a second possible implementation manner, the processor calls the program code stored in the memory to specifically perform the following operations:
displaying the plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction, and controlling the touch display screen to display the plurality of matched current video frames according to the first user instruction.
With reference to any one of the second possible implementation manner of the sixth aspect to the sixth aspect, in a third possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
acquiring a second user instruction input by a user, wherein the second user instruction comprises a gesture or voice triggered by the user on the virtual image, or a first key; and
controlling the touch display screen to display the plurality of matched current video frames according to the second user instruction.
With reference to any one of the third possible implementation manner of the sixth aspect to the sixth aspect, in a fourth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
if the user triggers a default event, controlling the sending/receiving module to send the default event to the plurality of video call devices, wherein the default event includes the user closing a camera and/or exiting the multi-person session.
With reference to any one of the fourth possible implementation manner of the sixth aspect to the sixth aspect, in a fifth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
controlling the sending/receiving module to receive at least one default event from at least one of a plurality of video call devices; and
controlling the touch display screen to display the matched current video frames according to the special effect corresponding to the at least one default event.
With reference to any one of the fifth possible implementation manner of the sixth aspect to the sixth aspect, in a sixth possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
acquiring a third user instruction triggered by a user, wherein the third user instruction comprises that the user clicks the virtual image and/or triggers a second key; and
controlling the sending/receiving module to send the current video frame to the plurality of video call devices.
With reference to any one of the sixth possible implementation manners of the sixth aspect to the sixth aspect, in a seventh possible implementation manner, the processor calls the program code stored in the memory to further perform the following operations:
obtaining identifiers of actions and/or expressions of the avatar;
controlling the transmitting/receiving module to transmit the identifiers of the actions and/or expressions of the avatar to the plurality of video call devices;
controlling the transmitting/receiving module to receive an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
controlling the touch display screen to display the action and/or expression of the at least one avatar.
The embodiments of the present invention provide a video call method and a video call device, wherein the method comprises the following steps: acquiring a virtual image; acquiring a current video frame; matching the virtual image with the current video frame to obtain a matched current video frame; sending the matched current video frame to a plurality of video call devices; receiving a plurality of matched current video frames from the plurality of video call devices; and displaying the received plurality of matched current video frames. Compared with the traditional multi-user instant video display method, the method adds display modes for multi-user instant video, meets users' individual requirements during multi-user instant video interaction, enhances the interactivity of the participants, and improves the interaction experience.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 4 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 5 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 7 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 8 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 9 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 10 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 11 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 12 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 13 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 14 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 15 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 16 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 17 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 18 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 19 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 20 is a flowchart of a video call method according to an embodiment of the present invention;
Fig. 21 is a schematic view of an interface provided by an embodiment of the present invention;
Fig. 22 is a schematic structural diagram of a video call device according to an embodiment of the present invention;
Fig. 23 is a schematic structural diagram of a video call device according to an embodiment of the present invention;
Fig. 24 is a schematic structural diagram of a video call device according to an embodiment of the present invention;
Fig. 25 is a schematic structural diagram of a video call device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a video call method for multi-person video calls in an instant video interaction scenario. A participant of a multi-person session can take part in instant video interaction by running an application on an electronic device; the electronic device may be any one of a smartphone, a tablet computer, or a wearable device, and the embodiment of the present invention does not limit the specific electronic device. In addition, during instant video interaction, data transmission including the instant video may be implemented peer-to-peer or relayed through a server.
An embodiment of the present invention provides a video call method, which is shown in fig. 1 and includes:
101. Acquiring the virtual image.
102. Acquiring the current video frame.
It should be noted that step 101 and step 102 may be executed in the described order, step 102 may be executed before step 101, or the two may be executed simultaneously; the specific execution order is not limited in this embodiment of the present invention.
103. Matching the virtual image with the current video frame to obtain the matched current video frame.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain the matched current video frame; or
the current video frame is cut and/or reduced according to the virtual image, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame.
104. Sending the matched current video frame to the plurality of video call devices.
105. Receiving a plurality of matched current video frames from the plurality of video call devices.
106. Displaying the received plurality of matched current video frames.
Specifically, a plurality of received matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
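The display step above can be illustrated with a small sketch. The grid-layout default and the shape of the "first user instruction" are assumptions made for this example only; the embodiment leaves both the default configuration and the instruction format open.

```python
def choose_layout(num_frames, first_user_instruction=None):
    """Pick a tile layout (rows, cols) for the matched current video frames.

    A first user instruction, when present, overrides the default
    configuration; otherwise the smallest square grid that fits all
    frames is used as the assumed default.
    """
    if first_user_instruction is not None:
        return first_user_instruction
    side = 1
    while side * side < num_frames:
        side += 1
    return (side, side)
```

For example, four matched frames would be laid out as a 2x2 grid by default, while a user instruction such as `(1, 4)` would force a single row.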
Optionally, the method further includes:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture and voice triggered by the user on the virtual image and a first key;
and displaying the received plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises that the user closes the camera and/or exits the multi-person session.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video call devices;
and displaying the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
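The embodiment names the default events but leaves the concrete special effects unspecified. As a hedged sketch, the event names and effect names below are invented for illustration; only the event-to-effect lookup pattern is taken from the text.

```python
# Assumed mapping; the embodiment only names the events, not the effects.
EVENT_EFFECTS = {
    "camera_closed": "show_avatar_only",  # keep the avatar, hide live video
    "left_session": "fade_out_tile",      # fade the participant's tile away
}

def effects_for(received_events):
    """Return the display effects for the received default events,
    skipping any event that has no configured effect."""
    return [EVENT_EFFECTS[e] for e in received_events if e in EVENT_EFFECTS]
```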
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises that the user clicks the virtual image and/or triggers a second key;
sending the current video frame to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
displaying the action and/or expression of the at least one avatar.
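Exchanging small identifiers instead of rendered video keeps the transmitted data tiny, since each client can look the action or expression up locally. A minimal sketch of that lookup follows; the identifier table is a hypothetical example, not part of the embodiment.

```python
# Hypothetical identifier table assumed to be shared by all clients in advance.
ACTION_TABLE = {1: "wave", 2: "nod", 3: "smile"}
ACTION_IDS = {name: ident for ident, name in ACTION_TABLE.items()}

def encode_action(name):
    """Sender side: map an avatar action/expression to its identifier."""
    return ACTION_IDS[name]

def decode_action(ident):
    """Receiver side: recover the action/expression from the identifier."""
    return ACTION_TABLE[ident]
```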
Compared with the traditional multi-user instant video display method, the method adds display modes for multi-user instant video, meets users' individual requirements during multi-user instant video interaction, enhances the interactivity of the participants, and improves the interaction experience.
A second embodiment of the present invention provides a video call method. As shown in Fig. 2, the method includes:
201. Acquiring the identifier of the avatar.
202. Acquiring the current video frame.
It should be noted that step 201 and step 202 may be executed in the described order, step 202 may be executed before step 201, or the two may be executed simultaneously; the specific execution order is not limited in this embodiment of the present invention.
203. Transmitting the identifier of the avatar and the current video frame to the plurality of video call devices.
It should be noted that, in step 203, the identifier of the avatar and the current video frame may be transmitted separately or simultaneously, and the embodiment of the present invention is not limited thereto.
204. Receiving, from each of the plurality of video call devices, a current video frame and the identifier of the corresponding avatar.
205. Acquiring the corresponding virtual image according to the identifier of the virtual image.
206. Matching the virtual image with the received current video frames to obtain a plurality of matched current video frames.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain a matched current video frame; or
the current video frame is cut and/or reduced according to the virtual image, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame.
207. Displaying a plurality of matched current video frames.
Specifically, a plurality of matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying a plurality of matched current video frames according to the first user instruction.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture and voice triggered by the user on the virtual image and a first key; and
displaying a plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises that the user closes the camera and/or exits the multi-person session.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video call devices; and
displaying a plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises that the user clicks the virtual image and/or triggers a second key; and
sending the current video frame to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
displaying the action and/or expression of the at least one avatar.
Compared with the traditional multi-user instant video display method, the method adds display modes for multi-user instant video, meets users' individual requirements during multi-user instant video interaction, enhances the interactivity of the participants, and improves the interaction experience.
In an embodiment of the present invention, the virtual image is matched with the current video frame to obtain a matched current video frame, and the matched current video frame is sent. Referring to Fig. 3, the method includes:
301. Acquiring the virtual image.
Specifically, the avatar may be obtained through a selection instruction input by the user and/or a default configuration of the system, and the process may be:
acquiring the identifier of the avatar corresponding to a selection instruction input by the user and/or to the system default configuration;
searching the locally stored virtual images for the virtual image corresponding to the identifier; and
if no virtual image corresponding to the identifier exists locally, downloading the virtual image corresponding to the identifier from the server.
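The look-up-then-download logic above can be sketched as follows. The in-memory store and the download stub are assumptions made for illustration; a real implementation would hold avatar assets on disk and fetch them over the network.

```python
# Locally stored avatars, keyed by identifier (contents are placeholders).
local_avatars = {"cat": "cat_avatar_asset", "dog": "dog_avatar_asset"}

def download_avatar(avatar_id):
    """Stand-in for fetching the avatar from the server and caching it."""
    asset = avatar_id + "_avatar_asset"
    local_avatars[avatar_id] = asset
    return asset

def get_avatar(avatar_id):
    """Return the avatar for the identifier, downloading it when it is
    not present locally, as described in the steps above."""
    if avatar_id in local_avatars:
        return local_avatars[avatar_id]
    return download_avatar(avatar_id)
```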
The user can input a selection instruction on a preset avatar selection interface, wherein the selection instruction can be any one of a gesture, voice and a preset selection key;
For example, assuming that the preset avatar selection interface is as shown in Fig. 4(a) and the selection instruction is a click gesture, the avatar corresponding to the user's click on the interface may be as shown in Fig. 4(b).
302. Acquiring the current video frame.
Specifically, the current video frame input through the camera may be acquired through a video frame acquisition instruction preset by the system.
It should be noted that step 301 and step 302 may be executed in the described order, step 302 may be executed before step 301, or the two may be executed simultaneously; the specific execution order is not limited in this embodiment of the present invention.
303. Matching the virtual image with the current video frame to obtain the matched current video frame.
Specifically, the current video frame is cut and/or reduced according to the virtual image, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame. The process may be:
cutting and/or reducing the current video frame according to the position of the virtual image in the screen;
The cropping of the current video frame according to the position of the avatar on the screen may be performed as follows:
dividing the current video frame into an area including the avatar and an area not including the avatar according to the position of the avatar, wherein the area including the avatar is a cutting result;
the sizes of the two regions may be preset, or may be dynamically adjusted according to the size of the avatar, and the specific sizes are not limited in the embodiment of the present invention.
In addition to cropping the current video frame according to the position of the avatar in the screen, the process of cropping the current video frame according to the avatar may be implemented in the following manner:
dividing a current video frame into at least one video subframe according to a preset division rule;
and obtaining a cutting result from at least one video subframe.
In practical application, the preset division rule may be to divide the current video frame in a nine-square-grid ("squared figure") manner, where the center cell of the grid is the clipping result.
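The nine-square-grid division above can be sketched as follows; the exact pixel arithmetic (equal integer-sized cells in row-major order) is an assumption about how the grid is laid out.

```python
def nine_grid(frame_w, frame_h):
    """Divide a frame into the nine equal cells of a nine-square grid,
    returned as (x, y, w, h) rectangles in row-major order."""
    cw, ch = frame_w // 3, frame_h // 3
    return [(c * cw, r * ch, cw, ch) for r in range(3) for c in range(3)]

def center_crop(frame_w, frame_h):
    """The centre cell of the grid is taken as the cropping result."""
    return nine_grid(frame_w, frame_h)[4]  # index 4 is the middle cell
```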
Cutting and/or reducing the current video frame according to the position of the virtual image on the screen, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame, makes the combination of the user's video picture and the preset image in the matched current video frame more natural. This improves the display effect of the matched current video frame, adds a display mode for instant video, meets the user's individual requirements during instant video interaction, and improves the user's interaction experience.
In addition to the above implementation manner, the face part in the current video frame may be identified and acquired, and the face part may be matched with the virtual image to obtain the matched current video frame. The process may be:
identifying a face in the current video frame, where the identification may be implemented by performing saliency detection on the current video frame and/or by using feature points that describe the face; the embodiment of the present invention does not limit the specific detection process;
acquiring a face part that at least comprises the face in the current video frame, where the size of the face part can be dynamically adjusted according to the face in the current video frame.
Because users pay more attention to the face than to other parts of the video during instant video interaction, identifying and acquiring the face part of the current video frame and matching it with the virtual image to obtain the matched current video frame makes the combination of the user's face and the preset image in the matched current video frame more natural. This improves the display effect of the matched current video frame, satisfies the user's high attention to the face during instant video, adds a display mode for instant video, meets the user's individual requirements during instant video interaction, and improves the user's interaction experience.
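The dynamically adjusted face part described above might be cut out as below. The margin factor and the (x, y, w, h) bounding-box convention are assumptions for this sketch; the face box itself would come from whatever detector the implementation uses.

```python
def face_part(face_box, frame_w, frame_h, margin=0.2):
    """Expand a detected face bounding box (x, y, w, h) by `margin` on
    each side, clamped to the frame, so the part always contains the face."""
    x, y, w, h = face_box
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1 = min(frame_w, x + w + dx)
    y1 = min(frame_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)
```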
304. Sending the matched current video frame to the plurality of video call devices.
Specifically, the matched current video frame may be sent to the plurality of video call devices according to the network addresses of the multi-person session participants; the specific sending manner is not limited in the embodiment of the present invention.
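Since the sending manner is left open, one simple possibility is a fan-out over the participants' network addresses. The callback-based transport below is purely illustrative and stands in for whatever peer-to-peer or server-relayed channel is used.

```python
def fan_out(matched_frame, participant_addresses, send):
    """Send the matched current video frame to every session participant,
    using an injected send(address, payload) transport."""
    for address in participant_addresses:
        send(address, matched_frame)

# Example with a recording transport standing in for the real channel:
sent = []
fan_out("frame-1", ["10.0.0.2", "10.0.0.3"], lambda a, p: sent.append((a, p)))
```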
305. Receiving a plurality of matched current video frames from the plurality of video call devices.
306. Displaying the received plurality of matched current video frames.
Specifically, a plurality of received matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
It should be noted that, in a multi-person video session, the step in which the video call device generates and sends the matched current video frame to the plurality of video call devices is relatively independent of the step of receiving and displaying the matched current video frames from the plurality of video call devices. Accordingly, the process described in steps 301 to 304 and the process described in steps 305 to 306 may be executed in the described order, the process of steps 305 to 306 may be executed before that of steps 301 to 304, or the two processes may be executed concurrently; the specific execution order is not limited in this embodiment of the present invention.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture and voice triggered by the user on the virtual image and a first key; and
displaying a plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises that the user closes the camera and/or exits the multi-person session.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video call devices; and
displaying a plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises that the user clicks the virtual image and/or triggers a second key; and
sending the current video frame to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
displaying the action and/or expression of the at least one avatar.
To further illustrate the beneficial effects of the method in this embodiment of the present invention, assume that the current video frame pictures of the multi-person session participants are as shown in Fig. 5(a) and the default avatar is as shown in Fig. 4(b); after the method of this embodiment is executed, the current video frame picture of any one of the users may be as shown in Fig. 5(b).
Compared with the traditional multi-user instant video display method, the method adds display modes for multi-user instant video, meets users' individual requirements during multi-user instant video interaction, enhances the interactivity of the participants, and improves the interaction experience. In addition, cutting and/or reducing the current video frame according to the position of the virtual image on the screen and matching the result with the virtual image makes the combination of the user's video picture and the preset image in the matched current video frame more natural, which improves the display effect of the matched current video frame and the user's interaction experience. Furthermore, because users pay more attention to the face than to other parts of the video during instant video interaction, identifying and acquiring the face part of the current video frame and matching it with the virtual image makes the combination of the user's face and the preset image more natural, which improves the display effect, satisfies the user's high attention to the face, and further improves the interaction experience during instant video interaction.
In an embodiment of the present invention, the avatar and the current video frame are sent instead of the result of matching the avatar with the current video frame. As shown in Fig. 6, the method includes:
601. Acquiring the identifier of the avatar.
Specifically, the identifier of the avatar corresponding to a selection instruction input by the user and/or to the system default configuration is acquired; the identifier may also be obtained in other manners.
For example, assuming that the preset avatar selection interface is as shown in Fig. 4(a) and the selection instruction is a click gesture, the avatar corresponding to the user's click on the interface may be as shown in Fig. 4(b).
602. Acquiring the current video frame.
Specifically, the step is the same as step 302, and is not described herein again.
It should be noted that step 601 and step 602 may be executed in the described order, step 602 may be executed before step 601, or the two may be executed simultaneously; the specific execution order is not limited in this embodiment of the present invention.
603. Transmitting the identifier of the avatar and the current video frame to the plurality of video call devices.
Specifically, the identifier of the avatar and the current video frame may be sent to the plurality of video call devices according to the network addresses of the multi-person session participants; the specific sending manner is not limited in the embodiment of the present invention.
It should be noted that, in step 603, the identifier of the avatar and the current video frame may be transmitted to the plurality of video call devices simultaneously or separately.
604. Receiving, from each of the plurality of video call devices, a current video frame and the identifier of the corresponding avatar.
It should be noted that if the current video frame and the identifier of the corresponding avatar are transmitted simultaneously, they are received simultaneously; if they are transmitted separately, they are received separately.
605. Acquiring the corresponding virtual image according to the identifier of the virtual image.
Specifically, according to the identifier of the virtual image, the locally stored virtual images are searched for the virtual image corresponding to the identifier;
if the virtual image corresponding to the identifier does not exist locally, the virtual image corresponding to the identifier is downloaded from the server.
606. Matching the virtual image with the received current video frames to obtain a plurality of matched current video frames.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain a matched current video frame;
Because users pay more attention to the face than to other parts of the video during instant video interaction, identifying and acquiring the face part of the current video frame and matching it with the virtual image to obtain the matched current video frame makes the combination of the user's face and the preset image in the matched current video frame more natural. This improves the display effect of the matched current video frame, satisfies the user's high attention to the face during instant video, adds a display mode for instant video, meets the user's individual requirements during instant video interaction, and improves the user's interaction experience.
In addition to the above manner, the process may be:
and cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
Cutting and/or reducing the current video frame according to the position of the virtual image on the screen, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame, makes the combination of the user's video picture and the preset image in the matched current video frame more natural. This improves the display effect of the matched current video frame, adds a display mode for instant video, meets the user's individual requirements during instant video interaction, and improves the user's interaction experience.
The process in step 606 is the same as that in step 303, and is not described here again.
607. Displaying a plurality of matched current video frames.
Specifically, a plurality of matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying a plurality of matched current video frames according to the first user instruction.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture and voice triggered by the user on the virtual image and a first key; and
displaying a plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises that the user closes the camera and/or exits the multi-person session.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
and displaying a plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction includes the user clicking the virtual image and/or triggering a second key; and
the current video frame is sent to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the actions and/or expressions of at least one avatar are displayed.
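The identifier exchange above can be sketched as follows; the identifier tables and message layout are illustrative assumptions, the point being that only short identifiers travel between the video call devices, while the animation assets stay local:

```python
# Hypothetical identifier tables shared by all clients; each device keeps
# its own copy of the animation assets and resolves identifiers locally.
ACTIONS = {1: "wave", 2: "nod"}          # identifier -> local action asset
EXPRESSIONS = {10: "smile", 11: "wink"}  # identifier -> local expression asset

def pack(action_id=None, expression_id=None):
    """Message sent to the other video call devices: identifiers only,
    never the animation data itself."""
    return {"action": action_id, "expression": expression_id}

def unpack(msg):
    """Resolve received identifiers against the local asset tables;
    unknown or absent identifiers resolve to None."""
    return ACTIONS.get(msg["action"]), EXPRESSIONS.get(msg["expression"])
```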
To further illustrate the beneficial effects achieved by the method according to the embodiment of the present invention, assume that the current video frame picture of the multi-person conversation participants is shown in a in fig. 5 and the default avatar is shown in b in fig. 4; after the method according to the embodiment of the present invention is executed, the current video frame picture of any one of the users may be as shown in b in fig. 5.
Compared with the traditional multi-person instant video display method, the method increases the display modes of the multi-person instant video, meets the personalized requirements of users in the multi-person instant video interaction process, increases the interactivity of the multi-person instant video participants, and improves the interaction experience. In addition, the current video frame is cut and/or reduced according to the position of the virtual image on the screen, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's video picture part and the preset image part in the matched current video frame is more natural, and the display effect of the matched current video frame is improved. Furthermore, because the user pays more attention to the face part than to other parts of the video in the instant video interaction process, the face part in the current video frame is identified and obtained and is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's face part and the preset image part is more natural, the display effect of the matched current video frame is improved, and the user's high attention to the face part in the instant video process is satisfied. These manners further increase the display modes of the instant video, meet the personalized requirements of the user in the instant video interaction process, and improve the interaction experience of the user in the instant video interaction process.
Fifth embodiment is a video call method provided in an embodiment of the present invention, in the embodiment of the present invention, a matched current video frame is displayed according to a default configuration or a first user instruction, as shown in fig. 7, the method includes:
701. and acquiring the virtual image.
Specifically, the step is the same as step 301, and is not described herein again.
702. And acquiring the current video frame.
Specifically, the step is the same as step 302, and is not described herein again.
It is to be noted that step 701 and step 702 may be executed according to the described sequence, or step 702 may be executed first and then step 701 may be executed, or may be executed simultaneously, and the specific execution sequence is not limited in the embodiment of the present invention.
703. And matching the virtual image with the current video frame to obtain the matched current video frame.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain a matched current video frame; or
And cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
The step is the same as step 303, and is not described herein again.
704. And sending the matched current video frame to a plurality of video call devices.
Specifically, the step is the same as step 304, and is not described herein again.
705. A plurality of matched current video frames from the plurality of video call devices are received, and after step 705, either step 706 or step 707 is performed.
Specifically, the step is the same as step 305, and is not described herein again.
706. And displaying the received multiple matched current video frames according to the default configuration, and ending.
Specifically, the default configuration may be a display special effect corresponding to the number of matched current video frames, and the process may be:
acquiring a corresponding display special effect locally according to the number of the matched current video frames; the display special effect can be bubble display, tree display and other special effect display modes, and the embodiment of the invention does not limit the specific display special effect.
And displaying a plurality of matched current video frames according to the display special effect corresponding to the number of the matched current video frames.
For example, assuming that the display special effect is a bubble display, as shown in a in fig. 8, each bubble includes an avatar, the bubbles randomly float in the screen, and no two bubbles overlap with each other; after the received multiple matched current video frames are displayed according to the default configuration, the video call interface may be as shown in b in fig. 8. Assuming that the display special effect is a tree display, as shown in a in fig. 9, each flower on the tree graph includes an avatar; after the received multiple matched current video frames are displayed according to the default configuration, the video call interface may be as shown in b in fig. 9.
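The non-overlap constraint of the bubble display might be realized with simple rejection sampling; this is an illustrative sketch, not the patent's method (screen size, bubble radius, and retry limit are assumed parameters):

```python
import random

def place_bubbles(n, radius, width, height, seed=0, max_tries=1000):
    """Place up to n non-overlapping bubble centres fully inside a
    width x height screen. A candidate centre is rejected if its bubble
    would overlap an already-placed one (centre distance < 2 * radius)."""
    rng = random.Random(seed)
    centres = []
    tries = 0
    while len(centres) < n and tries < max_tries:
        tries += 1
        x = rng.uniform(radius, width - radius)
        y = rng.uniform(radius, height - radius)
        if all((x - cx) ** 2 + (y - cy) ** 2 >= (2 * radius) ** 2
               for cx, cy in centres):
            centres.append((x, y))
    return centres
```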
Because the default configuration includes a display special effect, the received multiple matched current video frames are displayed through the default configuration, so that in the multi-person instant video process, the virtual images containing partial video frames of the multi-person instant video participants can be displayed through a special effect. This further increases the display modes of the instant video, meets the personalized requirements of the user in the instant video interaction process, and improves the interaction experience of the user in the instant video interaction process.
707. A first user instruction is acquired, and after step 707, step 708 is performed.
Specifically, the first user instruction includes any one of a gesture, a preset key and a default event;
the gesture comprises a gesture track used for describing a user-defined display special effect or a gesture track respectively corresponding to a plurality of display special effects stored by the system;
the preset keys comprise preset keys corresponding to a plurality of display special effects stored in the system on the interface;
the default event includes the user shaking the physical device where the video call apparatus is located, the user blowing into the microphone, and the like. The default events respectively correspond to a plurality of display special effects stored in the system; exemplarily, shaking the physical device where the video call apparatus is located may correspond to the tree display special effect, and blowing into the microphone may correspond to the bubble display special effect.
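Resolving the first user instruction to a display special effect can be sketched as a dispatch table; the event names and the mapping to tree/bubble follow the examples above, everything else is an assumption:

```python
# Illustrative mappings from first-user-instruction events to the display
# special effects stored by the system (all names are assumptions).
EFFECT_BY_EVENT = {
    "shake_device": "tree",     # shaking the device -> tree display
    "blow_into_mic": "bubble",  # blowing into the microphone -> bubble display
}
EFFECT_BY_KEY = {"key_tree": "tree", "key_bubble": "bubble"}
STORED_EFFECTS = ("tree", "bubble")

def resolve_first_instruction(kind, value):
    """Map a gesture / preset key / default event to a display effect."""
    if kind == "default_event":
        return EFFECT_BY_EVENT.get(value)
    if kind == "preset_key":
        return EFFECT_BY_KEY.get(value)
    if kind == "gesture":
        # a gesture track may name a stored effect or describe a
        # user-defined one; unknown tracks fall back to a custom effect
        return value if value in STORED_EFFECTS else "custom"
    return None
```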
708. And displaying the received multiple matched current video frames according to the first user instruction, and ending.
Specifically, a display special effect corresponding to the first user instruction is acquired;
and the received plurality of matched current video frames are displayed according to the display special effect corresponding to the first user instruction.
Because the first user instruction corresponds to a display special effect, the received multiple matched current video frames are displayed according to the first user instruction, so that in the multi-person instant video process, the virtual images containing partial video frames of the multi-person instant video participants can be displayed not only through a special effect, but also through the special effect indicated by the user via the first user instruction. This further increases the display modes of the instant video, meets the personalized requirements of the user in the instant video interaction process, and improves the interaction experience of the user in the instant video interaction process.
It should be noted that the process described in step 706 and the process described in steps 707 to 708 are both processes for displaying the received multiple matched current video frames. Besides these two manners, the displaying may also be implemented in other manners, and the specific manner is not limited in the embodiment of the present invention.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction includes a gesture triggered by the user on the virtual image, a voice, or a first key; and
and displaying a plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
and if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event includes the user turning off the camera and/or exiting the multi-person conversation.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
and displaying a plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction includes the user clicking the virtual image and/or triggering a second key; and
the current video frame is sent to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the actions and/or expressions of at least one avatar are displayed.
It should be noted that the method described in this embodiment is directed to the scene in which the matched current video frames are generated by the current video call device. The process of displaying the received multiple matched current video frames is equally applicable to the scene in which the matched video frames are generated by other video call devices; for the process of generating the matched video frames by another video call device, refer to the second embodiment and the fourth embodiment.
To further illustrate the beneficial effects achieved by the method according to the embodiment of the present invention, assume that the current video frame picture of the multi-person conversation participants is shown in a in fig. 5 and the default-configuration display special effect is shown in a in fig. 8; after the method according to the embodiment of the present invention is executed, the current video frame picture of any one of the users may be as shown in a in fig. 10. Assume further that the display special effect corresponding to the first user instruction is shown in b in fig. 10, where the first user instruction includes a gesture track for describing a user-defined display special effect; after the method according to the embodiment of the present invention is executed, the current video frame picture of any one of the users may be as shown in c in fig. 10.
Compared with the traditional multi-person instant video display method, the method increases the display modes of the multi-person instant video, meets the personalized requirements of users in the multi-person instant video interaction process, increases the interactivity of the multi-person instant video participants, and improves the interaction experience. In addition, the current video frame is cut and/or reduced according to the position of the virtual image on the screen, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's video picture part and the preset image part in the matched current video frame is more natural, and the display effect of the matched current video frame is improved. Furthermore, because the user pays more attention to the face part than to other parts of the video in the instant video interaction process, the face part in the current video frame is identified and obtained and is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's face part and the preset image part is more natural, the display effect of the matched current video frame is improved, and the user's high attention to the face part in the instant video process is satisfied. These manners further increase the display modes of the instant video, meet the personalized requirements of the user in the instant video interaction process, and improve the interaction experience of the user in the instant video interaction process.
In addition, because the default configuration includes a display special effect, the received multiple matched current video frames are displayed through the default configuration, so that in the multi-person instant video process, the virtual images containing partial video frames of the multi-person instant video participants can be displayed through a special effect. Moreover, because the first user instruction corresponds to a display special effect, displaying the received multiple matched current video frames according to the first user instruction enables the virtual images to be displayed according to the special effect indicated by the user. Both manners further increase the display modes of the instant video, meet the personalized requirements of the user in the instant video interaction process, and improve the interaction experience of the user in the instant video interaction process.
Sixth embodiment is a video call method provided in an embodiment of the present invention, in the embodiment of the present invention, a plurality of received matched current video frames are displayed according to a second user instruction, and as shown in fig. 11, the method includes:
1101. and acquiring the virtual image.
Specifically, the step is the same as step 301, and is not described herein again.
1102. And acquiring the current video frame.
Specifically, the step is the same as step 302, and is not described herein again.
It is to be noted that step 1101 and step 1102 may be executed in the described sequence, or step 1102 may be executed first and then step 1101, or the two steps may be executed simultaneously; the specific execution sequence is not limited in the embodiment of the present invention.
1103. And matching the virtual image with the current video frame to obtain the matched current video frame.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain a matched current video frame; or
And cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
The step is the same as step 303, and is not described herein again.
1104. And sending the matched current video frame to a plurality of video call devices.
Specifically, the step is the same as step 304, and is not described herein again.
1105. A plurality of matched current video frames from the plurality of video call devices are received.
Specifically, the step is the same as step 305, and is not described herein again.
1106. And displaying the received plurality of matched current video frames.
Specifically, a plurality of received matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
The step is the same as step 306, and is not described herein again.
1107. A second user instruction input by the user is acquired, wherein the second user instruction includes a gesture triggered by the user on the virtual image, a voice, or a first key.
Specifically, the gesture triggered by the user on the avatar includes operations such as expanding or compressing the avatar, dragging the avatar, and replacing the avatar; the embodiment of the present invention does not limit the specific gesture. The process of acquiring the second user instruction input by the user may be:
a gesture triggered by the user on the avatar is identified.
The voice includes at least voice information for instructing to enlarge or compress the avatar and to replace the avatar. For example, the voice information for instructing to enlarge the avatar may be voice information including at least "enlarge the avatar", the voice information for instructing to compress the avatar may be voice information including at least "compress the avatar", and the voice information for instructing to replace the avatar may be voice information including at least "replace the avatar". The process of acquiring the second user instruction input by the user may be:
recognizing, from the voice input by the user, the voice information for instructing to enlarge or compress the avatar or to replace the avatar.
The first key may correspond to enlarging or compressing the avatar or replacing the avatar, and the process of obtaining the second user indication input by the user may be:
whether a user triggers the first key is detected.
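The voice form of the second user instruction could be matched with simple keyword lookup over the recognized transcript; the key phrases follow the examples given above, the function itself is an illustrative sketch:

```python
# Illustrative keyword matching for the voice form of the second user
# instruction; phrases follow the examples in the text.
VOICE_COMMANDS = {
    "enlarge the avatar": "enlarge",
    "compress the avatar": "compress",
    "replace the avatar": "replace",
}

def parse_voice(transcript):
    """Return the first operation whose key phrase occurs in the
    recognized transcript (case-insensitive), or None if no phrase
    matches."""
    lowered = transcript.lower()
    for phrase, op in VOICE_COMMANDS.items():
        if phrase in lowered:
            return op
    return None
```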
1108. And displaying the received plurality of matched current video frames according to the second user instruction.
Specifically, the received multiple matched current video frames are displayed according to an operation indicated by a second user instruction, wherein the operation indicated by the second user instruction comprises expanding or compressing the avatar, dragging the avatar, replacing the avatar, and other operations.
Because the second user instruction describes the user's operation on the virtual image, the received multiple matched current video frames are displayed according to the second user instruction, so that the user can interact with the other multi-person video call participants by operating the virtual image during the multi-person video call. This further increases the display modes of the instant video, meets the personalized requirements of the user in the instant video interaction process, and improves the interaction experience of the user in the instant video interaction process.
It should be noted that, if the operation indicated by the second user instruction is to replace the avatar, the current avatar is replaced with another avatar configured by default in the system, or an avatar selection interface is displayed and the current avatar is replaced with the avatar selected by the user on the avatar selection interface.
For example, assuming that the second user instruction is a gesture, the video call interface is shown in b in fig. 5, and the gesture is a gesture of enlarging the avatar triggered by the user on the avatar, as may be shown in a in fig. 12, then the video call interface after displaying the received multiple matched current video frames according to the second user instruction may be as shown in b in fig. 12. Assuming that the second user instruction is a gesture of compressing the avatar triggered by the user on the avatar, as may be shown in c in fig. 12, then the video call interface after displaying the received multiple matched current video frames according to the second user instruction may be as shown in d in fig. 12;
assuming that the second user instruction is a gesture, the video call interface is shown in b in fig. 5, and the gesture is a gesture of dragging the avatar triggered by the user on the avatar, as may be shown in a in fig. 13, with the dragging track shown as the dashed line in a in fig. 13, then the video call interface after displaying the received multiple matched current video frames according to the second user instruction may be as shown in b in fig. 13, and the moving track of the avatar may be as shown by the dashed line in b in fig. 13;
assuming that the second user instruction is a gesture, the video call interface is shown in b in fig. 5, and the gesture is a gesture of replacing the avatar triggered by the user, as may be shown in a in fig. 14, then the video call interface after displaying the received multiple matched current video frames according to the second user instruction may be as shown in b in fig. 14, wherein the replaced avatar is another avatar configured by default in the system.
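The operations carried by the second user instruction (enlarge/compress, drag, replace) can be sketched against a minimal avatar state; the field names and scale factors are assumptions, not values from the text:

```python
class Avatar:
    """Minimal avatar state: the image it shows, its position on the
    screen, and its display scale (all fields are assumed)."""
    def __init__(self, image, x=0.0, y=0.0, scale=1.0):
        self.image, self.x, self.y, self.scale = image, x, y, scale

def apply_second_instruction(avatar, op, arg=None):
    """Apply one second-user-instruction operation to the avatar."""
    if op == "enlarge":
        avatar.scale *= 1.25
    elif op == "compress":
        avatar.scale *= 0.8
    elif op == "drag":            # arg: (dx, dy) offset of the drag track
        dx, dy = arg
        avatar.x += dx
        avatar.y += dy
    elif op == "replace":         # arg: new image, or None -> system default
        avatar.image = arg if arg is not None else "default_avatar_2"
    return avatar
```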
Optionally, the method further includes:
and if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event includes the user turning off the camera and/or exiting the multi-person conversation.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
and displaying a plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction includes the user clicking the virtual image and/or triggering a second key; and
the current video frame is sent to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the actions and/or expressions of at least one avatar are displayed.
It should be noted that the method described in this embodiment is directed to the scene in which the matched current video frames are generated by the current video call device. The process of displaying the received multiple matched current video frames is equally applicable to the scene in which the matched video frames are generated by other video call devices; for the process of generating the matched video frames by another video call device, refer to the second embodiment and the fourth embodiment.
Compared with the traditional multi-person instant video display method, the method increases the display modes of the multi-person instant video, meets the personalized requirements of users in the multi-person instant video interaction process, increases the interactivity of the multi-person instant video participants, and improves the interaction experience. In addition, the current video frame is cut and/or reduced according to the position of the virtual image on the screen, and the cutting and/or reducing result is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's video picture part and the preset image part in the matched current video frame is more natural, and the display effect of the matched current video frame is improved. Furthermore, because the user pays more attention to the face part than to other parts of the video in the instant video interaction process, the face part in the current video frame is identified and obtained and is matched with the virtual image to obtain the matched current video frame, so that the combination of the user's face part and the preset image part is more natural, the display effect of the matched current video frame is improved, and the user's high attention to the face part in the instant video process is satisfied. These manners further increase the display modes of the instant video, meet the personalized requirements of the user in the instant video interaction process, and improve the interaction experience of the user in the instant video interaction process.
In addition, because the second user instruction describes the user's operation on the virtual image, the received multiple matched current video frames are displayed according to the second user instruction, so that the user can interact with the other multi-person video call participants by operating the virtual image during the multi-person video call. This further increases the display modes of the instant video, meets the personalized requirements of the user in the instant video interaction process, and improves the interaction experience of the user in the instant video interaction process.
An embodiment seventh is a video call method provided in an embodiment of the present invention, and as shown in fig. 15, the method includes:
1501. and acquiring the virtual image.
Specifically, the step is the same as step 301, and is not described herein again.
1502. And acquiring the current video frame.
Specifically, the step is the same as step 302, and is not described herein again.
1503. And matching the virtual image with the current video frame to obtain the matched current video frame.
Specifically, a face part in the current video frame is identified and acquired, and the face part is matched with the virtual image to obtain a matched current video frame; or
And cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
The step is the same as step 303, and is not described herein again.
1504. And sending the matched current video frame to a plurality of video call devices.
1505. A plurality of matched current video frames from a plurality of video call devices is received.
1506. And displaying the received plurality of matched current video frames.
Specifically, a plurality of received matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
Optionally, the method further includes:
1507. and if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event includes the user turning off the camera and/or exiting the multi-person conversation.
1508. At least one default event is received from at least one of the plurality of video telephony devices.
Specifically, the default event includes the user turning off the camera and/or exiting the multi-person session.
1509. And displaying the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Because the received matched current video frames are displayed according to the special effect corresponding to the at least one default event, a display effect is added when at least one of the multi-person video call participants turns off the camera or exits the multi-person video call. This further increases the display modes of the instant video, meets the personalized requirements of the user in the instant video interaction process, and improves the interaction experience of the user in the instant video interaction process.
For example, if the display special effect is a bubble display, the special effect corresponding to the default event may be that only an avatar is displayed in the bubble, and if the user exits the multi-person conversation, the bubble is broken or disappears; if the display special effect is tree display, the special effect corresponding to the default event can be that only the virtual image is displayed in a part of the tree structure, and if the user quits the multi-person conversation, the part of the tree structure is closed or disappears.
Assuming that the video call interface before receiving the at least one default event is shown in a of fig. 16, and the special effect corresponding to the default event is bubble breaking or disappearing, the video call interface after receiving a default event triggered by a second user may be as shown in b of fig. 16;
assuming that the video call interface before receiving the at least one default event is shown in a of fig. 17, and the special effect corresponding to the default event is a partial tree structure closing or disappearing, the video call interface after receiving the default event triggered by the second user may be as shown in b of fig. 17.
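Applying the special effect corresponding to a received default event can be sketched as follows: a camera-off event keeps the participant's bubble or branch but shows only the avatar, while an exit event makes it break or disappear. The state representation is an assumption:

```python
def on_default_event(state, user, event):
    """state: mapping user -> widget state ('video' or 'avatar_only').
    A 'camera_off' event keeps the bubble/branch but shows only the
    avatar; an 'exit' event makes the bubble break / branch disappear.
    Returns a new mapping; the input is left unchanged."""
    state = dict(state)
    if event == "camera_off":
        state[user] = "avatar_only"
    elif event == "exit":
        state.pop(user, None)
    return state
```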
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the avatar, or triggering of a first key; and
displaying the plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
displaying the plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the avatar and/or triggering a second key; and
sending the current video frame to the plurality of video call devices.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the action and/or expression of the corresponding at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the actions and/or expressions of at least one avatar are displayed.
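The optional identifier exchange above sends only a short identifier of the action and/or expression over the network, never the animation data itself. A minimal sketch, assuming a JSON wire format (the message shape here is an assumption of this sketch, not specified by the embodiment):

```python
import json

def encode_expression(identifier: str) -> bytes:
    """Encode only the expression identifier for transmission; the
    animation data itself never crosses the network."""
    return json.dumps({"type": "expression", "id": identifier}).encode()

def decode_expression(message: bytes) -> str:
    """Recover the identifier on the receiving device, which then looks
    the animation up locally or downloads it from the server."""
    payload = json.loads(message.decode())
    if payload.get("type") != "expression":
        raise ValueError("not an expression message")
    return payload["id"]
```

Exchanging identifiers rather than animation payloads keeps the per-message size tiny, which is why the embodiment can sync avatar actions even under poor network conditions.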
It should be noted that the method described in this embodiment covers the scene in which the matched video frames are generated by the current video call device. The process of displaying a plurality of received matched current video frames is equally applicable to the scene in which the matched video frames are generated by another video call device; for the process of generating the matched video frames by another video call device, refer to the second embodiment and the fourth embodiment.
Compared with a conventional multi-person instant video display method, this method adds display modes for multi-person instant video, meets users' personalized requirements during multi-person instant video interaction, increases the interactivity of the participants, and improves the interaction experience. In addition, the current video frame is cut and/or reduced according to the position of the avatar on the screen, and the cutting and/or reducing result is matched with the avatar to obtain the matched current video frame; the user's video picture and the preset image therefore combine more naturally in the matched current video frame, which improves its display effect, adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience. In addition, because users pay more attention to the face than to other parts of the video during instant video interaction, the face part in the current video frame is recognized and matched with the avatar to obtain the matched current video frame; the user's face and the preset image therefore combine more naturally, which improves the display effect of the matched current video frame, satisfies the user's high attention to the face during instant video, adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience.
In addition, the received matched current video frames are displayed with the special effect corresponding to the at least one default event, which enriches the display when at least one participant in the multi-person video call turns off the camera or exits the call, further adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience.
An eighth embodiment of the present invention provides a video call method. As shown in fig. 18, the method includes:
1801. Acquire the avatar.
Specifically, the step is the same as step 301, and is not described herein again.
1802. Acquire the current video frame.
Specifically, the step is the same as step 302, and is not described herein again.
1803. Match the avatar with the current video frame to obtain a matched current video frame.
Specifically, a face part in the current video frame is recognized and acquired, and the face part is matched with the avatar to obtain the matched current video frame; or
the current video frame is cut and/or reduced according to the avatar, and the cutting and/or reducing result is matched with the avatar to obtain the matched current video frame.
The step is the same as step 303, and is not described herein again.
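As one illustrative reading of the cutting and/or reducing step, the current frame can be scaled to fill the avatar's display slot while preserving aspect ratio, with the overflow centre-cropped. The function name and slot-based layout below are assumptions of this sketch, not the embodiment's prescribed algorithm:

```python
def match_frame_to_avatar(frame_w: int, frame_h: int,
                          slot_w: int, slot_h: int):
    """Scale the current frame so it covers the avatar's display slot,
    then centre-crop the overflow. Returns the scaled size and the crop
    rectangle (x, y, width, height) in scaled coordinates."""
    # Cover the slot: pick the larger scale so no letterboxing remains.
    scale = max(slot_w / frame_w, slot_h / frame_h)
    scaled_w = round(frame_w * scale)
    scaled_h = round(frame_h * scale)
    # Centre the crop so the middle of the frame survives.
    crop_x = (scaled_w - slot_w) // 2
    crop_y = (scaled_h - slot_h) // 2
    return (scaled_w, scaled_h), (crop_x, crop_y, slot_w, slot_h)
```

For a 640x480 frame and a 100x100 slot this scales to 133x100 and crops 16 pixels from each side, which is one way the video picture and the preset image can be made to combine naturally.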
1804. Send the matched current video frame to the plurality of video call devices.
Specifically, the step is the same as step 304, and is not described herein again.
1805. Receive a plurality of matched current video frames from the plurality of video call devices.
Specifically, the step is the same as step 305, and is not described herein again.
1806. Display the received plurality of matched current video frames.
Specifically, a plurality of received matched current video frames are displayed according to default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
The step is the same as step 306, and is not described herein again.
1807. Acquire a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the avatar and/or triggering the second key.
Specifically, if the third user instruction comprises the user clicking the avatar, the process of acquiring the third user instruction triggered by the user may be:
acquiring a position parameter of the click point when the user clicks the video call interface; and
determining the avatar indicated by the user according to the position parameter of the click point and the position parameter of at least one avatar in the video call interface.
If the third user instruction comprises the user triggering the second key, the process of acquiring the third user instruction triggered by the user may be:
acquiring the third user instruction triggered by the user on a second key on the video call interface; or
acquiring the third user instruction triggered by the user pressing a key on the video call device.
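The click-point hit test in step 1807 can be sketched as follows, assuming each avatar's position parameter is an axis-aligned rectangle on the interface (the rectangle layout and all names here are illustrative assumptions):

```python
def avatar_at(click, avatars):
    """Return the id of the avatar whose on-screen rectangle contains the
    click point, or None if the click missed every avatar.

    click:   (x, y) position parameter of the click point
    avatars: mapping of avatar id -> (x, y, width, height) rectangle
    """
    cx, cy = click
    for avatar_id, (x, y, w, h) in avatars.items():
        # Half-open bounds so adjacent avatars never both claim a point.
        if x <= cx < x + w and y <= cy < y + h:
            return avatar_id
    return None
```

Comparing the click's position parameter against each avatar's position parameter this way is one direct realization of "determining the avatar indicated by the user".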
1808. Send the current video frame to the plurality of video call devices.
Specifically, the embodiment of the present invention does not limit the specific transmission method.
Because the third user instruction comprises the user clicking the avatar and/or triggering the second key, the current video frame is sent to the plurality of video call devices after the third user instruction triggered by the user is acquired. During a multi-person video call, the current video frame can therefore be displayed in full screen by triggering the third user instruction, which further adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience.
Assuming that the video call interface of any one of the plurality of video call devices is as shown in a of fig. 16 before the current video frame is sent, the video call interface of any one of the plurality of video call devices after the current video frame is sent is as shown in fig. 19.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the avatar, or triggering of a first key; and
displaying the plurality of matched current video frames according to the second user instruction.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises the user closing the camera and/or exiting the multi-person conversation.
Optionally, the method further includes:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting an identifier of an action and/or expression of the avatar to the plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the action and/or expression of the corresponding at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the actions and/or expressions of at least one avatar are displayed.
It should be noted that the method described in this embodiment covers the scene in which the matched video frames are generated by the current video call device. The process of displaying a plurality of received matched current video frames is equally applicable to the scene in which the matched video frames are generated by another video call device; for the process of generating the matched video frames by another video call device, refer to the second embodiment and the fourth embodiment.
Compared with a conventional multi-person instant video display method, this method adds display modes for multi-person instant video, meets users' personalized requirements during multi-person instant video interaction, increases the interactivity of the participants, and improves the interaction experience. In addition, the current video frame is cut and/or reduced according to the position of the avatar on the screen, and the cutting and/or reducing result is matched with the avatar to obtain the matched current video frame; the user's video picture and the preset image therefore combine more naturally in the matched current video frame, which improves its display effect, adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience. In addition, because users pay more attention to the face than to other parts of the video during instant video interaction, the face part in the current video frame is recognized and matched with the avatar to obtain the matched current video frame; the user's face and the preset image therefore combine more naturally, which improves the display effect of the matched current video frame, satisfies the user's high attention to the face during instant video, adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience.
In addition, because the third user instruction comprises the user clicking the avatar and/or triggering the second key, the current video frame is sent to the plurality of video call devices after the third user instruction triggered by the user is acquired. During a multi-person video call, the current video frame can be displayed in full screen by triggering the third user instruction, which further adds display modes for instant video, meets users' personalized requirements during instant video interaction, and improves the interaction experience.
A ninth embodiment of the present invention provides a video call method. As shown in fig. 20, the method includes:
2001. Acquire an identifier of an action and/or expression of the avatar.
Specifically, the identifier of the action and/or expression of the avatar corresponding to a selection instruction input by the user and/or to the system default configuration is acquired; the identifier may also be acquired in other manners.
2002. Send the identifier of the action and/or expression of the avatar to the plurality of video call devices.
Specifically, the identifiers of the actions and/or expressions of the avatar may be sent to the plurality of video call devices according to the network addresses of the participants of the multi-person conversation.
2003. Receive an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices.
Specifically, the identifier of the action and/or expression of the avatar from at least one of the plurality of video call devices may be received according to the network address of the multi-person session participant, and the specific receiving manner is not limited in the embodiment of the present invention.
2004. Acquire the action and/or expression of the corresponding at least one avatar according to the received identifier of the action and/or expression of the at least one avatar.
Specifically, according to the identifier of the action and/or expression of the avatar, the corresponding action and/or expression is searched for among the locally stored actions and/or expressions of the avatar; and
if the action and/or expression of the avatar corresponding to the identifier is not stored locally, the action and/or expression corresponding to the identifier is downloaded from the server.
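The local-lookup-then-download behaviour of step 2004 amounts to a cache-aside pattern. A hedged sketch, where `local_store` and the `download` callback are illustrative stand-ins for local storage and the server request:

```python
def get_animation(identifier, local_store, download):
    """Look the identifier up in local storage first; fall back to the
    server download callback only on a miss, then cache the result so
    the same identifier is never fetched twice."""
    if identifier in local_store:
        return local_store[identifier]
    animation = download(identifier)
    local_store[identifier] = animation
    return animation
```

Caching the downloaded action and/or expression locally means repeated use of a popular expression costs one server round trip at most.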
2005. Display the action and/or expression of the at least one avatar.
Specifically, the actions and/or expressions of at least one avatar are displayed according to a default configuration; or
Acquiring a first user instruction;
displaying an action and/or expression of the at least one avatar according to the first user indication.
This process is the same as steps 706 to 708, except that the action and/or expression of the avatar is displayed instead of the matched current video frames, and is not described herein again.
Optionally, the method further comprises:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the avatar, or triggering of a first key; and
displaying the action and/or expression of the at least one avatar according to the second user instruction.
This process is the same as steps 1107 to 1108, except that the action and/or expression of the avatar is displayed instead of the matched current video frames, and is not described herein again.
Optionally, the method further includes:
if the user triggers a default event, sending the default event to the plurality of video call devices, wherein the default event comprises the user closing the camera and/or exiting the multi-person conversation.
Optionally, the method further includes:
receiving at least one default event from at least one of the plurality of video telephony devices; and
displaying the action and/or expression of the at least one avatar according to the special effect corresponding to the at least one default event.
This process is the same as steps 1507 to 1509, except that the action and/or expression of the avatar is displayed instead of the matched current video frames, and is not described herein again.
Optionally, the method further includes:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the avatar and/or triggering a second key; and
sending the current video frame to the plurality of video call devices.
This process is the same as steps 1807 to 1808, except that it concerns the action and/or expression of the avatar rather than the matched current video frames, and is not described herein again.
For example, assuming that a video call interface is as shown in a of fig. 21, after the method according to the embodiment of the present invention is performed, the video call interface may be as shown in b of fig. 21.
It should be noted that the method described in this embodiment covers the scene in which the matched video frames are generated by the current video call device. The process of displaying a plurality of received matched current video frames is equally applicable to the scene in which the matched video frames are generated by another video call device; for the process of generating the matched video frames by another video call device, refer to the second embodiment and the fourth embodiment.
The embodiment of the present invention provides a video call method in which the action and/or expression of an avatar is displayed during multi-person instant video interaction. Compared with a conventional multi-person instant video display method, this adds display modes for multi-person instant video, meets users' personalized requirements during multi-person instant video interaction, increases the interactivity of the participants, and improves the interaction experience. In addition, in scenes where the user cannot take part in instant video, for example when the camera is turned off, the network environment is poor, or video is otherwise inconvenient, interaction can continue through the displayed action and/or expression of the avatar, which further adds display modes for multi-person instant video, meets users' personalized requirements, increases the interactivity of the participants, and improves the interaction experience.
A tenth embodiment of the present invention provides a video call apparatus 22. As shown in fig. 22, the apparatus includes:
an avatar acquisition module 2201 for acquiring an avatar;
a current video frame acquisition module 2202, configured to acquire a current video frame;
a matching module 2203, configured to match the avatar with the current video frame to obtain a matched current video frame;
a sending module 2204, configured to send the matched current video frame to multiple video call devices;
a receiving module 2205, configured to receive a plurality of matched current video frames from a plurality of video call devices; and
a display module 2206, configured to display the received plurality of matched current video frames.
Optionally, the matching module 2203 is specifically configured to:
recognizing and acquiring a face part in the current video frame, and matching the face part with the avatar to obtain the matched current video frame; or
cutting and/or reducing the current video frame according to the avatar, and matching the cutting and/or reducing result with the avatar to obtain the matched current video frame.
Optionally, the display module 2206 is specifically configured to:
displaying the received plurality of matched current video frames according to a default configuration; or
Acquiring a first user instruction;
and displaying the received plurality of matched current video frames according to the first user instruction.
Optionally,
the apparatus further comprises a user indication acquiring module 2207, configured to acquire a second user indication input by the user, where the second user indication comprises a gesture or voice triggered by the user on the avatar, or triggering of the first key;
the display module 2206 is further configured to display the received plurality of matched current video frames according to the second user indication.
Optionally,
the sending module 2204 is further configured to send a default event to the plurality of video call devices when the user triggers the default event, where the default event includes the user closing the camera and/or exiting the multi-person session.
Optionally,
the receiving module 2205 is further configured to receive at least one default event from at least one of the plurality of video call devices;
the display module 2206 is further configured to display the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally,
the user indication acquiring module 2207 is further configured to acquire a third user indication triggered by the user, where the third user indication includes the user clicking the avatar and/or triggering the second key;
the sending module 2204 is further configured to send the current video frame to the plurality of video call devices.
Optionally,
the apparatus further comprises an action and/or expression identifier obtaining module 2208, configured to obtain an identifier of an action and/or expression of the avatar;
the sending module 2204 is further configured to send the identifiers of the actions and/or expressions of the avatar to the plurality of video call devices;
the receiving module 2205 is further configured to receive an identifier of an action and/or expression of the avatar from at least one of the plurality of video call devices;
the apparatus further includes an avatar motion and/or expression obtaining module 2209, configured to obtain the motion and/or expression of the corresponding at least one avatar according to the received identifier of the motion and/or expression of the at least one avatar; and
the display module 2206 is also used for displaying the actions and/or expressions of at least one avatar.
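How the modules of apparatus 22 might cooperate can be sketched as follows; the callables below stand in for the matching, sending, and display modules, and all names are illustrative assumptions of this sketch rather than the claimed structure:

```python
class VideoCallApparatus:
    """Minimal sketch of module cooperation in apparatus 22."""

    def __init__(self, matcher, sender, display):
        self.matcher = matcher    # stands in for matching module 2203
        self.sender = sender      # stands in for sending module 2204
        self.display = display    # stands in for display module 2206

    def on_local_frame(self, avatar, frame):
        # Match the avatar with the current frame, then send the result
        # to the other video call devices.
        matched = self.matcher(avatar, frame)
        self.sender(matched)
        return matched

    def on_remote_frames(self, matched_frames):
        # Frames handed over by the receiving module go to the display.
        self.display(matched_frames)
```

Keeping matching, transport, and display behind separate callables mirrors the module split of fig. 22 and lets each part be exercised in isolation.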
The embodiment of the present invention provides a video call apparatus that, during multi-person instant video interaction, displays the matched current video frame obtained by matching the current video frame with the avatar.
Eleventh embodiment referring to fig. 23, a video call device 23 according to an embodiment of the present invention includes a camera 2301, a touch display screen 2302, a transmitting/receiving module 2303, a memory 2304, and a processor 2305 connected to the camera 2301, the touch display screen 2302, the transmitting/receiving module 2303, and the memory 2304, where the memory 2304 is used to store a set of program codes, and the processor 2305 calls the program codes stored in the memory 2304 to perform the following operations:
acquiring a virtual image;
acquiring a current video frame;
matching the virtual image with the current video frame to obtain a matched current video frame;
controlling the transmitting/receiving module 2303 to transmit the matched current video frame to the plurality of video call devices;
controlling the transmitting/receiving module 2303 to receive a plurality of matched current video frames from a plurality of video call devices; and
and controlling the touch display screen 2302 to display the received plurality of matched current video frames.
Optionally, the processor 2305 calls the program code stored in the memory 2304 to specifically perform the following operations:
recognizing and acquiring a face part in the current video frame, and matching the face part with the avatar to obtain the matched current video frame; or
cutting and/or reducing the current video frame according to the avatar, and matching the cutting and/or reducing result with the avatar to obtain the matched current video frame.
Optionally, the processor 2305 calls the program code stored in the memory 2304 to specifically perform the following operations:
controlling the touch display screen 2302 to display the received plurality of matched current video frames according to a default configuration; or
Acquiring a first user instruction;
controlling the touch display screen 2302 to display the received plurality of matched current video frames according to the first user indication.
Optionally, the processor 2305 calls the program code stored by the memory 2304 to perform the following operations:
acquiring a second user instruction input by the user, wherein the second user instruction comprises a gesture or voice triggered by the user on the avatar, or triggering of a first key;
and controlling the touch display screen 2302 to display the received plurality of matched current video frames according to the second user instruction.
Optionally, the processor 2305 calls the program code stored by the memory 2304 to perform the following operations:
if the user triggers a default event, the control transmitting/receiving module 2303 transmits the default event to the plurality of video call devices, where the default event includes the user closing the camera and/or exiting the multi-person session.
Optionally, the processor 2305 calls the program code stored by the memory 2304 to perform the following operations:
controlling the transmitting/receiving module 2303 to receive at least one default event from at least one of the plurality of video call devices;
and controlling the touch display screen 2302 to display the received plurality of matched current video frames according to the special effect corresponding to the at least one default event.
Optionally, the processor 2305 calls the program code stored by the memory 2304 to perform the following operations:
acquiring a third user instruction triggered by the user, wherein the third user instruction comprises the user clicking the avatar and/or triggering a second key;
the control transmission/reception module 2303 transmits the current video frame to the plurality of video call devices.
Optionally, the processor 2305 calls the program code stored by the memory 2304 to perform the following operations:
obtaining identifiers of actions and/or expressions of the avatar;
controlling the transmitting/receiving module 2303 to transmit the identifiers of the motions and/or expressions of the avatar to the plurality of video call devices;
the control transmission/reception module 2303 receives an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the action and/or expression of the corresponding at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
controlling the touch display screen 2302 to display the action and/or expression of the at least one avatar.
The embodiment of the present invention provides a video call device that, during multi-person instant video interaction, displays the matched current video frame obtained by matching the current video frame with the avatar.
A twelfth embodiment of the present invention provides a video call device 24. As shown in fig. 24, the device includes:
an avatar identifier obtaining module 2401, configured to obtain an identifier of an avatar;
a current video frame acquiring module 2402, configured to acquire a current video frame;
a transmitting module 2403, configured to transmit an identifier of an avatar and a current video frame to a plurality of video call devices;
a receiving module 2404, configured to respectively receive current video frames and identifiers of corresponding avatars from the plurality of video call devices;
an avatar acquisition module 2405, configured to acquire a corresponding avatar according to an identifier of the avatar;
a matching module 2406, configured to match the avatar with the received current video frame to obtain multiple matched current video frames; and
a display module 2407, configured to display the plurality of matched current video frames.
Optionally, the matching module 2406 is specifically configured to:
recognizing and acquiring a face part in the current video frame, and matching the face part with the avatar to obtain the matched current video frame; or
cutting and/or reducing the current video frame according to the avatar, and matching the cutting and/or reducing result with the avatar to obtain the matched current video frame.
Optionally, the display module 2407 is specifically configured to:
displaying a plurality of matched current video frames according to a default configuration; or
Acquiring a first user instruction;
and displaying a plurality of matched current video frames according to the first user instruction.
Optionally,
the device further comprises a user instruction obtaining module 2408, configured to obtain a second user instruction input by the user, where the second user instruction comprises a gesture or voice triggered by the user on the avatar, or triggering of a first key; and
the display module 2407 is further configured to display the plurality of matched current video frames according to the second user instruction.
Optionally,
the sending module 2403 is further configured to send a default event to the multiple video call devices when the user triggers the default event, where the default event includes the user turning off the camera and/or exiting the multi-person session.
Optionally,
the receiving module 2404 is further configured to receive at least one default event from at least one of the plurality of video telephony devices; and
the display module 2407 is further configured to display a plurality of matched current video frames according to a special effect corresponding to the at least one default event.
Optionally,
the user instruction obtaining module 2408 is further configured to obtain a third user instruction triggered by the user, where the third user instruction includes that the user clicks the avatar and/or triggers the second key; and
the sending module 2403 is further configured to send the current video frame to the plurality of video call devices.
Optionally,
the apparatus further comprises an action and/or expression identifier acquisition module 2409, configured to acquire an identifier of an action and/or expression of the avatar;
the sending module 2403 is further configured to send identifiers of actions and/or expressions of the avatar to the plurality of video call devices;
the receiving module 2404 is further configured to receive an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
the apparatus further includes an avatar action and/or expression acquisition module 2410, configured to acquire the action and/or expression of the corresponding at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
the display module 2407 is further configured to display the actions and/or expressions of at least one avatar.
The embodiment of the present invention provides a video call device that displays, in a multi-user instant video interaction process, matched current video frames obtained by matching current video frames with avatars.
Embodiment thirteen is a video call device provided by an embodiment of the present invention. As shown in fig. 25, the device includes a camera 2501, a touch display screen 2502, a sending/receiving module 2503, a memory 2504, and a processor 2505 connected to the camera 2501, the touch display screen 2502, the sending/receiving module 2503, and the memory 2504. The memory 2504 is used to store a set of program codes, and the processor 2505 calls the program codes stored in the memory 2504 to perform the following operations:
acquiring an identifier of the virtual image;
acquiring a current video frame;
controlling the sending/receiving module 2503 to send the identifier of the avatar and the current video frame to the plurality of video call devices;
controlling the sending/receiving module 2503 to receive the current video frames and the identifiers of the corresponding avatars respectively from the plurality of video call devices;
acquiring a corresponding virtual image according to the identifier of the virtual image;
matching the virtual image with the received current video frame to obtain a plurality of matched current video frames; and
controlling the touch display screen 2502 to display the plurality of matched current video frames.
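The sequence of operations above (send the avatar identifier with the raw frame, receive peers' identifiers and frames, resolve each identifier locally, then match and display) can be sketched as follows. This is an illustrative sketch only: the names `Device`, `AVATAR_STORE`, and `match_frame`, and the string-valued "frames", are assumptions, not part of the patent.

```python
AVATAR_STORE = {  # local table mapping avatar identifiers to avatar data
    "cat": "cat-avatar",
    "dog": "dog-avatar",
}

def match_frame(avatar, frame):
    """Placeholder for matching an avatar with a raw video frame."""
    return f"{avatar}+{frame}"

class Device:
    def __init__(self, avatar_id):
        self.avatar_id = avatar_id
        self.inbox = []  # (avatar_id, frame) pairs received from peers

    def send(self, peers, frame):
        # Only the short avatar identifier travels with the raw frame.
        for peer in peers:
            peer.inbox.append((self.avatar_id, frame))

    def display(self):
        # Resolve each peer's identifier locally, then match and "display".
        return [match_frame(AVATAR_STORE[aid], f) for aid, f in self.inbox]

a, b = Device("cat"), Device("dog")
a.send([b], "frame-a1")
b.send([a], "frame-b1")
print(a.display())  # ['dog-avatar+frame-b1']
```

Because only identifiers cross the network and the avatar itself is looked up on the receiving side, this arrangement trades a small local table for reduced transmission payload.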
Optionally, the processor 2505 calls the program code stored in the memory 2504 to specifically perform the following operations:
recognizing and acquiring a face part in the current video frame, and matching the face part with the avatar to obtain a matched current video frame; or
cutting and/or reducing the current video frame according to the avatar, and matching the cutting and/or reducing result with the avatar to obtain the matched current video frame.
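The cutting-and/or-reducing branch above can be illustrated with a minimal sketch. Frames are modeled here as plain 2-D lists of pixel values, and the names `cut_frame` and `reduce_frame` are hypothetical; a real implementation would operate on actual image buffers.

```python
def cut_frame(frame, top, left, height, width):
    """Cut out a height x width region of the frame starting at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]

def reduce_frame(frame, factor):
    """Reduce the frame by an integer factor using nearest-neighbour sampling."""
    return [row[::factor] for row in frame[::factor]]

frame = [[r * 10 + c for c in range(6)] for r in range(6)]  # 6x6 dummy frame
region = cut_frame(frame, 1, 1, 4, 4)  # region to be combined with the avatar
small = reduce_frame(region, 2)        # shrink it to fit the avatar's slot
print(small)  # [[11, 13], [31, 33]]
```

The cut region and reduction factor would in practice be derived from the avatar's geometry, so the video content fits the space the avatar leaves free.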
Optionally, the processor 2505 calls the program code stored in the memory 2504 to specifically perform the following operations:
controlling the touch display screen 2502 to display the plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction; and
controlling the touch display screen 2502 to display the plurality of matched current video frames according to the first user instruction.
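The two display paths above (a default configuration versus a first user instruction) amount to a small layout dispatch. The layout names `grid` and `strip` and the two-per-row pairing are purely illustrative assumptions, not specified by the patent.

```python
DEFAULT_LAYOUT = "grid"

def arrange(frames, instruction=None):
    """Lay out matched frames per a default configuration or a user instruction."""
    layout = instruction or DEFAULT_LAYOUT
    if layout == "grid":
        return [frames[i:i + 2] for i in range(0, len(frames), 2)]  # two per row
    if layout == "strip":
        return [frames]  # everything in a single row
    raise ValueError(f"unknown layout: {layout}")

frames = ["f1", "f2", "f3"]
print(arrange(frames))           # [['f1', 'f2'], ['f3']]
print(arrange(frames, "strip"))  # [['f1', 'f2', 'f3']]
```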
Optionally, the processor 2505 further calls the program code stored in the memory 2504 to perform the following operations:
acquiring a second user instruction input by the user, where the second user instruction includes a gesture, a voice, or a first key triggered by the user on the avatar; and
controlling the touch display screen 2502 to display the plurality of matched current video frames according to the second user instruction.
Optionally, the processor 2505 further calls the program code stored in the memory 2504 to perform the following operations:
if the user triggers a default event, controlling the sending/receiving module 2503 to send the default event to the plurality of video call devices, where the default event includes the user turning off the camera and/or exiting the multi-person session.
Optionally, the processor 2505 further calls the program code stored in the memory 2504 to perform the following operations:
controlling the sending/receiving module 2503 to receive at least one default event from at least one of the plurality of video call devices; and
controlling the touch display screen 2502 to display the plurality of matched current video frames according to a special effect corresponding to the at least one default event.
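A sketch of how a received default event might select a special effect for the corresponding participant's matched frame. The effect values ("avatar-only", dropping the frame) and the dictionary shapes are assumptions for illustration; the patent specifies only camera-off and session-exit as default events.

```python
SPECIAL_EFFECTS = {
    "camera_off": lambda frame: "avatar-only",  # hide video, keep the avatar
    "exit_session": lambda frame: None,         # drop that peer's frame entirely
}

def apply_events(frames, events):
    """frames: {peer: matched frame}; events: {peer: default event name}."""
    shown = {}
    for peer, frame in frames.items():
        effect = SPECIAL_EFFECTS.get(events.get(peer))
        result = effect(frame) if effect else frame
        if result is not None:
            shown[peer] = result
    return shown

frames = {"u1": "cat+f1", "u2": "dog+f2", "u3": "fox+f3"}
events = {"u2": "camera_off", "u3": "exit_session"}
print(apply_events(frames, events))  # {'u1': 'cat+f1', 'u2': 'avatar-only'}
```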
Optionally, the processor 2505 further calls the program code stored in the memory 2504 to perform the following operations:
acquiring a third user instruction triggered by the user, where the third user instruction includes the user clicking the avatar and/or triggering a second key; and
controlling the sending/receiving module 2503 to send the current video frame to the plurality of video call devices.
Optionally, the processor 2505 further calls the program code stored in the memory 2504 to perform the following operations:
obtaining identifiers of actions and/or expressions of the avatar;
controlling the sending/receiving module 2503 to send the identifiers of the actions and/or expressions of the avatar to the plurality of video call devices;
controlling the sending/receiving module 2503 to receive an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
controlling the touch display screen 2502 to display the action and/or expression of the at least one avatar.
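The identifier exchange above (sending only a short identifier of an action and/or expression, which each receiver resolves against local data) can be sketched as follows; the tables, identifier values, and message format are illustrative assumptions.

```python
# Local tables shared by convention across devices (contents are illustrative).
ACTIONS = {1: "wave", 2: "nod"}
EXPRESSIONS = {10: "smile", 11: "wink"}

def encode(action_id=None, expression_id=None):
    """Build the small message actually sent to the other call devices."""
    return {"action": action_id, "expression": expression_id}

def decode(message):
    """Resolve received identifiers into the local action/expression data."""
    return (ACTIONS.get(message["action"]),
            EXPRESSIONS.get(message["expression"]))

msg = encode(action_id=1, expression_id=10)
print(decode(msg))  # ('wave', 'smile')
```

As with the avatar identifiers, only a few bytes cross the network per action or expression, and each device animates its local copy of the avatar accordingly.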
The embodiment of the present invention provides a video call device that displays, in a multi-user instant video interaction process, matched current video frames obtained by matching current video frames with avatars.
Embodiment fourteen is a video call system provided in an embodiment of the present invention, where the video call system includes a plurality of the video call devices described in the tenth or eleventh embodiment; alternatively, the system includes a plurality of the video call devices described in the twelfth or thirteenth embodiment.
The embodiment of the present invention provides a video call system that displays, in a multi-user instant video interaction process, matched current video frames obtained by matching current video frames with avatars.
The interfaces described in the embodiments of the present invention are merely exemplary and are used to further illustrate the beneficial effects achieved by the described method; the specific interface is not limited in the embodiments of the present invention. In addition, the terms "first", "second", and "third" are used herein only to distinguish among the items and have no other specific meaning.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
It should be noted that when the video call device and the video call system provided in the above embodiments execute the video call method, the division into the above functional modules is merely an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structures of the device and the system may be divided into different functional modules to complete all or part of the functions described above. In addition, the video call device and system provided in the above embodiments belong to the same concept as the video call method embodiments; for their specific implementation processes, refer to the method embodiments, which are not described here again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A method for video telephony, the method comprising:
acquiring a virtual image;
acquiring a current video frame;
matching the virtual image with the current video frame to obtain a matched current video frame;
sending the matched current video frame to a plurality of video call devices;
receiving a plurality of matched current video frames from the plurality of video call devices; and
displaying the received plurality of matched current video frames.
2. The method of claim 1, wherein matching the avatar with the current video frame, resulting in a matched current video frame comprises:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain the matched current video frame; or
cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
3. The method of claim 1 or 2, wherein displaying the received plurality of matched current video frames comprises:
displaying the received plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction; and
displaying the received plurality of matched current video frames according to the first user instruction.
4. A method according to any one of claims 1 to 3, characterized in that the method further comprises:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting identifiers of the actions and/or expressions of the avatar to a plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
displaying the actions and/or expressions of the at least one avatar.
5. A method for video telephony, the method comprising:
acquiring an identifier of the virtual image;
acquiring a current video frame;
transmitting the identifier of the avatar and the current video frame to a plurality of video telephony devices;
receiving current video frames and identifiers of corresponding avatars from the plurality of video call devices, respectively;
acquiring a corresponding virtual image according to the identifier of the virtual image;
matching the virtual image with the received current video frame to obtain a plurality of matched current video frames; and
displaying the plurality of matched current video frames.
6. The method of claim 5, wherein said matching the avatar with the received current video frame to obtain a plurality of matched current video frames comprises:
recognizing and acquiring a face part in the current video frame, and matching the face part with the virtual image to obtain the matched current video frame; or
cutting and/or reducing the current video frame according to the virtual image, and matching the cutting and/or reducing result with the virtual image to obtain the matched current video frame.
7. The method of claim 5 or 6, wherein displaying the plurality of matched current video frames comprises:
displaying the plurality of matched current video frames according to a default configuration; or
acquiring a first user instruction; and
displaying the plurality of matched current video frames according to the first user instruction.
8. The method according to any one of claims 5 to 7, further comprising:
obtaining identifiers of actions and/or expressions of the avatar;
transmitting identifiers of the actions and/or expressions of the avatar to a plurality of video call devices;
receiving an identifier of an action and/or expression of an avatar from at least one of the plurality of video call devices;
acquiring the corresponding action and/or expression of the at least one avatar according to the received identifier of the action and/or expression of the at least one avatar; and
displaying the actions and/or expressions of the at least one avatar.
9. A video call apparatus, the apparatus comprising:
a virtual image acquisition module, configured to acquire a virtual image;
a current video frame acquisition module, configured to acquire a current video frame;
a matching module, configured to match the virtual image with the current video frame to obtain a matched current video frame;
a sending module, configured to send the matched current video frame to a plurality of video call devices;
a receiving module, configured to receive a plurality of matched current video frames from the plurality of video call devices; and
a display module, configured to display the received plurality of matched current video frames.
10. A video call apparatus, the apparatus comprising:
a virtual image identifier acquisition module, configured to acquire an identifier of a virtual image;
a current video frame acquisition module, configured to acquire a current video frame;
a sending module, configured to send the identifier of the virtual image and the current video frame to a plurality of video call devices;
a receiving module, configured to receive current video frames and identifiers of corresponding virtual images respectively from the plurality of video call devices;
a virtual image acquisition module, configured to acquire the corresponding virtual image according to the identifier of the virtual image;
a matching module, configured to match the virtual image with the received current video frames to obtain a plurality of matched current video frames; and
a display module, configured to display the plurality of matched current video frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511022238.5A CN105611215A (en) | 2015-12-30 | 2015-12-30 | Video call method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105611215A | 2016-05-25 |
Family
ID=55990722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511022238.5A Pending CN105611215A (en) | 2015-12-30 | 2015-12-30 | Video call method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105611215A (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108271057A (en) * | 2018-02-02 | 2018-07-10 | 优酷网络技术(北京)有限公司 | Video interaction method, subscription client, server and readable storage medium storing program for executing |
CN108271058A (en) * | 2018-02-02 | 2018-07-10 | 优酷网络技术(北京)有限公司 | Video interaction method, subscription client, server and storage medium |
CN108513090A (en) * | 2017-02-24 | 2018-09-07 | 腾讯科技(深圳)有限公司 | The method and device of group's video session |
CN108989268A (en) * | 2017-06-01 | 2018-12-11 | 腾讯科技(深圳)有限公司 | Session methods of exhibiting, device and computer equipment |
CN110460799A (en) * | 2018-05-07 | 2019-11-15 | 苹果公司 | creative camera |
CN111031174A (en) * | 2019-11-29 | 2020-04-17 | 维沃移动通信有限公司 | Virtual article transmission method and electronic equipment |
US10861248B2 (en) | 2018-05-07 | 2020-12-08 | Apple Inc. | Avatar creation user interface |
CN112153269A (en) * | 2019-06-27 | 2020-12-29 | 京东方科技集团股份有限公司 | Picture display method, device and medium applied to electronic equipment and electronic equipment |
US11054973B1 (en) | 2020-06-01 | 2021-07-06 | Apple Inc. | User interfaces for managing media |
US11061372B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | User interfaces related to time |
US11107261B2 (en) | 2019-01-18 | 2021-08-31 | Apple Inc. | Virtual avatar animation based on facial feature movement |
US11128792B2 (en) | 2018-09-28 | 2021-09-21 | Apple Inc. | Capturing and displaying images with multiple focal planes |
US11165949B2 (en) | 2016-06-12 | 2021-11-02 | Apple Inc. | User interface for capturing photos with different camera magnifications |
US11178335B2 (en) | 2018-05-07 | 2021-11-16 | Apple Inc. | Creative camera |
US11204692B2 (en) | 2017-06-04 | 2021-12-21 | Apple Inc. | User interface camera effects |
US11212449B1 (en) | 2020-09-25 | 2021-12-28 | Apple Inc. | User interfaces for media capture and management |
US11223771B2 (en) | 2019-05-06 | 2022-01-11 | Apple Inc. | User interfaces for capturing and managing visual media |
US11321857B2 (en) | 2018-09-28 | 2022-05-03 | Apple Inc. | Displaying and editing images with depth information |
US11350026B1 (en) | 2021-04-30 | 2022-05-31 | Apple Inc. | User interfaces for altering visual media |
US11468625B2 (en) | 2018-09-11 | 2022-10-11 | Apple Inc. | User interfaces for simulated depth effects |
US11481988B2 (en) | 2010-04-07 | 2022-10-25 | Apple Inc. | Avatar editing environment |
US11706521B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | User interfaces for capturing and managing visual media |
US11722764B2 (en) | 2018-05-07 | 2023-08-08 | Apple Inc. | Creative camera |
US11770601B2 (en) | 2019-05-06 | 2023-09-26 | Apple Inc. | User interfaces for capturing and managing visual media |
US11776190B2 (en) | 2021-06-04 | 2023-10-03 | Apple Inc. | Techniques for managing an avatar on a lock screen |
US11778339B2 (en) | 2021-04-30 | 2023-10-03 | Apple Inc. | User interfaces for altering visual media |
US12033296B2 (en) | 2018-05-07 | 2024-07-09 | Apple Inc. | Avatar creation user interface |
US12112024B2 (en) | 2021-06-01 | 2024-10-08 | Apple Inc. | User interfaces for managing media styles |
US12184969B2 (en) | 2016-09-23 | 2024-12-31 | Apple Inc. | Avatar creation and editing |
US12287913B2 (en) | 2022-09-06 | 2025-04-29 | Apple Inc. | Devices, methods, and graphical user interfaces for controlling avatars within three-dimensional environments |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101179409A (en) * | 2006-12-30 | 2008-05-14 | 腾讯科技(深圳)有限公司 | Method and apparatus for displaying multi-party video in instant communication |
CN101287093A (en) * | 2008-05-30 | 2008-10-15 | 北京中星微电子有限公司 | Method for adding special effect in video communication and video customer terminal |
CN103220490A (en) * | 2013-03-15 | 2013-07-24 | 广东欧珀移动通信有限公司 | A method for realizing special effects in video communication and video client |
US9065976B2 (en) * | 2008-10-06 | 2015-06-23 | Microsoft Technology Licensing, Llc | Multi-device capture and spatial browsing of conferences |
US20150304366A1 (en) * | 2014-04-22 | 2015-10-22 | Minerva Schools | Participation queue system and method for online video conferencing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20160525 |