
CN108320331B - A method and device for generating augmented reality video information of a user scene - Google Patents


Info

Publication number
CN108320331B
Authority
CN
China
Prior art keywords
video
information
user
scene
video stream
Prior art date
Legal status
Active
Application number
CN201710032139.8A
Other languages
Chinese (zh)
Other versions
CN108320331A
Inventor
胡晨鹏
Current Assignee
Shanghai Zongzhang Technology Group Co.,Ltd.
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN201710032139.8A
Publication of CN108320331A
Application granted
Publication of CN108320331B
Legal status: Active


Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 7/00: Television systems
            • H04N 7/14: Systems for two-way working
              • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
                • H04N 7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 19/00: Manipulating 3D models or images for computer graphics
            • G06T 19/006: Mixed reality
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00: Arrangements for image or video recognition or understanding
            • G06V 10/94: Hardware or software architectures specially adapted for image or video understanding
              • G06V 10/95: Architectures structured as a network, e.g. client-server architectures
          • G06V 20/00: Scenes; scene-specific elements
            • G06V 20/20: Scene-specific elements in augmented reality scenes
            • G06V 20/40: Scene-specific elements in video content
              • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
              • G06V 20/48: Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The purpose of this application is to provide a method and device for generating augmented reality video information of a user scene. By combining image matching recognition on the network device with image calibration recognition on the user device, the application breaks through a limitation of the prior art, in which the restricted computing power and storage capacity of mobile devices allow only simple face recognition, and effectively extends the range of recognizable objects to arbitrary scene objects in the user scene. Because any scene object captured by the user device can be recognized and re-synthesized, the augmented reality video information generated by this application achieves an obvious visual breakthrough over traditional video applications and existing augmented-reality video chat applications: the variability of the augmented reality video the user sees is greatly increased, which makes interaction more engaging and optimizes the user's intelligent video experience.

Description

Method and equipment for generating augmented reality video information of user scene
Technical Field
The present application relates to the field of communications, and in particular, to a technique for generating augmented reality video information of a user scene.
Background
With the development of augmented reality technology, mobile applications have appeared that beautify chat videos around face recognition technology. Their function is basically this: after the mobile device recognizes and models the human face in the video, virtual objects are added to the head/face picture through augmented reality so as to beautify the face. Because the computing power and storage capacity of a mobile device are limited, it can only perform simple face recognition, and the resulting augmentation is monotonous in its interaction and limited in its effects, such as deforming the face or adding a few virtual ornaments to the user's head. Compared with traditional video applications, the existing augmented-reality video chat applications therefore offer no obvious visual breakthrough, and the user's intelligent video experience is not rich.
Disclosure of Invention
The application aims to provide a method and equipment for presenting a user scene video based on augmented reality.
According to an aspect of the present application, a method for generating augmented reality video information of a user scene at a user equipment end is provided, which includes:
sending a video key frame of a first video stream corresponding to a user scene to corresponding network equipment;
acquiring scene object related information corresponding to the video key frame, which is determined by the network equipment based on image matching identification;
based on the scene object related information, carrying out image calibration identification on a target frame of a second video stream acquired by user equipment;
and synthesizing the corresponding virtual object and the second video stream into augmented reality video information based on the image calibration recognition result.
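The four client-side steps above can be pictured with a minimal sketch (Python; the `network` transport object, the helper names, and the change threshold are illustrative assumptions, not part of the patent):

```python
# Hypothetical client-side pipeline: key-frame upload, scene-info
# retrieval, calibration, and compositing. All names are assumptions.
import numpy as np

def is_key_frame(frame, prev, threshold=12.0):
    # Placeholder key-frame test: mean absolute pixel change versus
    # the previous frame exceeds a preset threshold.
    if prev is None:
        return True
    diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16))
    return float(diff.mean()) > threshold

def calibrate(frame, scene_info):
    return scene_info        # placeholder; a fuller sketch appears later

def composite(frame, virtual_object, scene_info):
    return frame             # placeholder; a fuller sketch appears later

def client_loop(frames, network, virtual_object):
    """frames: iterable of HxWx3 uint8 arrays from the capture device."""
    scene_info, prev, out = None, None, []
    for raw in frames:
        if is_key_frame(raw, prev):
            network.send_key_frame(raw)              # step 1
            scene_info = network.fetch_scene_info()  # step 2
        frame = raw
        if scene_info is not None:
            scene_info = calibrate(raw, scene_info)             # step 3
            frame = composite(raw, virtual_object, scene_info)  # step 4
        out.append(frame)
        prev = raw
    return out  # the augmented reality video information
```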
According to another aspect of the present application, a method for generating augmented reality video information of a user scene at a network device is provided, which includes:
the method comprises the steps of obtaining a video key frame corresponding to a user scene of user equipment, wherein the video key frame is determined based on a first video stream corresponding to a scene object collected by the user equipment;
performing image matching identification on the video key frames to determine scene object related information corresponding to the video key frames;
and sending the scene object related information to the user equipment.
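A matching sketch of the network-device side is given below, using OpenCV template matching against a preset scene-object database; the patent does not prescribe a particular matcher, so the approach and names here are illustrative only:

```python
# Hypothetical image matching recognition on the network device:
# locate known scene objects in an uploaded video key frame.
import cv2

def match_scene_objects(key_frame, object_db, min_score=0.8):
    """object_db: {attribute_name: BGR template image}. Returns a list
    of scene-object records (attribute, position) for the key frame."""
    results = []
    for name, template in object_db.items():
        # normalized cross-correlation; the best match is the maximum
        scores = cv2.matchTemplate(key_frame, template,
                                   cv2.TM_CCOEFF_NORMED)
        _, score, _, top_left = cv2.minMaxLoc(scores)
        if score >= min_score:
            h, w = template.shape[:2]
            results.append({"attribute": name,
                            "position": (top_left[0], top_left[1], w, h),
                            "score": float(score)})
    return results
```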
According to another aspect of the present application, there is also provided a user equipment for generating augmented reality video information of a user scene, including:
the video key frame sending device is used for sending the video key frame of the first video stream corresponding to the user scene to the corresponding network equipment;
scene object related information acquisition means for acquiring scene object related information corresponding to the video key frame, which is determined by the network device based on image matching identification;
the image calibration identification device is used for carrying out image calibration identification on a target frame of the second video stream acquired by the user equipment based on the scene object related information;
and the synthesizing device is used for synthesizing the corresponding virtual object and the second video stream into the augmented reality video information based on the image calibration identification result.
According to another aspect of the present application, there is also provided a network device for generating augmented reality video information of a user scene, including:
the video key frame acquisition device is used for acquiring a video key frame corresponding to a user scene of user equipment, wherein the video key frame is determined based on a first video stream corresponding to a scene object acquired by the user equipment;
the image matching identification device is used for carrying out image matching identification on the video key frames and determining the scene object related information corresponding to the video key frames;
and the scene object related information sending device is used for sending the scene object related information to the user equipment.
According to yet another aspect of the present application, there is also provided a system for generating augmented reality video information of a user scene, wherein the system comprises the user equipment for generating augmented reality video information of a user scene described above and the network device for generating augmented reality video information of a user scene described above.
Compared with the prior art, in the present application the user equipment sends the video key frames corresponding to the scene objects to the corresponding network device and obtains the scene object related information corresponding to those video key frames that the network device determines through image matching recognition, such as attribute information, position information, and surface information of the scene objects; the user equipment then performs image calibration recognition on each target frame of the second video stream it is currently acquiring in real time, in combination with the scene object related information obtained from the network device, and synthesizes the corresponding virtual objects with the second video stream into augmented reality video information based on the result of the image calibration recognition. This method, which combines image matching recognition on the network device with image calibration recognition on the user equipment, breaks through the limitation of the prior art that only simple face recognition can be realized because of the limited computing power and storage capacity of mobile devices, and effectively extends the range of recognizable objects to any scene object in the user scene. On one hand, the stronger computing and storage capability of the network device relative to the user equipment is used to perform image matching recognition on a video key frame, which effectively determines the core information for identifying a scene object, such as its attribute, position, and surface information. On the other hand, based on the result of the network device's image matching recognition, the user equipment can further perform image calibration recognition aimed at offset correction on the video stream it updates in real time, such as the target frames of the second video stream, so that scene objects can be accurately recognized in every frame of the current user equipment. Then, based on the result of the image calibration recognition, the corresponding virtual object is synthesized with the second video stream and rendered as augmented reality video information, which can be presented to the user. Because any scene object captured by the user equipment can be recognized and re-synthesized, the augmented reality video information presented by this application achieves an obvious visual breakthrough over traditional video applications and existing augmented-reality video chat applications: the variability of the augmented reality video seen by the user is greatly enhanced, which improves the user's interactive enjoyment and optimizes the user's intelligent video experience.
Meanwhile, only a small number of video key frames, or the scene object related information corresponding to those video key frames, needs to be transmitted between the user equipment and the corresponding network device, so the volume of transmitted data is small, network delay is low, the burden on data communication is light, and user experience is not affected.
Further, in one implementation, the augmented reality video information may also be provided to one or more other user devices corresponding to the user device. Here, the augmented-reality-based user scene video presentation of the present application may be a scene video presentation for a single user, such as a single-user video mode, or each user may share his or her own user scene video with other users during multi-user interaction, such as a multi-user video chat mode. In the multi-user interaction mode, the present application makes interaction more engaging for every user and optimizes the intelligent video experience of every interacting user.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a system diagram for generating augmented reality video information of a user scene in accordance with an aspect of the subject application;
fig. 2 shows a flowchart of a method for generating augmented reality video information of a user scene at a user device side and a network device side according to another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
In one implementation of the present application, a user device for generating augmented reality video information of a user scene is provided; in an implementation of the application, a network device for generating augmented reality video information of a user scene is also provided; further, in an implementation of the present application, a system for generating augmented reality video information of a user scene is also provided, where the system includes the one or more user devices and the network device. The user device may include, but is not limited to, various mobile devices such as a smartphone, a tablet, or a smart wearable device. In one implementation, the user equipment includes an acquisition module that can capture images and video, such as a camera. The network device may include, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server, where the cloud server is a virtual supercomputer running on a distributed system composed of a group of loosely coupled computers, used to provide simple, efficient, secure, and reliable computing services with scalable processing power. In the present application, the user equipment may be referred to as user equipment 1, and the network equipment may be referred to as network device 2 (refer to FIG. 1).
FIG. 1 illustrates a system diagram for generating augmented reality video information of a user scene in accordance with an aspect of the subject application. The system comprises a user equipment 1 and a network device 2. The user equipment 1 comprises a video key frame sending device 11, a scene object related information acquiring device 12, an image calibration identifying device 13 and a synthesizing device 14; the network device 2 comprises video key frame acquisition means 21, image matching recognition means 22 and scene object related information transmission means 23.
The video key frame sending device 11 may send the video key frame of the first video stream corresponding to the user scene to the corresponding network device 2; correspondingly, the video key frame obtaining device 21 may obtain a video key frame corresponding to a user scene of the user equipment 1; then, the image matching identification device 22 may perform image matching identification on the video key frame to determine the scene object related information corresponding to the video key frame; then, the scene object related information sending means 23 may send the scene object related information to the user equipment 1; correspondingly, the scene object related information obtaining device 12 may obtain the scene object related information corresponding to the video key frame, which is determined by the network device 2 based on the image matching identification; then, the image calibration recognition device 13 may perform image calibration recognition on the target frame of the second video stream acquired by the user equipment 1 based on the scene object related information; then, the synthesizing device 14 may synthesize the corresponding virtual object and the second video stream into the augmented reality video information based on the result of the image calibration recognition.
In this application, the generated augmented reality video information of a user scene may be applied to the scene video presentation of a single user, such as a single-user video mode, and may also be shared by each user with other users during multi-user interaction, such as a multi-user video chat mode. In addition, any other mode to which the augmented reality video information of a user scene can be applied is likewise an application scenario of the present application and is included in its scope of protection.
Specifically, the video key frame transmitting device 11 may transmit the video key frame of the first video stream corresponding to the user scene to the corresponding network device. Then, correspondingly, the video key frame acquiring means 21 may acquire a video key frame corresponding to the user scene of the user device 1.
In one implementation, the user equipment 1 further includes a capture device (not shown) configured to capture a first video stream corresponding to a user scene. Here, the capture device collects the video information of the corresponding user, namely the video stream, during video recording or interaction with other users. In this application, the first video stream may be a video stream at any time. In one implementation, the first video stream of the user scene may be captured by various types of cameras, or a combination of cameras, on the user equipment 1. Here, the first video stream corresponds to a number of consecutive frames, each frame corresponds to image information, and each object in the image information is a scene object of the user scene. In one implementation, the user equipment 1 may capture the first video stream corresponding to the scene objects in real time.
The user equipment 1 then further comprises video key frame determination means (not shown), which may determine video key frames from the first video stream. Here, a video key frame may be one or more frames of the first video stream, and the criterion for confirming a video key frame may be customized based on the needs of different scenes. In one implementation, when the image information of a frame of the first video stream changes significantly compared with that of the previous frame, for example when a scene object appears or disappears, or when a scene object moves noticeably enough to reach a preset image-information change threshold, that frame is determined to be a video key frame; the video key frame sending device 11 may then send the video key frame corresponding to the scene objects to the corresponding network device 2, so that image matching recognition can be performed on it in the network device 2, where the image matching recognition effectively determines the core information for identifying the scene objects, such as their attribute, position, and surface information. Furthermore, a frame whose image information changes little compared with the previous frame may be determined to be a non-video key frame that need not be uploaded; in actual operation, a non-video key frame may simply be ignored, or it may instead be recognized on the user equipment 1 through image calibration recognition. In this application, only a small number of video key frames needs to be transmitted between the user equipment 1 and the corresponding network device 2, so the volume of transmitted data is small, network delay is low, the burden on data communication is light, and user experience is not affected; meanwhile, the strong computing and storage capacity of the network device 2 effectively compensates for the user equipment 1's inability to perform a large number of complex image recognition operations.
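As a concrete illustration of the key-frame criterion described above, the following sketch flags a frame as a video key frame when its overall image content shifts beyond a preset threshold; the histogram-correlation measure and the 0.25 threshold are assumptions, since the patent leaves the criterion configurable per scene:

```python
# Hypothetical key-frame detector: a large histogram change between
# consecutive frames suggests objects appearing, disappearing, or
# moving noticeably.
import cv2

def frame_change(prev, curr):
    """Return a change score (0 = identical content distribution)."""
    h1 = cv2.calcHist([prev], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    h2 = cv2.calcHist([curr], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    cv2.normalize(h1, h1)
    cv2.normalize(h2, h2)
    return 1.0 - cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

def is_video_key_frame(prev, curr, threshold=0.25):
    return prev is None or frame_change(prev, curr) > threshold
```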
In one implementation, an information transmission channel may be established between the network device 2 and one or more user devices, and between multiple user devices that interact with each other through video, where the information transmission channel may include a signaling channel and a data channel, where the signaling channel is responsible for transmitting contents such as a control instruction with a small data volume, and the data channel is responsible for transmitting contents such as a video key frame, a video stream with a large data volume, and a virtual object set.
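The two-channel split might be modeled as below; the message shapes are assumptions for illustration only:

```python
# Hypothetical channel routing: small control instructions travel on
# the signaling channel, bulky payloads on the data channel.
from dataclasses import dataclass

@dataclass
class SignalingMessage:
    kind: str      # e.g. "start_session", "scene_info_ready"
    payload: dict  # small control parameters

@dataclass
class DataMessage:
    kind: str      # e.g. "key_frame", "video_stream", "virtual_object_set"
    blob: bytes    # large binary content

def channel_for(message) -> str:
    return "signaling" if isinstance(message, SignalingMessage) else "data"
```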
In one implementation, the user equipment 1 may acquire a video stream corresponding to the scene object in real time. Further, there may be video key frames in each video stream. For example, one or more key frames may be present in both the first video stream and the subsequent second video stream. Furthermore, in one implementation, the video key frame may be determined in real time, and the video key frame may be set to be sent to the corresponding network device 2. For example, the determination and uploading of video key frames in the first video stream may be performed as described above; in another example, the determination and uploading of the video key frame may also be performed on the subsequent second video stream.
Then, the image matching identification device 22 may perform image matching identification on the video key frame to determine the scene object related information corresponding to the video key frame; then, the scene object related information sending means 23 may send the scene object related information to the user equipment 1; correspondingly, the scene object related information obtaining device 12 may obtain the scene object related information corresponding to the video key frame, determined by the network device 2 through image matching recognition. In one implementation, the image matching recognition may be performed on the video key frames through a scene object database preset in or callable by the network device 2, or through image recognition models preset in the network device 2 that were trained on large data sets through machine learning, so as to recognize one or more scene objects in the video key frames and match corresponding scene object related information for them.
In one implementation, the scene object related information includes at least any one of: the method comprises the steps of firstly, attribute information of a scene object, secondly, position information of the scene object and thirdly, surface information of the scene object. For example, it is necessary to identify a table image in a video keyframe as a table object and identify the position coordinates of the table in the image, as well as the orientation of the table surface, e.g., the top surface orientation of the table, in order to subsequently place a virtual object on the table and provide interaction.
Specifically, in one implementation, the attribute information of a scene object may describe what the scene object is. Here, fuzzy matching may be implemented, e.g. identifying the scene object as a building, furniture, or a plant; more accurate matching may also be achieved, such as identifying the scene object as a tower, a table, or a tree. In one implementation, the position information of a scene object may include the image position of the scene object in the video key frame, and may include coordinate information, such as the contour coordinates of a tower or the position coordinates of a table. In one implementation, the surface information of a scene object may include the surface contour information of the object, where the surface contour to be identified may be preset; for example, the upper surface of a table needs to be identified in order to subsequently add a virtual object on the table top, so the identified surface information mainly comprises the information of the table's upper surface.
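One possible in-memory representation of the three kinds of scene object related information named above; the field names and types are illustrative, since the patent fixes no data format:

```python
# Hypothetical record for one recognized scene object.
from dataclasses import dataclass, field

@dataclass
class SceneObjectInfo:
    attribute: str                       # e.g. "table", or coarser: "furniture"
    position: tuple[int, int, int, int]  # (x, y, width, height) in the frame
    # contour points of the surface virtual objects attach to,
    # e.g. the upper surface of a table
    surface: list[tuple[int, int]] = field(default_factory=list)
```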
Here, those skilled in the art should understand that the attribute information of the scene object, the position information of the scene object, and the surface information of the scene object are only examples, and the information related to the scene object, which may be present or may appear in the future, as applicable to the present application, should also be included in the scope of the present application and included by reference.
Then, the image calibration recognition device 13 may perform image calibration recognition on the target frames of the second video stream acquired by the user equipment 1, based on the scene object related information. Here, image calibration recognition complements the image matching recognition of the network device 2: image matching recognition is performed only on video key frames, but on the user equipment 1, during video recording, video chat, or other interaction, the capture device, such as a camera, collects the video stream, that is, many consecutive frames, in real time, and the picture information of each frame may change relative to the previous frame. Such changes may be slight and identifiable without complex image matching operations, and in that case image calibration recognition can be used instead. Image calibration recognition takes as its basis the scene object related information already identified for the video key frame through image matching recognition, such as attribute, position, and surface information of the scene objects, and is performed on the target frames of the second video stream, i.e. the new video stream currently acquired by the user equipment 1. Its purpose is to determine the scene object related information of the target frame, in particular slight changes in the position information, surface information, and the like of the scene objects, so that by superimposing and synthesizing virtual objects on the basis of the target frame's scene object related information determined from the recognition result, the second video stream can be rendered with an augmented reality effect. In one implementation, every frame of the second video stream may be set as a target frame, or one or more frames of the second video stream may be set as target frames.
Then, the synthesizing device 14 may synthesize the corresponding virtual object and the second video stream into the augmented reality video information based on the result of the image calibration recognition. In one implementation, one or more target frames of the second video stream that have undergone image calibration recognition may each be synthesized with the corresponding virtual object. For example, the image information of one target frame is superimposed with the image information corresponding to a virtual object or model, thereby synthesizing augmented reality image information corresponding to that target frame's image information. The augmented reality video information corresponding to the second video stream may include one or more frames of augmented reality image information, for example with consecutive frames of the video stream each having corresponding augmented reality image information. In one implementation, the image information of a target frame of the second video stream may be replaced with the augmented reality image information. In addition, in one implementation, the virtual object may come from a virtual object set acquired from the network device 2 or other third-party devices, such as various virtual article images or models; in another implementation, the virtual object may also be extracted from the user equipment 1, for example a picture in a picture application of the user equipment 1, such as a photo in a mobile phone album. Moreover, in one implementation, the corresponding virtual object may be a single virtual object or a combination of multiple virtual objects, for example a virtual photo frame determined from a virtual object set combined with a photo from the user's mobile phone album to form a framed photo.
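The overlay synthesis step can be sketched as a straightforward alpha blend of a virtual-object image onto the target frame at the calibrated position; NumPy is an assumed dependency, and the function is a sketch rather than the patented renderer:

```python
# Hypothetical overlay synthesis: blend an RGBA virtual object into a
# BGR target frame. Assumes the overlay fits inside the frame (no
# bounds clipping, for brevity).
import numpy as np

def composite(target_frame, virtual_rgba, top_left):
    """target_frame: HxWx3 uint8; virtual_rgba: hxwx4 uint8;
    top_left: (x, y) position from image calibration recognition."""
    out = target_frame.copy()
    x, y = top_left
    h, w = virtual_rgba.shape[:2]
    alpha = virtual_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = out[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * virtual_rgba[:, :, :3] + (1.0 - alpha) * region
    out[y:y + h, x:x + w] = blended.astype(np.uint8)
    return out
```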
Herein, the video key frames corresponding to the scene objects are sent to the corresponding network device 2, and the scene object related information corresponding to those video key frames, determined by the network device 2 through image matching recognition, such as attribute, position, and surface information of the scene objects, is acquired; the user equipment 1 then performs image calibration recognition on each target frame of the second video stream currently acquired by the user equipment 1 in real time, in combination with the scene object related information obtained from the network device 2, and synthesizes the corresponding virtual objects with the second video stream into augmented reality video information based on the result of the image calibration recognition. Here, the method of combining image matching recognition on the network device 2 with image calibration recognition on the user equipment 1 breaks through the limitation of the prior art that only simple face recognition can be realized because of the limited computing power and storage capacity of mobile devices, so that the range of recognizable objects can be effectively extended to any scene object in the user scene. On one hand, the stronger computing and storage capability of the network device 2 relative to the user equipment 1 is used to perform image matching recognition on the video key frames, effectively determining the core information for identifying scene objects, such as their attribute, position, and surface information; on the other hand, based on the result of the network device 2's image matching recognition, the user equipment 1 can further perform image calibration recognition aimed at offset correction on the video stream it updates in real time, such as the target frames of the second video stream, so that the scene objects can be accurately recognized in every frame of the current user equipment 1. Then, based on the result of the image calibration recognition, the corresponding virtual objects are synthesized with the second video stream and rendered as augmented reality video information, which can be presented to the user. In this application, because any scene object captured by the user equipment 1 can be recognized and re-synthesized, the augmented reality video information presented by this application achieves an obvious visual breakthrough over traditional video applications and existing augmented-reality video chat applications: the variability of the augmented reality video information seen by the user is greatly enhanced, which improves the user's interactive enjoyment and optimizes the user's intelligent video experience.
Meanwhile, only a small number of video key frames, or the scene object related information corresponding to those video key frames, needs to be transmitted between the user equipment 1 and the corresponding network device 2, so the volume of transmitted data is small, network delay is low, the burden on data communication is light, and user experience is not affected.
In one implementation, the image calibration identification device 13 includes a first image calibration identification unit (not shown), a first determination unit (not shown). The first image calibration identification unit may perform image calibration identification on a first target frame of a second video stream acquired by the user equipment 1 based on the scene object related information; the first determination unit may determine scene object related information corresponding to the first target frame based on image calibration recognition performed on the first target frame.
Specifically, in this implementation, a target frame of the second video stream, such as the first target frame, may undergo image calibration recognition with reference to the scene object related information of a video key frame of the first video stream. First, the image information of the first target frame is compared with that of the video key frame to determine the differences between the two, for example by comparing the contours and positions of the scene objects; then, based on the known scene object related information of the video key frame, such as the attribute, position, and surface information of the scene objects, the specific scene object related information corresponding to the first target frame is calculated. For example, when comparing the first target frame with the video key frame shows that the image position of a scene object, say a table, has moved, the position offset of the table computed from the comparison can be combined with the table's known position coordinates in the video key frame to determine the table's actual position coordinates in the first target frame. In one implementation, any target frame of the second video stream may serve as the first target frame, so that one or more first target frames may be recognized with reference to the scene object related information of a video key frame of the first video stream.
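The offset-correction idea above can be sketched by re-locating, in the target frame, the object patch that was identified in the key frame; OpenCV template matching stands in here for whatever calibration method an implementation would actually use:

```python
# Hypothetical image calibration recognition: track the known scene
# object from the key frame into the new target frame and update its
# position coordinates.
import cv2

def calibrate(key_frame, target_frame, position):
    """position: (x, y, w, h) of the scene object in the key frame.
    Returns the corrected (x, y, w, h) in the target frame."""
    x, y, w, h = position
    patch = key_frame[y:y + h, x:x + w]
    scores = cv2.matchTemplate(target_frame, patch, cv2.TM_CCOEFF_NORMED)
    _, _, _, (nx, ny) = cv2.minMaxLoc(scores)
    return (nx, ny, w, h)
```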
Then, the synthesizing device 14 may synthesize the corresponding virtual object and the first target frame into first augmented reality image information based on the scene object related information corresponding to the first target frame; augmented reality video information is then generated based on the first augmented reality image information. In one implementation, the image information contained in the augmented reality video information may consist entirely of augmented reality image information similar or identical to the first augmented reality image information, or it may also include some ordinary image information without augmented reality effect.
Further, in one implementation, the image calibration identification device 13 further includes a second image calibration identification unit (not shown), and a second determination unit (not shown). The second image calibration identification unit may perform image calibration identification on a second target frame of a second video stream acquired by the user equipment 1 based on scene object related information corresponding to the first target frame; next, the second determining unit may determine scene object related information corresponding to the second target frame based on image calibration recognition performed on the second target frame.
Specifically, in this implementation, a target frame of the second video stream, such as the second target frame, may undergo image calibration recognition with reference to the scene object related information of the first target frame. In one implementation, the second target frame may be a frame of the second video stream that follows the first target frame in sequence. In that case, the first target frame is closer in time to the second target frame than the video key frame of the first video stream is, so it is reasonable to expect the image information of the first target frame to be more similar to that of the second target frame.
Further, in one implementation, if the user equipment 1 acquires a new video key frame after the video key frame of the first video stream, and the new video key frame appears after the first target frame in sequence, then the image information of the new video key frame is likely to be closer to that of the second target frame than the first target frame's is, and the new video key frame may then be used preferentially as the reference for recognizing the image information of the second target frame.
Then, the synthesizing device 14 may synthesize the corresponding virtual object and the second target frame into second augmented reality image information based on the scene object related information corresponding to the second target frame; augmented reality video information is then generated based on the first augmented reality image information and the second augmented reality image information. In one implementation, the image information contained in the augmented reality video information may consist entirely of augmented reality image information similar or identical to the first or second augmented reality image information, or it may also include some ordinary image information without augmented reality effect.
In one implementation, the user equipment 1 further comprises a presentation device (not shown); the presenting means may present the augmented reality video information corresponding to the second video stream.
Specifically, the user equipment 1 may play the augmented reality video information in real time on its display screen. For example, while a user equipment 1 such as a mobile phone is shooting and recording, this application performs augmented-reality-effect processing on the video stream acquired in real time and presents the corresponding augmented reality video information on the phone in real time. For another example, when the user video-chats with another user through the user equipment 1, the user's mobile phone may present a video picture with an augmented reality effect, and the mobile phone of the other user interacting with the user may also view that augmented reality video information.
In one implementation, the user equipment 1 further includes a user interaction device (not shown), which may provide the augmented reality video information to one or more other user equipments corresponding to the user equipment 1. In this application, the augmented-reality-based user scene video presentation may be not only the scene video presentation of a single user, such as a single-user video recording mode, but also a mode in which each user shares his or her own user scene video with other users during multi-user interaction, such as a multi-user video chat mode. In one implementation, the augmented reality video information, for example an augmented reality video stream, may be sent by the user equipment 1 to the corresponding network device, such as the network device 2, which then forwards it to the corresponding other user equipments. In another implementation, the user equipment 1 and the other user equipments may also exchange their respective augmented reality video information directly, without the network device 2 as an intermediary.
In one implementation, the user equipment 1 further includes a scene interaction device (not shown), which can obtain the user's operation instruction information on the virtual object and execute the corresponding operation based on that operation instruction information. For example, a user may control a virtual object in a recorded video scene or a video chat scene by touch or voice: a virtual pet may be placed on a table surface in the real environment, and the user recording the video or participating in the chat may direct the virtual pet to perform a series of actions by touching the screen, speaking, and so on. In one implementation, the interaction with the virtual object in the augmented reality video information may be performed by the user corresponding to the user equipment 1; in another implementation, when the user interacts with other users, as in multi-user video chat, the other users may also interact with the virtual object based on the shared augmented reality video information.
Further, in one implementation, the scene interaction device includes at least any one of the following. A first scene interaction unit (not shown) may acquire the user's touch-screen operation information and determine the user's operation instruction information on the virtual object based on it; for example, if the virtual object is a pet puppy, the user may instruct the puppy in the video to react by tapping a preset region of the screen, such as the region where the puppy is located, and the virtual puppy may, for instance, wag its tail when the user taps the screen. For another example, if the virtual object is a photo set in the user's mobile phone, photos may be switched through a sliding operation on the touch screen. A second scene interaction unit (not shown) may obtain the user's gesture information through the camera of the user equipment and determine the user's operation instruction information on the virtual object based on that gesture information; for example, the user's hand motion is captured by the camera, gesture information such as a tap or a click is extracted from the captured images, and the operation instruction information is then determined from a preset correspondence between gesture information and operation instructions. A third scene interaction unit (not shown) may acquire the user's voice information and determine the user's operation instruction information on the virtual object based on it, where the voice information may be picked up by a microphone built into the user equipment 1 and the operation instruction information is determined from a preset correspondence between voice information and operation instructions. In this way, the user's interaction experience can be further enriched through interaction with the virtual objects in the augmented reality video information.
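A toy mapping from the three input modalities above to operation instructions on a virtual object might look as follows; every instruction name and lookup table is a made-up placeholder:

```python
# Hypothetical scene-interaction units: touch, gesture, and voice each
# resolve to an operation instruction for the virtual object.
GESTURE_TABLE = {"tap": "wag_tail", "swipe": "next_photo"}
VOICE_TABLE = {"sit": "sit_down", "roll over": "roll_over"}

def instruction_from_touch(touch_xy, object_region):
    """Trigger a reaction when the touch lands on the virtual object."""
    x, y, w, h = object_region
    tx, ty = touch_xy
    return "wag_tail" if (x <= tx < x + w and y <= ty < y + h) else None

def instruction_from_gesture(gesture: str):
    return GESTURE_TABLE.get(gesture)

def instruction_from_voice(phrase: str):
    return VOICE_TABLE.get(phrase.strip().lower())
```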
In one implementation, the user equipment 1 further includes a virtual object set obtaining device (not shown) and a target virtual object determining device (not shown), and the network device 2 further includes a virtual object set sending device (not shown). Specifically, the virtual object set sending means may send the virtual object set matching the scene object related information corresponding to the video key frame to the user equipment 1, where it is correspondingly obtained by the virtual object set obtaining means. For example, based on the attribute information of a scene object in the video key frame, the network device 2 may screen out a virtual object set matching the determined scene object; if the scene object is a tree, a virtual object set containing various small virtual animals may be screened out according to the needs of the user scene. For another example, screening parameters such as the size of the virtual objects may be set in combination with scene object related information such as the position and surface information of the scene object. Then, the target virtual object determining means may determine one or more target virtual objects from the virtual object set, and the synthesizing device 14 may synthesize the target virtual objects and the second video stream into augmented reality video information based on the result of the image calibration recognition. Here, matching a corresponding virtual object set for the user equipment 1 enriches the composite presentation effect of the augmented reality video information and, at the same time, optimizes the user's intelligent experience.
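The screening of a virtual object set against a recognized scene object might be sketched as below, with attribute and size filters standing in for whatever matching rules a real deployment would define:

```python
# Hypothetical virtual-object screening on the network device.
def match_virtual_objects(candidates, scene_info):
    """candidates: iterable of dicts like
    {"name": "squirrel", "fits": {"tree", "table"}, "size": (40, 40)};
    scene_info: {"attribute": str, "position": (x, y, w, h)}."""
    _, _, obj_w, obj_h = scene_info["position"]
    chosen = []
    for c in candidates:
        if scene_info["attribute"] not in c["fits"]:
            continue                   # attribute-based screening
        w, h = c["size"]
        if w <= obj_w and h <= obj_h:  # size must fit the scene object
            chosen.append(c)
    return chosen
```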
Fig. 2 shows a flowchart of a method for generating augmented reality video information of a user scene at a user device side and a network device side according to another aspect of the present application. The method includes step S301, step S302, step S303, step S304, step S401, step S402, and step S403.
In step S301, the user equipment 1 may send a video key frame of a first video stream corresponding to a user scene to a corresponding network device 2; correspondingly, in step S401, the network device 2 may obtain a video key frame corresponding to a user scene of the user device 1; next, in step S402, the network device 2 may perform image matching recognition on the video key frame to determine scene object related information corresponding to the video key frame; next, in step S403, the network device 2 may send the scene object related information to the user device 1; correspondingly, in step S302, the user equipment 1 may obtain scene object related information corresponding to the video key frame, which is determined by the network equipment 2 based on image matching identification; next, in step S303, the user equipment 1 may perform image calibration recognition on a target frame of the second video stream acquired by the user equipment 1 based on the scene object related information; next, in step S304, the user equipment 1 may synthesize the corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition.
In this application, the generated augmented reality video information of a user scene may be applied to the scene video presentation of a single user, such as a single-user video mode, and may also be shared by each user with other users during multi-user interaction, such as a multi-user video chat mode. In addition, any other mode to which the augmented reality video information of a user scene can be applied is likewise an application scenario of the present application and is included in its scope of protection.
Specifically, in step S301, the user equipment 1 may send a video key frame of a first video stream corresponding to a user scene to a corresponding network device. In step S401, the network device 2 may obtain a video key frame corresponding to the user scene of the user device 1.
In one implementation, the method further includes step S306 (not shown), and in step S306, the user equipment 1 may capture a first video stream corresponding to a user scene. Here, the collecting device is used for collecting video information, namely the video stream, of a corresponding user during video recording or interaction with other users. In this application, the first video stream may be a video stream at any time. In one implementation, the capturing of the first video stream of the user scene may be performed by various types of cameras, or a combination of cameras, on the user device 1. Here, the video stream corresponds to a plurality of consecutive frames, each frame corresponds to corresponding image information, and each object in the image information is a scene object in the user scene. In one implementation, the user equipment 1 may acquire, in real time, a first video stream corresponding to the scene object.
Next, the method further comprises step S307 (not shown), in which the user equipment 1 may determine video key frames from the first video stream. Here, a video key frame may be one or more frames of the first video stream, and the criterion for confirming a video key frame may be customized based on the needs of different scenes. In one implementation, when the image information of a frame of the first video stream changes significantly compared with that of the previous frame, for example when a scene object appears or disappears, or when a scene object moves noticeably enough to reach a preset image-information change threshold, that frame is determined to be a video key frame; next, in step S301, the user equipment 1 may send the video key frame corresponding to the scene objects to the corresponding network device 2, so that image matching recognition can be performed on it in the network device 2, where the image matching recognition effectively determines the core information for identifying the scene objects, such as their attribute, position, and surface information. Furthermore, a frame whose image information changes little compared with the previous frame may be determined to be a non-video key frame that need not be uploaded; in actual operation, a non-video key frame may simply be ignored, or it may instead be recognized on the user equipment 1 through image calibration recognition. In this application, only a small number of video key frames needs to be transmitted between the user equipment 1 and the corresponding network device 2, so the volume of transmitted data is small, network delay is low, the burden on data communication is light, and user experience is not affected; meanwhile, the strong computing and storage capacity of the network device 2 effectively compensates for the user equipment 1's inability to perform a large number of complex image recognition operations.
In one implementation, an information transmission channel may be established between the network device 2 and one or more user devices, and between multiple user devices that interact with each other through video, where the information transmission channel may include a signaling channel and a data channel, where the signaling channel is responsible for transmitting contents such as a control instruction with a small data volume, and the data channel is responsible for transmitting contents such as a video key frame, a video stream with a large data volume, and a virtual object set.
In one implementation, the user equipment 1 may acquire a video stream corresponding to the scene object in real time. Further, there may be video key frames in each video stream. For example, one or more key frames may be present in both the first video stream and the subsequent second video stream. Furthermore, in one implementation, the video key frame may be determined in real time, and the video key frame may be set to be sent to the corresponding network device 2. For example, the determination and uploading of video key frames in the first video stream may be performed as described above; in another example, the determination and uploading of the video key frame may also be performed on the subsequent second video stream.
Next, in step S402, the network device 2 may perform image matching recognition on the video key frame to determine the scene object related information corresponding to the video key frame; next, in step S403, the network device 2 may send the scene object related information to the user device 1; correspondingly, in step S302, the user equipment 1 may acquire the scene object related information corresponding to the video key frame, determined by the network device 2 through image matching recognition. In one implementation, the image matching recognition may be performed on the video key frames through a scene object database preset in or callable by the network device 2, or through image recognition models preset in the network device 2 that were trained on large data sets through machine learning, so as to recognize one or more scene objects in the video key frames and match corresponding scene object related information for them.
In one implementation, the scene object related information includes at least any one of: attribute information of the scene object, position information of the scene object, and surface information of the scene object. For example, a table image in a video key frame needs to be recognized as a table object, together with the position coordinates of the table in the image and the orientation of the table surface, e.g., its top surface, so that a virtual object can subsequently be placed on the table and made interactive.
Specifically, in one implementation, the attribute information of the scene object may describe what the scene object is. Fuzzy matching may be implemented, e.g., classifying the scene object as a building, furniture, a plant, etc.; more accurate matching may also be achieved, e.g., classifying it as a tower, a table, a tree, etc. In one implementation, the position information of the scene object may include its image position in the video key frame, expressed as coordinate information, such as the contour coordinates of a tower or the position coordinates of a table. In one implementation, the surface information of the scene object may include the surface contour of the object, where the surface to be recognized can be specified; for example, the upper surface of a table needs to be recognized for subsequently placing a virtual object on the table top, in which case the recognized surface information mainly comprises the table's upper-surface information.
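The three kinds of scene object related information map naturally onto a small record type. A minimal sketch, with field names and coordinate conventions assumed rather than prescribed by the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneObjectInfo:
    """One recognized scene object from a video key frame."""
    attribute: str                                # "furniture" (fuzzy) or "table" (accurate)
    position: List[Tuple[int, int]]               # contour or corner coordinates in the frame
    surface: List[Tuple[int, int]] = field(default_factory=list)  # e.g. table-top outline

# Example: a table whose upper surface will later host a virtual object.
table = SceneObjectInfo(
    attribute="table",
    position=[(120, 340), (420, 340), (420, 520), (120, 520)],
    surface=[(120, 340), (420, 340), (420, 380), (120, 380)],
)
```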
Here, those skilled in the art should understand that the attribute information, position information, and surface information of the scene object are only examples; other existing or future forms of scene object related information, where applicable to the present application, should also fall within the scope of protection of the present application and are incorporated herein by reference.
Next, in step S303, the user equipment 1 may perform image calibration recognition on the target frame of the second video stream acquired by the user equipment 1, based on the scene object related information. Here, image calibration recognition complements the image matching recognition of the network device 2: image matching recognition is performed only on video key frames, whereas during a user video session, such as recording or a video chat, the acquisition device, e.g., a camera, collects the video stream in real time, that is, many consecutive frames. The picture information of each frame may differ only slightly from that of the previous frame, and such slight changes can be recognized without complex image matching operations; in that case image calibration recognition may be used instead. Image calibration recognition takes as its basis the scene object related information already recognized for the video key frame through image matching recognition, such as the attribute information, position information, and surface information of the scene object, and applies it to the target frame of the second video stream, which is a new video stream currently acquired by the user equipment 1. Its purpose is to determine the scene object related information of the target frame, in particular small changes in the position information, surface information, and the like of the scene object, so that virtual objects can then be overlaid and synthesized on the basis of the target frame's scene object related information, rendering the second video stream with an augmented reality effect. In one implementation, every frame in the second video stream may be taken as a target frame, or only one or more frames in the second video stream may be taken as target frames.
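One plausible realization of the lightweight calibration step is to relocate the object's known key-frame patch in the new target frame. The sketch below uses OpenCV template matching; the method and the confidence floor are assumptions, since the patent does not mandate a particular tracking technique.

```python
import cv2

def calibrate_object_position(target_frame, key_frame, box, min_score=0.6):
    """Re-locate one scene object in a target frame of the second video stream.

    `box` is the object's (x, y, w, h) position known from image matching
    recognition of the video key frame; `min_score` is an assumed confidence
    floor below which a fresh key-frame upload would be triggered instead.
    Both frames are grayscale uint8 images of the same size.
    """
    x, y, w, h = box
    template = key_frame[y:y + h, x:x + w]
    scores = cv2.matchTemplate(target_frame, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, (new_x, new_y) = cv2.minMaxLoc(scores)
    if best_score < min_score:
        return None                       # change too large for calibration alone
    return (new_x, new_y, w, h)           # updated position in the target frame
```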
Next, in step S304, the user equipment 1 may synthesize the corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition. In one implementation, each of the one or more target frames in the second video stream that underwent image calibration recognition may be composited with the corresponding virtual object; for example, the image information of a target frame is superimposed with the image information of a virtual object or model, yielding augmented reality image information corresponding to that target frame. The augmented reality video information corresponding to the second video stream may include one or more frames of such augmented reality image information, e.g., augmented reality image information for consecutive frames of the video stream. In one implementation, the image information of a target frame of the second video stream may be replaced with the corresponding augmented reality image information. In addition, in one implementation, the virtual object may come from a virtual object set acquired from the network device 2 or other third-party devices, such as various virtual article images or models; in another implementation, the virtual object may also be extracted from the user equipment 1, for example a picture in a picture application of the user equipment 1, such as a photo in a mobile phone album. Furthermore, the corresponding virtual object may be a single virtual object or a combination of multiple virtual objects; for example, a virtual photo frame determined from a virtual object set may be combined with a photo in the user's mobile phone album to form a framed photo.
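A minimal sketch of the overlay step, assuming the virtual object arrives as an RGBA image and `anchor` is a point on the recognized surface; a full renderer could instead project a 3D model using the surface orientation.

```python
import numpy as np

def composite_virtual_object(frame: np.ndarray, obj_rgba: np.ndarray,
                             anchor: tuple) -> np.ndarray:
    """Alpha-blend an RGBA virtual object onto one target frame at `anchor`.

    Assumes the object fits entirely inside the frame at that position.
    """
    out = frame.copy()
    x, y = anchor
    h, w = obj_rgba.shape[:2]
    region = out[y:y + h, x:x + w].astype(np.float32)
    rgb = obj_rgba[..., :3].astype(np.float32)
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    out[y:y + h, x:x + w] = (alpha * rgb + (1.0 - alpha) * region).astype(np.uint8)
    return out
```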
Herein, the video key frame corresponding to the scene object is sent to the corresponding network device 2, and the scene object related information corresponding to the video key frame, such as the attribute information, position information, and surface information of the scene object, as determined by the network device 2 based on image matching recognition, is acquired; the user equipment 1 then performs image calibration recognition on each target frame of the second video stream currently acquired in real time, in combination with the scene object related information obtained from the network device 2, and synthesizes the corresponding virtual object and the second video stream into augmented reality video information based on the image calibration recognition result. This combination of image matching recognition on the network device 2 with image calibration recognition on the user device 1 breaks through the prior-art limitation that the limited computing power and storage capacity of mobile devices allow only simple face recognition, so that the range of recognizable objects can be effectively extended to any scene object in the user scene. On the one hand, the stronger computing and storage capacity of the network device 2, compared with the user equipment 1, can be used to perform image matching recognition on the video key frame and effectively determine the core information identifying the scene object, such as its attribute information, position information, and surface information; on the other hand, based on the image matching recognition result of the network device 2, the user equipment 1 can further perform deviation-correcting image calibration recognition on its real-time-updated video stream, such as the target frames of the second video stream, achieving accurate recognition of the scene objects in each frame of the current user equipment 1. Then, based on the result of the image calibration recognition, the corresponding virtual object is synthesized with the second video stream, rendered as augmented reality video information, and presented to the user. Because any scene object corresponding to the user equipment 1 can be recognized and composited, the augmented reality video information presented by this application achieves an obvious visual breakthrough compared with traditional video applications or existing augmented reality video chat applications, and the variability of the augmented reality video information seen by the user is greatly enhanced, which promotes the user's interactive interest and optimizes the user's intelligent video experience.
Meanwhile, only a small number of video key frames, or the scene object related information corresponding to them, need to be transmitted between the user equipment 1 and the corresponding network device 2, so the transmitted data volume is small, network delay is low, the burden on data communication is light, and the user experience is not affected.
In one implementation, the step S303 includes a step S3031 (not shown) and a step S3032 (not shown). In step S3031, the user equipment 1 may perform image calibration recognition on a first target frame of the second video stream acquired by the user equipment 1, based on the scene object related information; in step S3032, the user equipment 1 may determine the scene object related information corresponding to the first target frame, based on the image calibration recognition performed on the first target frame.
In particular, in this implementation, a target frame in the second video stream, such as the first target frame, may undergo image calibration recognition with reference to the scene object related information of a video key frame of the first video stream. First, the image information of the first target frame is compared with that of the video key frame to determine their differences, e.g., by comparing the contours and positions of scene objects; then, based on the known scene object related information of the video key frame, such as the attribute information, position information, and surface information of the scene object, the specific scene object related information of the first target frame is calculated. For example, when comparison of the first target frame with the video key frame shows that the image position of a table, a scene object whose attribute information is identified in both frames, has shifted, the position offset computed from the comparison can be combined with the table's known position coordinates in the video key frame to determine its actual position coordinates in the first target frame. In one implementation, any target frame in the second video stream may serve as the first target frame, so that one or more first target frames may be recognized with reference to the scene object related information of a video key frame of the first video stream.
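The coordinate update itself is simple arithmetic. A sketch, reusing the table example; the coordinates are invented for illustration.

```python
def update_position(key_frame_coords, offset):
    """Shift known key-frame coordinates by the offset measured in comparison."""
    dx, dy = offset
    return [(x + dx, y + dy) for (x, y) in key_frame_coords]

# The table's corners in the video key frame ...
table_in_key_frame = [(120, 340), (420, 340), (420, 520), (120, 520)]
# ... and comparison shows it shifted 15 px right, 4 px down in the first target frame:
table_in_first_target = update_position(table_in_key_frame, (15, 4))
```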
Next, in step S304, the user equipment 1 may synthesize the corresponding virtual object and the first target frame into first augmented reality image information based on the scene object related information corresponding to the first target frame, and then generate augmented reality video information based on the first augmented reality image information. In one implementation, the image information included in the augmented reality video information may consist entirely of augmented reality image information similar or identical to the first augmented reality image information, or it may also include some ordinary image information without an augmented reality effect.
Further, in one implementation, the step S303 further includes a step S3033 (not shown) and a step S3034 (not shown). In step S3033, the user equipment 1 may perform image calibration recognition on a second target frame of the second video stream acquired by the user equipment 1, based on the scene object related information corresponding to the first target frame; next, in step S3034, the user equipment 1 may determine the scene object related information corresponding to the second target frame, based on the image calibration recognition performed on the second target frame.
In particular, in this implementation, a target frame in the second video stream, such as the second target frame, may undergo image calibration recognition with reference to the scene object related information of the first target frame. In one implementation, the second target frame may be a frame in the second video stream that comes after the first target frame in sequence. Since the first target frame appears closer in time to the second target frame than the video key frame of the first video stream does, it can reasonably be expected that the image information of the first target frame is more likely to resemble that of the second target frame.
Further, in one implementation, if the user equipment 1 acquires a new video key frame after the video key frame of the first video stream, and the new video key frame appears after the first target frame in sequence, then the image information of the new video key frame is likely to resemble that of the second target frame even more closely than the first target frame's does; in that case the new video key frame may be preferred as the reference for recognizing the image information of the second target frame.
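Put together, reference selection reduces to picking the most recent earlier frame whose scene object related information is already known, whether that is a video key frame or a previously calibrated target frame. A sketch under an assumed per-frame record structure:

```python
def pick_reference(frames_with_info: list, target_index: int):
    """Return the latest frame before `target_index` with known scene object info.

    Each entry is assumed to look like {"index": int, "info": ...}; a newer
    key frame therefore automatically wins over an older target frame.
    """
    earlier = [f for f in frames_with_info if f["index"] < target_index]
    return max(earlier, key=lambda f: f["index"], default=None)
```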
Next, in step S304, the user equipment 1 may synthesize the corresponding virtual object and the second target frame into second augmented reality image information based on the scene object related information corresponding to the second target frame, and then generate augmented reality video information based on the first augmented reality image information and the second augmented reality image information. In one implementation, the image information included in the augmented reality video information may consist entirely of augmented reality image information similar or identical to the first or second augmented reality image information, or it may also include some ordinary image information without an augmented reality effect.
In one implementation, the method further includes step S305 (not shown); in step S305, the user equipment 1 may present the augmented reality video information corresponding to the second video stream.
Specifically, the user equipment 1 may play the augmented reality video information in real time on its display screen. For example, while the user equipment 1, such as a mobile phone, is shooting and recording, this application performs augmented reality processing on the video stream acquired in real time and presents the corresponding augmented reality video information on the phone in real time. For another example, when the user video-chats with another user through the user equipment 1, the user's mobile phone may present a video picture with an augmented reality effect, and the mobile phone of the other user in the interaction may likewise view the augmented reality video information.
In one implementation, the method further includes step S308 (not shown), in which the user equipment 1 may provide the augmented reality video information to one or more other user devices corresponding to the user equipment 1. In this application, the augmented-reality-based user scene video presentation may be not only a single-user scene video presentation, such as a single-user video recording mode, but also a user scene video sharing mode in which, during multi-user interaction such as multi-user video chat, each user shares their own user scene video with the other users. In one implementation, the augmented reality video information, e.g., an augmented reality video stream, may be sent by the user equipment 1 to the corresponding network device 2, which then forwards it to the corresponding other user devices. In another implementation, the user equipment 1 and the other user devices may also exchange their respective augmented reality video information directly, without the network device 2 as intermediary.
In one implementation, the method further includes step S309 (not shown), in which the user equipment 1 may acquire the user's operation instruction information for a virtual object and execute the corresponding operation based on that instruction information. For example, the user may control a virtual object in a recorded video scene or a video chat scene by touching it on the screen or by voice: a virtual pet may be placed on a table surface in the real environment, and the user recording the video or participating in the chat may make the virtual pet perform a series of actions by touch, speech, and the like. In one implementation, the interaction with the virtual object in the augmented reality video information may be performed by the user of the user equipment 1; in another implementation, when the user interacts with other users, e.g., in a multi-user video chat, the other users may likewise interact with the virtual object based on the shared augmented reality video information.
Further, in one implementation, the step S309 further includes at least any one of steps S3091 (not shown), S3092 (not shown), and S3093 (not shown). In step S3091, the user equipment 1 may obtain the user's touch screen operation information and determine the user's operation instruction information for the virtual object based on it; for example, if the virtual object is a pet puppy, the user may make the puppy in the video react by tapping a preset region of the screen, such as the region where the puppy is located, e.g., the virtual puppy may wag its tail when the user taps the screen. For another example, if the virtual object is a photo set in the user's mobile phone, photos may be switched through a sliding operation on the touch screen. In step S3092, the user equipment 1 may obtain the user's gesture information through its camera device and determine the user's operation instruction information for the virtual object based on that gesture information; for example, the camera captures the user's hand movements, gesture information such as tapping or clicking is extracted from the captured images, and the operation instruction information is then determined from a preset correspondence between gesture information and operation instruction information. In step S3093, the user equipment 1 may acquire the user's voice information, e.g., through a built-in microphone, and determine the operation instruction information based on a preset correspondence between voice information and operation instruction information. Interaction between the user and the virtual objects in the augmented reality video information can thus further enrich the user's interactive experience.
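The three input paths converge on the same operation instruction lookup. A minimal sketch; the event names and action vocabulary are invented for illustration, since the patent only requires a preset correspondence.

```python
from typing import Optional

# Hypothetical preset correspondences between recognized input and instructions.
TOUCH_ACTIONS = {"tap_pet_region": "wag_tail", "swipe_photo": "next_photo"}
GESTURE_ACTIONS = {"tap": "wag_tail", "wave": "jump"}
VOICE_ACTIONS = {"sit": "sit_down", "roll": "roll_over"}

def to_operation_instruction(source: str, event: str) -> Optional[str]:
    """Translate a touch, gesture, or voice event into an operation instruction."""
    tables = {"touch": TOUCH_ACTIONS, "gesture": GESTURE_ACTIONS, "voice": VOICE_ACTIONS}
    return tables.get(source, {}).get(event)
```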
In one implementation, the method further includes step S310 (not shown), step S311 (not shown), and step S404 (not shown).
Specifically, in step S404, the network device 2 may send the virtual object set matching the scene object related information corresponding to the video key frame to the user equipment 1; correspondingly, in step S310, the user equipment 1 obtains the virtual object set. For example, the network device 2 may screen out a virtual object set matching the scene object determined in the video key frame based on the scene object's attribute information; if the scene object is a tree, a set containing various small virtual animals may be screened out according to the needs of the user scene. For another example, screening parameters such as the size of the virtual object may be set in combination with scene object related information such as the position information and surface information of the scene object. Next, in step S311, the user equipment 1 may determine one or more target virtual objects from the virtual object set, so that in step S304 the user equipment 1 may synthesize the target virtual objects and the second video stream into augmented reality video information based on the result of the image calibration recognition. By matching a corresponding virtual object set for the user equipment 1, this implementation can enrich the composite presentation effect of the augmented reality video information and at the same time optimize the user's intelligent experience.
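As a sketch of the server-side screening in step S404, assuming a catalog whose entries carry a `habitats` list and a `size` in pixels (both invented fields):

```python
def match_virtual_objects(scene_info: dict, catalog: list) -> list:
    """Screen the virtual object catalog against one recognized scene object."""
    width = None
    if scene_info.get("surface"):
        xs = [p[0] for p in scene_info["surface"]]
        width = max(xs) - min(xs)          # usable surface width in pixels
    matches = []
    for obj in catalog:
        if scene_info["attribute"] not in obj.get("habitats", []):
            continue                       # e.g. small animals match a "tree" object
        if width is not None and obj["size"] > width:
            continue                       # object must fit on the surface
        matches.append(obj)
    return matches
```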
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (23)

1. A method for generating augmented reality video information of a user scene at a user equipment side, wherein the method comprises:

sending a video key frame of a first video stream corresponding to the user scene to a corresponding network device, wherein the video key frame is one or more frames of the first video stream, and the video key frame is set based on the user scene;

acquiring scene object related information corresponding to the video key frame, as determined by the network device based on image matching recognition, wherein the scene object related information includes attribute information, position information, and surface information of the scene object;

performing image calibration recognition on a target frame of a second video stream collected by the user equipment, based on the scene object related information;

synthesizing a corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition;

wherein the first video stream and the second video stream are both video streams captured while the user corresponding to the user scene records video or interacts with other users, each video stream corresponds to a plurality of consecutive frames, and the second video stream is a video stream subsequent to the first video stream;

and wherein an information transmission channel is established between the network device and one or more user equipments, the information transmission channel including a signaling channel and a data channel, the signaling channel being used to transmit control instructions, and the data channel being used to transmit one or more virtual objects of a virtual object set, video key frames, and video streams.

2. The method according to claim 1, wherein the method further comprises:

collecting the first video stream corresponding to the user scene;

determining the video key frame from the first video stream;

wherein sending the video key frame of the first video stream corresponding to the user scene to the corresponding network device comprises: sending the video key frame to the corresponding network device.

3. The method according to claim 1 or 2, wherein performing image calibration recognition on the target frame of the second video stream collected by the user equipment based on the scene object related information comprises:

performing image calibration recognition on a first target frame of the second video stream collected by the user equipment, based on the scene object related information;

determining scene object related information corresponding to the first target frame, based on the image calibration recognition performed on the first target frame;

and wherein synthesizing the corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition comprises:

synthesizing the corresponding virtual object and the first target frame into first augmented reality image information, based on the scene object related information corresponding to the first target frame;

generating augmented reality video information based on the first augmented reality image information.

4. The method according to claim 3, wherein performing image calibration recognition on the target frame of the second video stream collected by the user equipment based on the scene object related information further comprises:

performing image calibration recognition on a second target frame of the second video stream collected by the user equipment, based on the scene object related information corresponding to the first target frame;

determining scene object related information corresponding to the second target frame, based on the image calibration recognition performed on the second target frame;

and wherein synthesizing the corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition further comprises:

synthesizing the corresponding virtual object and the second target frame into second augmented reality image information, based on the scene object related information corresponding to the second target frame;

generating augmented reality video information based on the first augmented reality image information and the second augmented reality image information.

5. The method according to any one of claims 1 to 4, wherein the method further comprises:

presenting the augmented reality video information corresponding to the second video stream.

6. The method according to any one of claims 1 to 5, wherein the method further comprises:

providing the augmented reality video information to one or more other user equipments corresponding to the user equipment.

7. The method according to any one of claims 1 to 6, wherein the method further comprises:

acquiring operation instruction information of the user on a virtual object, and executing the corresponding operation based on the operation instruction information.

8. The method according to claim 7, wherein acquiring the operation instruction information of the user on the virtual object and executing the corresponding operation based on the operation instruction information comprises at least any one of the following:

acquiring touch screen operation information of the user, and determining the operation instruction information of the user on the virtual object based on the touch screen operation information;

acquiring gesture information of the user through a camera device of the user equipment, and determining the operation instruction information of the user on the virtual object based on the gesture information;

acquiring voice information of the user, and determining the operation instruction information of the user on the virtual object based on the voice information.

9. The method according to claim 1, wherein the method further comprises:

acquiring a virtual object set matching the scene object related information corresponding to the video key frame;

determining a target virtual object from the virtual object set;

wherein synthesizing the corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition comprises: synthesizing the target virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition.

10. A method for generating augmented reality video information of a user scene at a network device side, wherein the method comprises:

acquiring a video key frame corresponding to the user scene of a user equipment, wherein the video key frame is determined based on a first video stream, corresponding to the user scene, collected by the user equipment; the video key frame is one or more frames of the first video stream, and the video key frame is set based on the user scene;

performing image matching recognition on the video key frame to determine scene object related information corresponding to the video key frame, wherein the scene object related information includes attribute information, position information, and surface information of the scene object;

sending the scene object related information to the user equipment, so that the user equipment, based on the scene object related information, performs image calibration recognition on a target frame of a second video stream collected by the user equipment, and then synthesizes a corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition;

wherein the first video stream and the second video stream are both video streams captured while the user corresponding to the user scene records video or interacts with other users, each video stream corresponds to a plurality of consecutive frames, and the second video stream is a video stream subsequent to the first video stream;

and wherein an information transmission channel is established between the network device and one or more user equipments, the information transmission channel including a signaling channel and a data channel, the signaling channel being used to transmit control instructions, and the data channel being used to transmit one or more virtual objects of a virtual object set, video key frames, and video streams.

11. The method according to claim 10, wherein the method further comprises:

sending a virtual object set matching the scene object related information corresponding to the video key frame to the user equipment.

12. A user equipment for generating augmented reality video information of a user scene, wherein the equipment comprises:

a video key frame sending device, configured to send a video key frame of a first video stream corresponding to the user scene to a corresponding network device, wherein the video key frame is one or more frames of the first video stream, and the video key frame is set based on the user scene;

a scene object related information acquiring device, configured to acquire scene object related information corresponding to the video key frame, as determined by the network device based on image matching recognition, wherein the scene object related information includes attribute information, position information, and surface information of the scene object;

an image calibration recognition device, configured to perform image calibration recognition on a target frame of a second video stream collected by the user equipment, based on the scene object related information;

a synthesizing device, configured to synthesize a corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition;

wherein the first video stream and the second video stream are both video streams captured while the user corresponding to the user scene records video or interacts with other users, each video stream corresponds to a plurality of consecutive frames, and the second video stream is a video stream subsequent to the first video stream;

and wherein an information transmission channel is established between the network device and one or more user equipments, the information transmission channel including a signaling channel and a data channel, the signaling channel being used to transmit control instructions, and the data channel being used to transmit one or more virtual objects of a virtual object set, video key frames, and video streams.

13. The equipment according to claim 12, wherein the equipment further comprises:

a collecting device, configured to collect the first video stream corresponding to the user scene;

a video key frame determining device, configured to determine the video key frame from the first video stream;

wherein the video key frame sending device is configured to: send the video key frame corresponding to the scene object to the corresponding network device.

14. The equipment according to claim 12 or 13, wherein the image calibration recognition device comprises:

a first image calibration recognition unit, configured to perform image calibration recognition on a first target frame of the second video stream collected by the user equipment, based on the scene object related information;

a first determining unit, configured to determine scene object related information corresponding to the first target frame, based on the image calibration recognition performed on the first target frame;

wherein the synthesizing device is configured to: synthesize the corresponding virtual object and the first target frame into first augmented reality image information, based on the scene object related information corresponding to the first target frame; and generate augmented reality video information based on the first augmented reality image information.

15. The equipment according to claim 14, wherein the image calibration recognition device further comprises:

a second image calibration recognition unit, configured to perform image calibration recognition on a second target frame of the second video stream collected by the user equipment, based on the scene object related information corresponding to the first target frame;

a second determining unit, configured to determine scene object related information corresponding to the second target frame, based on the image calibration recognition performed on the second target frame;

wherein the synthesizing device is further configured to: synthesize the corresponding virtual object and the second target frame into second augmented reality image information, based on the scene object related information corresponding to the second target frame; and generate augmented reality video information based on the first augmented reality image information and the second augmented reality image information.

16. The equipment according to any one of claims 12 to 15, wherein the equipment further comprises:

a presentation device, configured to present the augmented reality video information corresponding to the second video stream.

17. The equipment according to any one of claims 12 to 16, wherein the equipment further comprises:

a user interaction device, configured to provide the augmented reality video information to one or more other user equipments corresponding to the user equipment.

18. The equipment according to any one of claims 12 to 17, wherein the equipment further comprises:

a scene interaction device, configured to acquire operation instruction information of the user on a virtual object, and execute the corresponding operation based on the operation instruction information.

19. The equipment according to claim 18, wherein the scene interaction device comprises at least any one of the following:

a first scene interaction unit, configured to acquire touch screen operation information of the user, and determine the operation instruction information of the user on the virtual object based on the touch screen operation information;

a second scene interaction unit, configured to acquire gesture information of the user through a camera device of the user equipment, and determine the operation instruction information of the user on the virtual object based on the gesture information;

a third scene interaction unit, configured to acquire voice information of the user, and determine the operation instruction information of the user on the virtual object based on the voice information.

20. The equipment according to claim 12, wherein the equipment further comprises:

a virtual object set acquiring device, configured to acquire a virtual object set matching the scene object related information corresponding to the video key frame;

a target virtual object determining device, configured to determine a target virtual object from the virtual object set;

wherein the synthesizing device is configured to: synthesize the target virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition.

21. A network device for generating augmented reality video information of a user scene, wherein the device comprises:

a video key frame acquiring device, configured to acquire a video key frame corresponding to the user scene of a user equipment, wherein the video key frame is determined based on a first video stream, corresponding to the user scene, collected by the user equipment; the video key frame is one or more frames of the first video stream, and the video key frame is set based on the user scene;

an image matching recognition device, configured to perform image matching recognition on the video key frame to determine scene object related information corresponding to the video key frame, wherein the scene object related information includes attribute information, position information, and surface information of the scene object;

a scene object related information sending device, configured to send the scene object related information to the user equipment, so that the user equipment, based on the scene object related information, performs image calibration recognition on a target frame of a second video stream collected by the user equipment, and then synthesizes a corresponding virtual object and the second video stream into augmented reality video information based on the result of the image calibration recognition;

wherein the first video stream and the second video stream are both video streams captured while the user corresponding to the user scene records video or interacts with other users, each video stream corresponds to a plurality of consecutive frames, and the second video stream is a video stream subsequent to the first video stream;

and wherein an information transmission channel is established between the network device and one or more user equipments, the information transmission channel including a signaling channel and a data channel, the signaling channel being used to transmit control instructions, and the data channel being used to transmit one or more virtual objects of a virtual object set, video key frames, and video streams.

22. The device according to claim 21, wherein the device further comprises:

a virtual object set sending device, configured to send a virtual object set matching the scene object related information corresponding to the video key frame to the user equipment.

23. A system for generating augmented reality video information of a user scene, wherein the system comprises: the user equipment according to any one of claims 12 to 20, and the network device according to claim 21 or 22.
CN201710032139.8A 2017-01-17 2017-01-17 A method and device for generating augmented reality video information of a user scene Active CN108320331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710032139.8A CN108320331B (en) 2017-01-17 2017-01-17 A method and device for generating augmented reality video information of a user scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710032139.8A CN108320331B (en) 2017-01-17 2017-01-17 A method and device for generating augmented reality video information of a user scene

Publications (2)

Publication Number Publication Date
CN108320331A CN108320331A (en) 2018-07-24
CN108320331B true CN108320331B (en) 2021-10-22

Family

ID=62891077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710032139.8A Active CN108320331B (en) 2017-01-17 2017-01-17 A method and device for generating augmented reality video information of a user scene

Country Status (1)

Country Link
CN (1) CN108320331B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112955850B (en) * 2018-09-20 2024-07-19 苹果公司 Method and apparatus for attenuating joint user interactions in a Simulated Reality (SR) space
CN109377553A (en) * 2018-10-26 2019-02-22 三星电子(中国)研发中心 A cloud control method and system for intelligent biological objects
CN110162667A (en) * 2019-05-29 2019-08-23 北京三快在线科技有限公司 Video generation method, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106101115A (en) * 2009-07-30 2016-11-09 Sk普兰尼特有限公司 For providing the method for augmented reality, server and portable terminal device
CN105229719A (en) * 2013-03-15 2016-01-06 奇跃公司 Display system and method
CN105023266A (en) * 2014-04-29 2015-11-04 高德软件有限公司 Method and device for implementing augmented reality (AR) and terminal device
CN104331929A (en) * 2014-10-29 2015-02-04 深圳先进技术研究院 Crime scene reduction method based on video map and augmented reality

Also Published As

Publication number Publication date
CN108320331A (en) 2018-07-24

Similar Documents

Publication Publication Date Title
CN108140263B (en) AR display system and method applied to image or video
US11482192B2 (en) Automated object selection and placement for augmented reality
US9497416B2 (en) Virtual circular conferencing experience using unified communication technology
WO2020211385A1 (en) Image special effect processing method, device, and live video streaming terminal
US20190253474A1 (en) Media production system with location-based feature
US20160027209A1 (en) Real-time immersive mediated reality experiences
CN107633441A (en) Commodity in track identification video image and the method and apparatus for showing merchandise news
CN112199016B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109729420A (en) Image processing method and device, mobile terminal and computer readable storage medium
CN106060470A (en) Video monitoring method and system
CN112492231B (en) Remote interaction method, device, electronic equipment and computer readable storage medium
CN202998337U (en) Video program identification system
CN109743584B (en) Panoramic video synthesis method, server, terminal device and storage medium
CN108320331B (en) A method and device for generating augmented reality video information of a user scene
CN105933637A (en) Video communication method and system
WO2017066736A1 (en) Media-production system with social media content interface feature
CN114139491A (en) Data processing method, device and storage medium
KR100901111B1 (en) Image Provision System Using 3D Virtual Space Contents
JP2024513640A (en) Virtual object action processing method, device, and computer program
CN112131431A (en) Data processing method, data processing equipment and computer readable storage medium
KR102800520B1 (en) Apparatus and method for providing a video call service using augmented reality
CN118764693A (en) Method, device, equipment and storage medium for generating video blog
KR20170127354A (en) Apparatus and method for providing video conversation using face conversion based on facial motion capture
US20230056531A1 (en) Methods and Systems for Utilizing Live Embedded Tracking Data within a Live Sports Video Stream
US20190116214A1 (en) Method and system for taking pictures on real time dynamic basis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Room 80536, Shanghai

Patentee after: Shanghai Zongzhang Technology Group Co.,Ltd.

Country or region after: China

Address before: Room 80536, Shanghai

Patentee before: SHANGHAI ZHANGMEN SCIENCE AND TECHNOLOGY Co.,Ltd.

Country or region before: China