
CN119583744A - Teleconference method, teleconference device, electronic equipment and computer-readable storage medium

Teleconference method, teleconference device, electronic equipment and computer-readable storage medium

Info

Publication number
CN119583744A
CN119583744A
Authority
CN
China
Prior art keywords
conference
scene
target
user
selection image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411569805.8A
Other languages
Chinese (zh)
Inventor
秦仙魁
欧阳琼林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thunderbird Innovation Technology Shenzhen Co ltd
Original Assignee
Thunderbird Innovation Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thunderbird Innovation Technology Shenzhen Co ltd filed Critical Thunderbird Innovation Technology Shenzhen Co ltd
Priority to CN202411569805.8A
Publication of CN119583744A
Legal status: Pending

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract


The embodiment of the present application discloses a teleconference method, apparatus, electronic device and computer-readable storage medium. The method includes: receiving a position selection image for a conference scene sent by a conference access device, the position selection image being displayed on a virtual display screen of an extended reality device; in response to a selection instruction of a target user, determining the target position corresponding to the selection instruction in the position selection image; sending the target position to the conference access device, which generates play content according to the target position and pose information, the pose information being acquired by a pose acquisition module of the extended reality device; and receiving the play content sent by the conference access device and playing it on the virtual display screen. The teleconference method provided by the embodiment of the present application can bring the audio-visual experience of the conference site to a target user who attends remotely through an extended reality device, thereby improving the remote user's conference experience.

Description

Teleconference method, teleconference device, electronic equipment and computer-readable storage medium
Technical Field
The embodiment of the application relates to the technical field of extended reality, and in particular to a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium.
Background
With the continuous development of extended reality devices, a user can hold a virtual reality conference through a VR device, in which a simulated real conference scene is displayed on the VR device's screen by constructing a virtual space and character avatars; or hold an augmented reality conference through an AR device, in which a conference picture appears in the user's real front view by superimposing a video conference interface on the AR device's display.
In the prior art, when users hold a teleconference with existing augmented reality equipment, the prevailing approach is to present a recording of the live conference to the remote participants from the viewpoint of the recorder.
Therefore, even when remote participants wear augmented reality equipment, they do not experience being present at the scene; the user's need for an in-person feeling cannot be met, and the user experience is poor.
Disclosure of Invention
The embodiment of the application provides a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium, which can bring the audiovisual experience of the conference site to a target user who attends a teleconference through an augmented reality device and improve the remote user's conference experience.
In a first aspect, an embodiment of the present application provides a teleconferencing method, applied to an augmented reality device, including:
receiving a position selection image for a conference scene sent by conference access equipment, wherein the position selection image is displayed on a virtual display screen of the augmented reality equipment;
responding to a selection instruction of a target user, and determining a target position corresponding to the selection instruction in the position selection image;
sending the target position to the conference access equipment, the conference access equipment generating play content according to the target position and pose information, the pose information being acquired through a pose acquisition module of the augmented reality equipment;
and receiving the play content sent by the conference access equipment and playing the play content on the virtual display screen.
Optionally, in some embodiments of the present application, the determining, in response to a selection instruction of a target user, a corresponding target position of the selection instruction in the conference scene includes:
determining coordinate information of the selection instruction on the position selection image;
and determining the target position in the conference scene according to the coordinate information.
Optionally, in some embodiments of the present application, the augmented reality device includes a first shooting module for acquiring an image in front of the augmented reality device, and before the responding to the selection instruction of the target user, the method further includes:
acquiring a gesture image of the target user through the first shooting module;
determining corresponding selection information of the gesture image in the position selection image;
and generating the selection instruction according to the selection information.
Optionally, in some embodiments of the present application, the augmented reality device includes a pose acquisition module, and before the receiving the play content sent by the conference access device, the method further includes:
acquiring the pose information of the target user through the pose acquisition module;
and sending the pose information to the conference access equipment.
In a second aspect, an embodiment of the present application provides a teleconference method, which is applied to a conference access device, where the conference access device includes a plurality of second shooting modules, and the method includes:
determining position selection images corresponding to a conference scene through the plurality of second shooting modules, wherein the plurality of second shooting modules are arranged at different positions of the conference scene;
transmitting the position selection image to an augmented reality device;
receiving a target position sent by the augmented reality equipment, wherein the target position is generated in the position selection image according to a selection instruction of a target user;
and generating playing content according to the target position, and sending the playing content to the augmented reality equipment, wherein the playing content is played on the augmented reality equipment.
Optionally, in some embodiments of the present application, the determining, by the plurality of second shooting modules, a position selection image corresponding to a conference scene includes:
acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules;
and determining the position selection image according to the plurality of scene images.
Optionally, in some embodiments of the present application, the determining the position selection image according to the plurality of scene images includes:
determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is a non-selectable position area already occupied by participating users, and the second position area is a position area where no participating user is present;
and determining the position selection image according to the first position area and the second position area.
Optionally, in some embodiments of the present application, the conference access device includes a plurality of audio collection modules, and the generating playing content according to the target location includes:
acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules;
acquiring a plurality of audio data of different angles in the conference scene through the plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are arranged at different positions of the conference scene;
determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position;
and generating the playing content according to the target picture and the audio content.
Optionally, in some embodiments of the present application, generating the play content according to the plurality of scene images, the plurality of audio data, and the target position information includes:
receiving pose information of the target user sent by the augmented reality equipment;
and determining the target picture according to the pose information and the plurality of scene images, and determining the audio content corresponding to the target picture according to the pose information and the plurality of audio data.
In a third aspect, an embodiment of the present application further provides a teleconferencing apparatus, applied to an augmented reality device, including:
the first processing module is used for receiving a position selection image for a conference scene sent by the conference access equipment, and the position selection image is displayed on a virtual display screen of the augmented reality equipment;
The first processing module is further used for responding to a selection instruction of a target user and determining a target position corresponding to the selection instruction in the position selection image;
the first sending module is used for sending the target position to the conference access equipment, the conference access equipment generating play content according to the target position and pose information, the pose information being acquired through a pose acquisition module of the augmented reality equipment;
the first processing module is further configured to receive the play content sent by the conference access device, and play the play content on the virtual display screen.
Optionally, in some embodiments of the present application, the first processing module is configured to:
determining coordinate information of the selection instruction on the position selection image;
and determining the target position in the conference scene according to the coordinate information.
Optionally, in some embodiments of the present application, the first processing module is further configured to:
acquiring a gesture image of the target user through the first shooting module;
determining corresponding selection information of the gesture image in the position selection image;
and generating the selection instruction according to the selection information.
Optionally, in some embodiments of the present application, the first sending module is configured to:
acquiring the pose information of the target user through the pose acquisition module;
and sending the pose information to the conference access equipment.
In a fourth aspect, an embodiment of the present application further provides a teleconference device, applied to a conference access apparatus, where the teleconference device includes:
The second processing module is used for determining position selection images corresponding to the conference scene through the plurality of second shooting modules, and the plurality of second shooting modules are arranged at different positions of the conference scene;
the second sending module is used for sending the position selection image to the augmented reality equipment;
The second processing module is further configured to receive a target position sent by the augmented reality device, where the target position is generated in the position selection image according to a selection instruction of a target user;
The second processing module is further configured to generate play content according to the target position and send the play content to the augmented reality device, where the play content is played on the augmented reality device.
Optionally, in some embodiments of the application, the second processing module is configured to:
acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules;
and determining the position selection image according to the plurality of scene images.
Optionally, in some embodiments of the present application, the second processing module is further configured to:
determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is a non-selectable position area already occupied by participating users, and the second position area is a position area where no participating user is present;
and determining the position selection image according to the first position area and the second position area.
Optionally, in some embodiments of the present application, the second processing module is further configured to:
acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules;
acquiring a plurality of audio data of different angles in the conference scene through the plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are arranged at different positions of the conference scene;
determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position;
and generating the playing content according to the target picture and the audio content.
Optionally, in some embodiments of the present application, the second processing module is further configured to:
receiving pose information of the target user sent by the augmented reality equipment;
and determining the target picture according to the pose information and the plurality of scene images, and determining the audio content corresponding to the target picture according to the pose information and the plurality of audio data.
In a fifth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the computer program when executed by the processor implements the steps in the teleconferencing method described above.
In a sixth aspect, embodiments of the present application further provide a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the teleconferencing method described above.
In a seventh aspect, embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations described in the embodiments of the present application.
In summary, the embodiment of the application receives a position selection image for a conference scene sent by the conference access equipment, the position selection image being displayed on a virtual display screen of the augmented reality equipment; responds to a selection instruction of a target user and determines the target position corresponding to the selection instruction in the position selection image; sends the target position to the conference access equipment, which generates play content according to the target position and pose information, the pose information being acquired through a pose acquisition module of the augmented reality equipment; and receives the play content sent by the conference access equipment and plays it on the virtual display screen. By returning the target position corresponding to the user's selection instruction to the conference access equipment and then receiving and playing the play content generated from that target position, the method brings the audiovisual experience of the conference scene to a target user who attends remotely through augmented reality equipment and improves the remote user's conference experience.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario in which an augmented reality device according to an embodiment of the present application performs a teleconferencing method;
Fig. 2 is a schematic view of a scenario in which an augmented reality device according to an embodiment of the present application performs the teleconferencing method;
fig. 3 is a schematic structural diagram of XR glasses corresponding to a teleconferencing method according to an embodiment of the present application;
Fig. 4 is a schematic view of a scenario in which a conference access device provided in an embodiment of the present application performs a teleconference method;
Fig. 5 is a schematic view of a scenario in which a conference access device provided in an embodiment of the present application performs the teleconference method;
fig. 6 is a schematic view of a conference scenario provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a target location selection result according to an embodiment of the present application;
Fig. 8 is a schematic diagram of the overall structure of a teleconferencing method according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a teleconference device of an augmented reality apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a teleconference device of a conference access apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the application are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the features described above. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present application, the term "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" in this disclosure is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
First, the terms related to the present application will be explained:
Augmented Reality (AR) is a technique that creates an enhanced perceptual environment by combining virtual information with real-world scenes.
An augmented reality device fuses virtual content with the real world to provide an enhanced visual experience. These devices typically take the form of head-mounted displays (HMDs), smart glasses, or other wearable devices.
The embodiment of the application provides a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium. Specifically, the embodiment of the application provides a teleconference system suitable for the teleconference method, which includes an augmented reality device and a main control apparatus of the augmented reality device.
In the prior art, with the rapid development and gradual maturing of extended reality display technology, extended reality displays and large artificial intelligence models have given extended reality equipment rich and varied functions. More and more wearable extended reality devices (such as AR, VR and XR glasses) are coming to market, and more and more teleconference scenarios are carried out with wearable extended reality equipment.
However, when a user uses an existing augmented reality device to conduct a teleconference, a method is mostly adopted in which a live conference recorded on a conference site is presented to participants who conduct the teleconference from the perspective of a recorder.
However, even if the augmented reality device is worn, the teleconference mode does not have the experience of being in the meeting place, and cannot meet the use requirement of being in the scene of the user, so that the user experience is poor.
Therefore, the existing teleconference methods for augmented reality equipment suffer from an insufficiently realistic experience: they do not present play content to the remote online user from that user's own viewpoint within the conference site, so the remote online user cannot be placed inside the on-site conference and no immersive conference experience can be delivered.
The embodiment of the application provides a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium. The method includes: receiving a position selection image for a conference scene sent by conference access equipment, the position selection image being displayed on a virtual display screen of the augmented reality equipment; responding to a selection instruction of a target user and determining the target position corresponding to the selection instruction in the conference scene, the selection instruction acting on the position selection image; sending the target position to the conference access equipment; and receiving the play content sent by the conference access equipment and playing it on the virtual display screen, the play content being generated by the conference access equipment according to the target position.
In summary, the teleconference method in the embodiment of the application can return the target position corresponding to the user's selection instruction to the conference access device, and then receive and play the play content that the conference access device generates according to that target position, thereby bringing the audiovisual experience of the conference site to a target user who teleconferences through an augmented reality device and improving the remote user's conference experience.
The following will describe in detail. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.
Referring to fig. 1, fig. 1 is a schematic view of a scenario of the teleconference method provided by an embodiment of the present application. The teleconference system may include an augmented reality device 100 and a master device 200, which may be connected by any communication means, including but not limited to signal communication through electronic circuits and communication through wireless signals; the wireless signals may use the TCP/IP protocol suite (TCP/IP), the User Datagram Protocol (UDP), and so on. The augmented reality device 100 may receive a control signal from a remote controller or a control panel; it may also receive instruction information sent by the main control apparatus 200 and perform the corresponding operation, such as the teleconference method of the present application, according to that instruction information.
In an embodiment of the present application, the augmented reality device 100 includes, but is not limited to, a head-mounted display (HMD), smart glasses, or other forms of wearable devices.
It will be appreciated by those skilled in the art that the application environment shown in fig. 1 is merely one application scenario of the present application and does not limit its application scenarios; other application environments may include more or fewer augmented reality devices than shown in fig. 1. For example, only one augmented reality device is shown in fig. 1, but the application is not limited thereto.
As shown in fig. 1, the main control device 200 may include any hardware device capable of performing data processing and instruction transmission, such as a CPU or a single chip microcomputer embedded in the augmented reality device, and is not limited in this particular regard.
It should be noted that, the schematic view of the teleconference system shown in fig. 1 is only an example, and the teleconference system and the scene described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of the teleconference system and the appearance of a new service scenario, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
Specifically, referring to fig. 2, fig. 2 is a schematic view of a scenario in which an augmented reality device executes the teleconference method according to an embodiment of the present application, where a specific execution procedure of the augmented reality device executing the teleconference method is as follows:
S201, receiving a position selection image for a conference scene sent by the conference access device.
In the embodiment of the application, the position selection image is displayed on a virtual display screen of the augmented reality device; the virtual display screen is formed from the physical display screen through multiple stages of propagation and conversion by a set of optical components.
As an alternative embodiment, the position selection image may be one of a plurality of position selection images from different viewpoints, from which the user selects a desired position within the selectable area of the image.
The selection cases may include: an empty seat is available for selection; no empty seat is available and the user adds one; or empty seats exist but the user adds a seat anyway.
In the embodiment of the application, the conference scene is an off-line conference scene, such as an off-line conference room scene and the like.
As an alternative embodiment, after the conference access device at the site of the offline conference room acquires the site image, the generated position selection image is sent to the augmented reality device, and the augmented reality device receives the position selection image and displays the position selection image on the virtual display screen.
In the embodiment of the application, the extended reality device may be any of various wearable extended reality devices, such as XR glasses; an XR terminal refers to a body-worn electronic device applying XR technology (covering VR, AR and MR), namely XR smart glasses or an XR head-mounted display.
Optionally, as shown in the schematic XR glasses structure of fig. 3, the XR smart glasses have the natural form of ordinary glasses; the glasses body consists of a frame and temples, and in addition to the components of ordinary glasses it further includes a computing processor (i.e. the above-mentioned main control device 200), sensors, a display, a shooting module, optical components, a speaker, a communication component, and the like.
The computing processor handles the information processing and logic operations of the XR smart glasses and includes CPU, GPU and NPU units; it can be a general-purpose chip system or one designed specifically for XR workloads, such as the Qualcomm Snapdragon AR1 and Snapdragon AR2 chip systems.
The sensors are used by the XR smart glasses to collect information about the surroundings, including the user: a camera collects video image information, a gyroscope measures the user's head pose, an accelerometer measures the position of the user and the device, a magnetometer measures the magnetic declination at the device's location, an eye tracker follows the user's gaze point, and a microphone collects the user's voice.
The display presents the pictures rendered by the computing processor and may be a self-emissive active device (micro-LED, micro-OLED), a liquid crystal display (LCD) or LCoS panel requiring external illumination, a micro-electro-mechanical system (MEMS), a digital micromirror device (DMD), or the like.
The optical component is used for image imaging and transmitting image light into human eyes and comprises a lens, an optical film material, an optical waveguide and the like.
The speaker is used for playing sound.
The communication component is used for communication between the XR glasses and other external devices, such as a platform cloud server, a conference site controller, or the XR glasses of other participants, and can be a cellular communication module, a WiFi module, a Bluetooth module, and so on.
In addition, the XR smart glasses include a power supply, which may be built into the glasses or separate from and electrically connected to the glasses body. The XR glasses may also be used with accessories such as a ring, watch, wristband, controller or headset, which are electrically connected to the glasses body when needed. It will be appreciated that the aforementioned computing processor, sensors, speakers and communication components may also be disposed outside the glasses body and electrically connected to it.
In the embodiment of the application, the user's virtual seat in the offline conference room, i.e. the target position, is determined; the live information of the offline conference room as observed from that virtual seat is generated; and that live conference room information is presented to the user.
Optionally, in the preparation stage before the meeting formally starts, the XR smart glasses establish a communication connection with a conference access device placed at the offline conference room site. The conference access device is an electronic device integrating environment sensing and communication functions, and possibly part of the computing functions as well. The XR smart glasses instruct the conference access device to capture, through its camera, photos of the conference room before the meeting starts; these photos cover at least the room layout, the participants' seats and similar information. The smart glasses present the conference room photos as a virtual display in front of the user's eyes.
Alternatively, the conference room live photo, i.e. the position selection image, may be a photo with additional information added, such as the personal information and positions of other participants attending the conference remotely online. The user can select his conference position directly on the position selection image.
Note that, the setting position of the conference access device is not particularly limited, and in the embodiment of the present application, the conference access device is exemplified as being installed in a conference scene.
S202, responding to a selection instruction of a target user, determining a corresponding target position of the selection instruction in a conference scene, wherein the selection instruction acts on a position selection image;
In the embodiment of the application, after the position selection image appears on the virtual display screen of the extended reality device, the user can select a desired position in the selectable area of the position selection image, and the extended reality device determines the corresponding target position in the conference scene according to the selection instruction.
As an alternative embodiment, the position selection cases may include: an empty seat is available; no empty seat is available and the user adds one; or an empty seat exists but the user adds one anyway. For example, if the conference room has an empty seat, the user selects the vacancy as his conference position; if the conference room has no empty seat, the user adds a seat as his position; and if the conference room has an empty seat that is reserved for the conference speaker, the user may add a seat as his participant position.
Alternatively, the glasses side obtains the data sent by the server (scene, persons, seats, etc.), and the user selects a position based on the scene and seat data.
In some embodiments of the present application, the augmented reality device includes a first shooting module. Before responding to the selection instruction of the target user, the teleconference method provided in the embodiment of the present application optionally further includes: acquiring a gesture image of the target user through the first shooting module, determining the selection information corresponding to the gesture image in the position selection image, and generating the selection instruction according to the selection information.
In the embodiment of the application, the XR smart glasses can receive input instructions in which the user selects the participation position through gestures and hand movements.
As an alternative embodiment, when the virtual display screen of the XR smart glasses shows the position selection interface, the camera of the XR smart glasses captures a picture of the user's gesture to determine the user's selection instruction.
Optionally, the remote conference method further comprises the steps of determining coordinate information of the selection instruction on the position selection image, and determining the target position in the conference scene according to the coordinate information.
In the embodiment of the application, the XR smart glasses receive and parse the user's seat selection instruction, calculate the conference-room coordinates corresponding to the position selected in the virtual picture, and send the corresponding conference position to the conference access device.
As an optional embodiment, after the camera of the XR smart glasses captures the user's gesture, the main control device interprets the gesture to determine the position the user selected on the virtual picture (i.e. the position selection image), queries the coordinates of that position within the virtual picture, and converts them through the coordinate mapping between the virtual picture and the real conference room to obtain the coordinates of the selected seat in the real conference room.
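To make this coordinate conversion concrete, the following Python sketch maps a gesture-selected pixel on the position selection image to the nearest seat in the real conference room. The homography values and the seat table are illustrative assumptions, not values taken from this application; in practice the mapping would come from camera calibration.

```python
import numpy as np

# Assumed homography H mapping image pixels to floor-plane coordinates
# (metres) in the conference room; a real system would calibrate it.
H = np.array([[0.01, 0.0, -3.2],
              [0.0, 0.01, -2.4],
              [0.0, 0.0, 1.0]])

# Hypothetical table of selectable seats in room coordinates.
SEATS = {"A": (0.0, 0.0), "B": (1.2, 0.0), "C": (2.4, 0.0)}

def pixel_to_room(u: float, v: float) -> tuple[float, float]:
    """Project an image pixel onto the conference-room floor plane."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

def nearest_seat(u: float, v: float) -> str:
    """Resolve the selection instruction to the closest seat (target position)."""
    px, py = pixel_to_room(u, v)
    return min(SEATS, key=lambda s: (SEATS[s][0] - px) ** 2 + (SEATS[s][1] - py) ** 2)

print(nearest_seat(540, 310))  # e.g. "B"
```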
Optionally, the XR smart glasses generate an initial view of the live conference room after selecting the virtual meeting location of the user in the real conference room.
S203, sending the target position to the conference access equipment;
In the embodiment of the application, after the target position of the user is determined, the target position is sent to the conference access equipment, so that the conference access equipment generates the playing content according to the target position.
As an alternative embodiment, the target position may also be submitted to the conference access device for processing confirmation, e.g. after obtaining a user selection instruction, the selection instruction and the corresponding image are sent to the conference access device, so that the conference access device determines the target position.
In some embodiments of the present application, the augmented reality device includes a pose acquisition module. Optionally, before receiving the play content sent by the conference access device, the teleconference method provided in the embodiment of the present application further includes acquiring the pose information of the target user through the pose acquisition module and sending the pose information to the conference access device.
In the embodiment of the application, after the remote user formally enters the conference, the live information of the on-site conference room is presented to the user synchronously in real time; accordingly, once the target position is confirmed, the pose acquisition module collects the user's pose information in real time and sends it to the conference access device.
Optionally, after the user checks and confirms the initial conference room picture, the user formally joins the conference. The XR smart glasses compute the user's head pose from gyroscope and accelerometer measurements, and the eye tracker measures the user's gaze point.
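As an illustration of how head pose can be fused from gyroscope and accelerometer samples, here is a minimal complementary-filter sketch; the axis conventions, sample rate and filter weight are assumptions for demonstration, not a method this application mandates.

```python
import math

ALPHA = 0.98          # weight of the integrated gyroscope estimate (assumed)
DT = 1.0 / 100.0      # assumed 100 Hz IMU sample rate

def update_pose(pitch: float, roll: float,
                gyro: tuple[float, float, float],
                accel: tuple[float, float, float]) -> tuple[float, float]:
    """Fuse one IMU sample into the running pitch/roll estimate (radians)."""
    gx, gy, _ = gyro
    ax, ay, az = accel
    # Integrate angular velocity for the responsive short-term estimate.
    pitch_gyro = pitch + gx * DT
    roll_gyro = roll + gy * DT
    # The gravity direction from the accelerometer gives a drift-free reference.
    pitch_acc = math.atan2(ay, math.sqrt(ax * ax + az * az))
    roll_acc = math.atan2(-ax, az)
    # Blend: gyroscope dominates short-term, accelerometer corrects drift.
    return (ALPHA * pitch_gyro + (1 - ALPHA) * pitch_acc,
            ALPHA * roll_gyro + (1 - ALPHA) * roll_acc)
```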
Optionally, one method of generating an updated conference room picture from the head pose and gaze-point parameters is as follows: the conference access device is provided with three or more cameras at different positions of the conference site, which photograph the live conference room picture in real time. The conference access device sends the cameras' position coordinates and the simultaneously captured photos to the XR smart glasses; the glasses obtain the user's current gaze point through the eye tracker and compute the user's viewing angle; and from the simultaneously captured conference room photos, the user's seat position and the gaze point, the glasses generate the conference room scene picture as observed from the seat at that viewing angle.
Alternatively, the updated conference room picture may be generated from the head pose and gaze-point parameters as follows: the conference access device is provided with three or more cameras at different positions of the conference site, which photograph the conference room in real time; a 3D scene is built from the photos, and once the observer's position and head pose are determined, the picture from the observer's viewing angle is generated. The XR smart glasses control the micro-display optical assembly to present it in an augmented reality manner in front of the user's eyes.
In the embodiment of the application, the XR smart glasses receive the conference site sound collected by the conference access device's microphone and convert it into the conference site sound as it would be heard at the user's virtual seat.
Optionally, the sound conversion may proceed as follows: determine the position of the sound source, the position of the collecting microphone and the user's virtual seat, and then, based on these relative position parameters, convert the characteristic parameters of the sound collected by the microphone into the characteristic parameters of the sound that would be received at the user's virtual seat. The computing processor of the XR smart glasses controls the acoustic modules, such as the speakers, to present the conference room sound around the user. While the user is speaking, the microphone of the XR smart glasses collects the user's voice and transmits it to the conference access device at the conference site.
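The following sketch shows one simple form such a conversion algorithm could take: rescaling gain by the inverse-distance law and shifting the propagation delay from the microphone position to the user's virtual seat. A real system would add HRTF filtering; all names and constants here are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
SAMPLE_RATE = 48_000     # assumed audio sample rate

def rerender_at_seat(signal: np.ndarray,
                     source_pos: np.ndarray,
                     mic_pos: np.ndarray,
                     seat_pos: np.ndarray) -> np.ndarray:
    """Re-render mic-captured audio as heard at the user's virtual seat."""
    d_mic = np.linalg.norm(source_pos - mic_pos)
    d_seat = np.linalg.norm(source_pos - seat_pos)
    gain = d_mic / max(d_seat, 1e-3)   # inverse-distance amplitude law
    # Extra samples of propagation delay relative to the microphone.
    delay = int((d_seat - d_mic) / SPEED_OF_SOUND * SAMPLE_RATE)
    out = np.zeros_like(signal)
    if delay >= 0:
        out[delay:] = signal[:len(signal) - delay] if delay else signal
    else:
        out[:delay] = signal[-delay:]
    return gain * out
```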
S204, receiving the play content sent by the conference access equipment and playing the play content on the virtual display screen, the play content being generated by the conference access equipment according to the target position.
In the embodiment of the application, after the conference access device generates the play content according to the target position, the play content is sent to the augmented reality device, and the augmented reality device receives the play content sent by the conference access device and plays the play content on the virtual display screen.
As an alternative embodiment, the XR smart glasses send an instruction to the conference access device; the conference access device can move to the seat selected by the user, and its camera and microphone then observe and capture the conference room's live picture and sound at that seat at the current moment.
Optionally, another way for the XR smart glasses to generate the conference room information is as follows. For sound, the microphone of the conference access device receives and records the conference room sound at its original position; the recorded information includes the sound sources, timbre, loudness and so on. The position of the conference access device and the user's seat are sent to the XR smart glasses, which generate the conference sound as it would be observed at the user's seat and control the speaker to play it to the user.
Optionally, for visual information, the conference access device is provided with three or more cameras at different positions of the conference site, which photograph the conference room in real time. The conference access device sends the cameras' position coordinates and the simultaneously captured photos to the XR smart glasses; the glasses obtain the user's current gaze point through the eye tracker and compute the user's viewing angle; from the simultaneously captured conference room photos, the user's seat position and the gaze point, the glasses generate the conference room scene picture observed at that viewing angle; and the glasses control the display screen and optical components to show it to the user as a virtual display.
In this scenario, the visual picture the user receives through the XR smart glasses switches as the user's viewing attention switches, which typically follows the change of conference speaker. The XR smart glasses can also record the user's voice and send it to the conference access device, which controls a speaker to produce the sound at the user's seat in the conference room, creating the effect of the user speaking on site.
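Putting S201 to S204 together, the glasses-side flow could look like the following sketch. The `link` and `glasses` objects and all of their methods are hypothetical placeholders for whatever transport and device APIs an actual implementation uses.

```python
def run_remote_session(link, glasses):
    # S201: receive and display the position selection image.
    selection_image = link.recv_image("position_selection")
    glasses.virtual_display.show(selection_image)

    # S202: resolve the user's gesture into a target position.
    gesture = glasses.front_camera.capture_gesture()
    target = glasses.resolve_target_position(gesture, selection_image)

    # S203: report the target position to the conference access device.
    link.send("target_position", target)

    # S204: stream pose updates, then receive and play the rendered content.
    while glasses.in_conference():
        link.send("pose", glasses.imu.read_pose())
        frame, audio = link.recv_frame()
        glasses.virtual_display.show(frame)
        glasses.speaker.play(audio)
```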
Referring to fig. 4, fig. 4 is a schematic view of a scenario of the teleconference method according to an embodiment of the present application. The teleconference system may include a conference access device 400 and a processing apparatus 500, which may be connected by any communication means, including but not limited to signal communication through electronic circuits and communication through wireless signals; the wireless signals may use the TCP/IP protocol suite (TCP/IP), the User Datagram Protocol (UDP), and so on. The conference access device 400 may receive a control signal from a remote controller or a control panel; it may also receive instruction information sent by the processing apparatus 500 and perform the corresponding operation, such as the teleconference method of the present application, according to that instruction information.
In an embodiment of the present application, the conference access device 400 includes, but is not limited to, a teleconference access device located at the conference room site or other forms of conference access devices.
It will be appreciated by those skilled in the art that the application environment shown in fig. 4 is merely one application scenario of the present application and does not limit its application scenarios; other application environments may include more or fewer conference access devices than shown in fig. 4. For example, only one conference access device is shown in fig. 4, but the application is not limited thereto.
As shown in fig. 4, the processing apparatus 500 may include any hardware device capable of performing data processing and instruction transmission, such as a CPU or a single chip microcomputer embedded in the conference access device, and is not limited in this particular embodiment.
It should be noted that, the schematic view of the teleconference system shown in fig. 4 is only an example, and the teleconference system and the scene described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of the teleconference system and the appearance of the new service scenario, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
Specifically, referring to fig. 5, fig. 5 is a schematic view of a scenario in which a conference access device executes the teleconference method according to an embodiment of the present application, where a specific execution procedure of the conference access device executing the teleconference method is as follows:
S501, determining a position selection image corresponding to a conference scene through a plurality of second shooting modules, wherein the plurality of second shooting modules are arranged at different positions of the conference scene.
In the embodiment of the application, the scene is tested in practice according to the number and installation positions of the on-site second shooting modules, so that the effect attainable by three-dimensional reconstruction can be achieved.
Optionally, three-dimensional reconstruction is carried out through the second shooting modules to build up the conference site piece by piece. The second shooting modules collect information and transmit it to the processing apparatus for unified processing to obtain the conference scene, persons, seats and other related information, and the processing apparatus sends the processed information to the augmented reality device.
Optionally, the teleconference method further comprises the steps of obtaining a plurality of scene images of the conference scene through a plurality of second shooting modules, and determining a position selection image according to the plurality of scene images.
In the embodiment of the application, as shown in the conference scene schematic diagram of fig. 6, a plurality of second shooting modules (cameras) are arranged in the conference room in advance to capture the conference site comprehensively, and the corresponding pictures are transmitted to a server, where the server may be the processing apparatus of the on-site conference access device, the smart glasses, a cloud server or another device.
It should be noted that the large shared screen shown in fig. 6 is merely an example and may be the main screen of the conference room; the positions selected by users A, B, C, D, E and F and the installation positions of the second shooting modules are likewise exemplary and may be adjusted to the actual application.
As an alternative embodiment, the server may use three-dimensional reconstruction (3D Reconstruction) techniques to generate a three-dimensional model matching the current conference scene.
It should be noted that the three-dimensional reconstruction technique involves the following steps to finally generate a three-dimensional model of the captured scene:
1. Data acquisition: taking pictures of the room from different angles using a plurality of cameras; these pictures need to cover every corner of the room in order to capture its complete appearance.
2. Feature extraction: extracting key feature points, such as corner points and edges, from each picture.
3. Feature matching: finding the same feature points across different pictures and determining the correspondence between them.
4. Camera calibration: determining the intrinsic parameters (focal length, principal point, etc.) and extrinsic parameters (position and orientation) of the cameras, usually by photographing known patterns.
5. Three-dimensional point cloud generation: converting points on the two-dimensional images into points in three-dimensional space using the feature matches and camera parameters to form a point cloud.
6. Surface reconstruction: recovering object surfaces from the point cloud, which may involve mesh generation, surface fitting and similar techniques.
7. Texture mapping: mapping textures from the original pictures onto the reconstructed three-dimensional model to obtain a realistic appearance.
8. Post-processing: steps such as smoothing, noise removal and model optimization.
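As a hedged illustration of steps 1 to 5 above, the following two-view sketch uses OpenCV to match features, recover the relative camera pose and triangulate a point cloud. The intrinsic matrix K is an assumed calibration; a full system would extend this to all conference-room cameras and add surface reconstruction and texturing.

```python
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 640.0],   # assumed camera intrinsics
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

def two_view_point_cloud(img1, img2) -> np.ndarray:
    # Steps 2-3: extract and match SIFT features between the two views.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # Step 4 (relative pose): essential matrix with RANSAC, then decompose.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    # Step 5: triangulate matched points into a 3D point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T   # N x 3 Euclidean points
```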
Optionally, the teleconference method further includes determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is a non-selectable area already occupied by participating users and the second position area is an area where no participating user is present, and determining the position selection image according to the first position area and the second position area.
As an alternative embodiment, a selectable position area and a non-selectable position area in the conference scene are determined from the plurality of scene images, and a position selection image is generated from the selectable position area and the non-selectable position area.
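A minimal sketch of composing the position selection image from those two areas follows; the seat rectangles and occupancy flags are assumed to come from the scene-image analysis, and the colour coding is an illustrative choice rather than anything specified in this application.

```python
import cv2

def build_selection_image(scene_img, seats):
    """seats: list of ((x, y, w, h), occupied) tuples in image coordinates."""
    out = scene_img.copy()
    for (x, y, w, h), occupied in seats:
        # Occupied seats (first position area) in red, free ones
        # (second position area) in green, BGR order.
        color = (0, 0, 255) if occupied else (0, 255, 0)
        cv2.rectangle(out, (x, y), (x + w, y + h), color, 2)
    return out
```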
S502, sending a position selection image to the augmented reality device.
In the embodiment of the application, a seat selection interface is generated according to the position data of the site conference, and the position selection image corresponding to the seat selection interface is sent to the augmented reality equipment for display, so that a user can select a seat.
S503, receiving the target position sent by the augmented reality device.
In the embodiment of the present application, as shown in the target position selection result schematic diagram in fig. 7, a user selection result, that is, a target position, sent by the augmented reality device is received, and corresponding play content is generated based on the target position.
It should be noted that the rectangle at the position selected on the glasses side shown in fig. 7 may indicate the orientation of the augmented reality user's field of view; that is, the view the remote user sees in the augmented reality device faces the shared large screen from the selected position.
S504, generating play content according to the target position, and sending the play content to the augmented reality equipment.
In the embodiment of the application, according to the seat selected by the user and the three-dimensional model of the conference site reconstructed by the processing device, the system automatically translates, rotates and scales the three-dimensional model of the conference site so that it matches the selected position in the conference site as closely as possible, and presents it at the glasses end.
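One plausible way to realize this alignment, sketched below under the assumption that the seat and shared-screen coordinates are known in the reconstructed model's frame, is a standard look-at transform applied as the scene's root transform at the glasses end; all names and coordinates are illustrative.

```python
# Hedged sketch: place the reconstructed conference-site model so the
# virtual camera sits at the selected seat, facing the shared large screen.
import numpy as np

def look_at_transform(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """World-to-view matrix for a viewer at `eye` looking at `target`."""
    f = target - eye
    f = f / np.linalg.norm(f)                       # forward
    s = np.cross(f, up); s = s / np.linalg.norm(s)  # right
    u = np.cross(s, f)                              # true up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

# Hypothetical coordinates (metres) in the reconstructed model's frame.
seat_pos = np.array([1.2, 1.1, 3.0])     # seat selected by the user
screen_pos = np.array([0.0, 1.5, 0.0])   # shared large screen
scene_root_transform = look_at_transform(seat_pos, screen_pos)
```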
Optionally, a corresponding three-dimensional character model is generated and placed into the three-dimensional model of the conference site, and the character model is bound with conventional basic animation states (for example, idle, speaking, standing up, sitting down, turning around and clapping hands); furthermore, higher-level and more complex actions can be presented by driving the model with motion capture technology.
In the embodiment of the application, according to the position selected at the glasses end, the three-dimensional model scene generated for the conference site is translated, rotated and placed accordingly, so that the participant at the glasses end feels present in the on-site conference; combined with the 3DoF (or 6DoF) tracking of the glasses, the user can look around the scene or roam through it.
The server acquires the site information in real time and transmits it to the glasses end, driving the expressions, actions and the like of the characters in the three-dimensional model scene in real time according to the changing conditions of the conference site.
As an alternative embodiment, the conference room picture that a user sees at the virtual meeting position (the target position) may be obtained by inputting the virtual meeting position and an initial viewing direction, such as facing the conference room's shared large screen. As an example, generating a three-dimensional model of the conference scene by three-dimensional reconstruction and then re-rendering a scene picture from a specified viewing angle at a specified location may be implemented as a NeRF rendering process.
It should be noted that NeRF (Neural Radiance Fields) is an emerging rendering technique whose core objective is to generate images at new viewing angles from a set of captured images. Unlike conventional three-dimensional reconstruction methods, NeRF does not represent the scene explicitly as a point cloud, mesh or voxels, but models it as a continuous 5D radiance field stored implicitly in a neural network. NeRF is trained on sparse multi-angle images to obtain a neural radiance field model, from which a clear picture at an arbitrary viewing angle can be rendered. Rendering consists of casting rays at the given viewing angle, feeding the sampled positions and directions along each ray into NeRF to obtain volume density and colour, and finally producing the image through volume rendering.
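For illustration, the following sketch shows only NeRF's final volume-rendering quadrature for a single ray, assuming a trained network has already predicted the volume density and colour at each sample along the ray; it follows the standard compositing formula rather than any specific library API.

```python
# Hedged sketch of NeRF volume rendering: composite densities sigma and
# colours c sampled at depths t along one camera ray into a pixel colour.
import numpy as np

def volume_render(sigma, color, t):
    """sigma: (N,), color: (N, 3), t: (N,) sample depths along one ray."""
    delta = np.diff(t, append=t[-1] + 1e10)        # distances between samples
    alpha = 1.0 - np.exp(-sigma * delta)           # opacity of each segment
    trans = np.cumprod(1.0 - alpha + 1e-10)        # transmittance T_i
    trans = np.roll(trans, 1); trans[0] = 1.0      # exclusive cumulative product
    weights = alpha * trans
    return (weights[:, None] * color).sum(axis=0)  # composited RGB
```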
Optionally, the teleconference method further comprises: receiving posture information of the target user sent by the augmented reality device; determining the target picture according to the posture information and the plurality of scene images; and determining the audio content corresponding to the target picture according to the posture information and the plurality of audio data.
In the embodiment of the application, after the teleconference user formally enters the conference, the live information of the on-site conference room can be presented to the user synchronously in real time. Therefore, after the target position is confirmed, the posture acquisition module acquires the posture information of the user in real time and sends it to the conference access device, which receives the posture information sent by the augmented reality device in real time.
As an alternative embodiment, conference room live information mainly refers to visual picture information and auditory sound information, and may also include information such as smell where possible. Visual picture information of the live conference room as observed from the virtual meeting position (namely, the target position) is generated.
In some embodiments of the present application, the conference access device includes a plurality of audio acquisition modules. Optionally, the teleconference method provided in the embodiment of the present application further includes: acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules; acquiring a plurality of audio data at different angles in the conference scene through the plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are disposed at different positions of the conference scene; determining a target picture according to the plurality of scene images and the target position; determining audio content corresponding to the target picture according to the plurality of audio data and the target position; and generating play content according to the target picture and the audio content.
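A simple, purely illustrative way to tie the play content to the target position is to pick the nearest second shooting module for the picture and mix the audio acquisition modules with inverse-distance weights; a real system would more likely render the view from the reconstructed model, as described above. Module coordinates here are hypothetical.

```python
# Hedged sketch: choose the scene image and the audio mix for a given
# target position by proximity to the capture modules.
import numpy as np

def pick_playback_sources(target_pos, cam_positions, mic_positions):
    target = np.asarray(target_pos, dtype=float)
    cams = np.asarray(cam_positions, dtype=float)
    mics = np.asarray(mic_positions, dtype=float)

    # Nearest second shooting module supplies the target picture.
    cam_idx = int(np.argmin(np.linalg.norm(cams - target, axis=1)))

    # Inverse-distance weights for mixing the microphone channels.
    d = np.linalg.norm(mics - target, axis=1) + 1e-6
    weights = (1.0 / d) / (1.0 / d).sum()
    return cam_idx, weights
```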
In the embodiment of the application, the smart glasses' onboard gyroscope, magnetometer, IMU and similar sensors can acquire motion data, and a 6DoF tracking technique yields the user's head posture parameters at the current moment, and hence the viewing angle from which the user currently observes the meeting room. In response to the user's instruction to select a meeting position in the virtual picture at the current moment, the user's virtual meeting position in the on-site meeting room is determined, the user's viewing angle in the three-dimensional model of the conference scene is updated, and the picture of the conference scene is updated accordingly.
It should be noted that 6DoF (six degrees of freedom) provides a complete description of spatial motion, comprising three translational degrees of freedom (translation along the X, Y and Z axes) and three rotational degrees of freedom: rotation about the X axis, commonly called pitch; rotation about the Y axis, commonly called yaw; and rotation about the Z axis, commonly called roll. 6DoF allows objects to move through space in complex ways, from the precise operation of aircraft and robots in flight simulators to the omnidirectional movement of the user's head and hands in virtual reality.
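As a small illustrative sketch, a 6DoF head pose can be represented as three translations plus yaw/pitch/roll, from which the forward viewing direction used to update the conference-scene picture can be derived. Angle conventions differ between runtimes; the one below is an assumption chosen purely for illustration.

```python
# Hedged sketch: 6DoF head pose and the forward viewing direction it
# implies, assuming yaw about Y, pitch about X, and -Z as "forward"
# at zero rotation in a right-handed frame.
import numpy as np
from dataclasses import dataclass

@dataclass
class HeadPose:
    # Translation in metres.
    x: float
    y: float
    z: float
    # Rotation in radians.
    yaw: float
    pitch: float
    roll: float

    def forward(self):
        """Unit vector the user is looking along (-Z at zero rotation)."""
        cy, sy = np.cos(self.yaw), np.sin(self.yaw)
        cp, sp = np.cos(self.pitch), np.sin(self.pitch)
        return np.array([-sy * cp, sp, -cy * cp])

pose = HeadPose(1.2, 1.1, 3.0, yaw=0.3, pitch=-0.1, roll=0.0)
view_dir = pose.forward()  # used to pick the target picture / audio mix
```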
Optionally, the user's gaze can be tracked through an eye tracker carried by the smart glasses, so as to obtain the viewing angle from which the user observes the conference room at the current moment; the user's viewing angle in the three-dimensional model of the conference scene is then updated, and the conference scene picture is updated at the same time. The computing processor controls the display screen to show the generated picture, and the virtual picture of the conference room is presented to the user through the imaging and transmission of the optical components.
Through the above steps, the virtual picture presented to the user changes with the remote participant's line of sight and meeting position, and is equivalent to the picture seen in the conference room itself.
In the embodiment of the application, to generate the auditory sound information of the live conference room as heard from the virtual meeting position, the XR smart glasses can receive the conference site sound collected by the conference access device through the microphone array (namely, the audio acquisition modules) arranged in the live conference room, and the sound collected by the microphones is converted into the conference site sound that would be heard by a user at the virtual meeting position.
Optionally, the sound conversion may proceed as follows: determine the position of the sound source, the position of the collecting microphone and the virtual meeting position of the user, and then, based on these relative position parameters, convert the characteristic parameters of the sound collected by the microphone into the characteristic parameters of the sound that would be received at the user's virtual meeting position. Using spatial audio technology, the computing processor of the XR smart glasses controls an acoustic module, such as the XR loudspeakers, to present the conference room site sounds around the user.
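The following is a deliberately simplified sketch of such a position-based conversion, using only an inverse-distance gain ratio and a propagation delay between the microphone position and the user's virtual meeting position; production systems would apply full spatial audio rendering (e.g. HRTFs). All positions and the function name are illustrative assumptions.

```python
# Hedged sketch: re-weight and delay a microphone signal so it approximates
# what a listener at the virtual meeting position would hear from the source.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def rerender_at_listener(signal, sr, src_pos, mic_pos, listener_pos):
    d_mic = np.linalg.norm(src_pos - mic_pos)
    d_lis = np.linalg.norm(src_pos - listener_pos)
    gain = (d_mic + 1e-6) / (d_lis + 1e-6)  # inverse-distance law ratio
    delay = int(round((d_lis - d_mic) / SPEED_OF_SOUND * sr))
    out = np.zeros_like(signal)
    if delay >= 0:
        out[delay:] = signal[:len(signal) - delay]
    else:
        out[:delay] = signal[-delay:]
    return gain * out
```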
The server can capture and analyse information such as speech and actions in real time, pack the identity, position and other information corresponding to the three-dimensional model of the conference site into data packets, and transmit them to the glasses end.
After the glasses end acquires and parses the data, it drives the mouth shape and actions of the corresponding character model in its three-dimensional model of the conference site, and at the same time moves the sound source to the position of that character model in the model.
The user at the glasses end can also turn to look at the person who is speaking.
Optionally, the above steps are repeated continuously, so that the glasses end perceives an effect close to the actual conference: the scene, the actions of the people on site and the positions of their voices.
With the embodiment of the application, the sound heard by the remote participant varies with the position selected by the user and is equivalent to the live sound. Through the XR smart glasses, the remote user can freely switch the observing and listening positions and angles within the on-site conference room in the form of virtual participation. The smart glasses control their camera to capture an image of the user and their microphone to collect the user's voice, and transmit both to the on-site conference access device; the conference access device can control on-site optical instruments to holographically display the user's image at the virtual meeting position selected by the user, and can control the on-site loudspeaker array to play sound equivalent to what the user would emit from that meeting position in the conference room.
In addition, if the on-site participants use the PC client to participate in the conference, the PC client can directly acquire the identity and the voice of the speaker. If the video conference is started, the video information of the speaker can be directly obtained.
In the embodiment of the present application, as shown in the overall structure schematic diagram of the teleconference method in fig. 8, a server side (the processing device of the conference access device) provides the related services and is responsible for managing conferences attended online through the XR glasses; different server services can be obtained by logging in with the client in different scenes. One or more second shooting modules at the conference site are controlled to capture the current on-site environment, analyse the layout of the conference seats and whether each seat is occupied, and transmit this information to the server in real time. If participants join through the computer (PC) client, it synchronizes the shared information of the conference; the camera module of the computer can capture head-and-upper-body image information of the participant and transmit it to the server in real time, and the virtual image of the glasses-end participant can likewise be projected holographically.
Specifically, an on-site participant speaks through the PC client, which transmits the corresponding identity, sound and video information to the server. The server retrieves the conference-site three-dimensional model and character model information according to that identity, maps the video information to the actions and expressions of the character, packs everything into an information data packet and transmits it to the glasses end. After the glasses end acquires and parses the data, it drives the mouth shape and actions of the corresponding character model in its three-dimensional model of the conference site, and at the same time moves the sound source to the position of that character model.
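The information data packet mentioned above might, for illustration, carry fields like the following; these field names are hypothetical and not specified by the application.

```python
# Hedged sketch of a per-speaker sync packet: the server fills one of these
# per update, and the glasses end uses it to drive the character model and
# relocate the sound source. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SpeakerSyncPacket:
    user_id: str                      # identity of the on-site speaker
    seat_position: tuple              # (x, y, z) in the site 3D model
    animation_state: str = "idle"     # e.g. idle / speaking / standing_up
    blendshapes: dict = field(default_factory=dict)  # mouth-shape weights
    audio_chunk: bytes = b""          # encoded voice for this interval
```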
In the embodiment of the application, when the XR glasses log into a conference, a corresponding conference scene is first generated on the glasses according to the site environment scanned by the server and the actual layout of the conference site; a position is then preliminarily allocated to the glasses end according to which on-site seats are vacant, so that the scene and position displayed on the glasses are basically consistent with the on-site conference scene. Combined with the SLAM of the glasses, the glasses-end client can roam in the scene, or remain seated and turn to watch a person or to speak. When the glasses-end client hears speech in the conference, the sound resembles the real scene and varies with position, as a 3D spatial sound effect; the server transmits related information according to the on-site character information and builds corresponding character models in the 3D scene at the glasses end, so that the actions, expressions and other information of the on-site characters are synchronized in real time.
According to the embodiment of the application, the position selection image corresponding to the conference scene is determined through the plurality of second shooting modules disposed at different positions of the conference scene, the position selection image is sent to the augmented reality device, the target position sent by the augmented reality device is received, and the play content is generated according to the target position and sent to the augmented reality device. The augmented reality device returns the target position corresponding to the user's selection instruction to the conference access device, then receives and plays the play content generated by the conference access device according to the target position, thereby bringing the audio-visual experience of the conference scene to a target user participating remotely with an augmented reality device and achieving the technical effect of improving the remote user's participation experience.
In order to facilitate better implementation of the teleconference method, the application also provides a teleconference apparatus based on the teleconference method. Terms therein have the same meaning as in the teleconference method described above; for specific implementation details, refer to the description of the method embodiments.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a teleconference device provided in an embodiment of the present application, where the teleconference device 900 is applied to an augmented reality apparatus, and may specifically be as follows:
The first processing module 901 is configured to receive a position selection image for a conference scene sent by a conference access device, where the position selection image is displayed on a virtual display screen of an augmented reality device;
The first processing module 901 is further configured to determine a target position corresponding to the selection instruction in the position selection image in response to a selection instruction of the target user;
A first sending module 902, configured to send the target position to the conference access device, where the conference access device generates play content according to the target position and posture information, and the posture information is acquired by a posture acquisition module of the augmented reality device;
the first processing module 901 is further configured to receive the play content sent by the conference access device, and play the play content on the virtual display screen.
Optionally, in some embodiments of the present application, the first processing module 901 is configured to:
determining coordinate information of a selection instruction on a position selection image;
And determining the target position in the conference scene according to the coordinate information.
Optionally, in some embodiments of the present application, the first processing module 901 is further configured to:
Acquiring a gesture image of the target user through the first shooting module;
Determining corresponding selection information of the gesture image in the position selection image;
And generating a selection instruction according to the selection information.
Optionally, in some embodiments of the present application, the first sending module 902 is configured to:
acquiring the posture information of the target user through the posture acquisition module;
and sending the posture information to the conference access device.
In the embodiment of the application, the first processing module 901 first receives the position selection image for the conference scene sent by the conference access device, the position selection image being displayed on the virtual display screen of the augmented reality device; the first processing module 901 then responds to the selection instruction of the target user and determines the target position corresponding to the selection instruction in the position selection image; the first sending module 902 then sends the target position to the conference access device, which generates the play content according to the target position and the posture information acquired by the posture acquisition module of the augmented reality device; finally, the first processing module 901 receives the play content sent by the conference access device and plays it on the virtual display screen.
By receiving the position selection image for the conference scene sent by the conference access device and displaying it on the virtual display screen of the augmented reality device, determining the target position corresponding to the target user's selection instruction in the position selection image, sending the target position to the conference access device, which generates the play content according to the target position and the posture information acquired by the posture acquisition module of the augmented reality device, and receiving and playing the play content on the virtual display screen, the embodiment returns the target position corresponding to the user's selection instruction to the conference access device and then receives and plays the play content generated from it, thereby bringing the audio-visual experience of the conference scene to a target user participating remotely with an augmented reality device and achieving the technical effect of improving the remote user's participation experience.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a teleconference device provided in an embodiment of the present application, where the teleconference device 1000 is applied to a conference access apparatus, and may specifically be as follows:
the second processing module 1001 is configured to determine a position selection image corresponding to the conference scene through a plurality of second shooting modules, where the plurality of second shooting modules are disposed at different positions of the conference scene;
a second transmitting module 1002, configured to transmit the position selection image to the augmented reality device;
The second processing module 1001 is further configured to receive a target position sent by the augmented reality device, where the target position is generated in the position selection image according to a selection instruction of the target user;
The second processing module 1001 is further configured to generate a play content according to the target location, and send the play content to the augmented reality device, where the play content is played on the augmented reality device.
Optionally, in some embodiments of the present application, the second processing module 1001 is configured to:
acquiring a plurality of scene images of the conference scene through a plurality of second shooting modules;
A position selection image is determined from the plurality of scene images.
Optionally, in some embodiments of the present application, the second processing module 1001 is further configured to:
determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is a position area already occupied by a participating user and unavailable for selection, and the second position area is a position area where no participating user exists;
A position selection image is determined based on the first position area and the second position area.
Optionally, in some embodiments of the present application, the second processing module 1001 is further configured to:
acquiring a plurality of scene images of the conference scene through a plurality of second shooting modules;
acquiring a plurality of audio data at different angles in the conference scene through a plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are arranged at different positions of the conference scene;
Determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position;
and generating playing content according to the target picture and the audio content.
Optionally, in some embodiments of the present application, the second processing module 1001 is further configured to:
receiving the posture information of the target user sent by the augmented reality device;
and determining the target picture according to the posture information and the plurality of scene images, and determining the audio content corresponding to the target picture according to the posture information and the plurality of audio data.
In the embodiment of the application, the second processing module 1001 first determines the position selection image corresponding to the conference scene through the plurality of second shooting modules arranged at different positions of the conference scene; the second sending module 1002 then sends the position selection image to the augmented reality device; the second processing module 1001 then receives the target position sent by the augmented reality device; finally, the second processing module 1001 generates the play content according to the target position and sends it to the augmented reality device, where it is played.
The embodiment of the application determines the position selection image corresponding to the conference scene through the plurality of second shooting modules arranged at different positions of the conference scene, sends the position selection image to the augmented reality device, receives the target position sent by the augmented reality device, generates the play content according to the target position and sends it to the augmented reality device. The augmented reality device returns the target position corresponding to the user's selection instruction to the conference access device and then receives and plays the play content generated according to that position, thereby bringing the audio-visual experience of the conference scene to a target user participating remotely with an augmented reality device and achieving the technical effect of improving the remote user's participation experience.
In addition, the present application further provides an electronic device, as shown in fig. 11, which shows a schematic structural diagram of the electronic device according to the present application, specifically:
The electronic device may include one or more processors 1101 of a processing core, a memory 1102 of one or more computer-readable storage media, a power supply 1103, an input unit 1104, a camera module 1105, and the like. Those skilled in the art will appreciate that the electronic device structure shown in fig. 11 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
The processor 1101 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 1102 and invoking data stored in the memory 1102, thereby performing overall monitoring of the electronic device. Optionally, the processor 1101 may include one or more processing cores, and preferably the processor 1101 may integrate an application processor and a modem processor, wherein the application processor primarily processes operating systems, user interfaces, application programs, etc., and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1101.
The memory 1102 may be used to store software programs and modules, and the processor 1101 executes various functional applications and data processing by executing the software programs and modules stored in the memory 1102. The memory 1102 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), etc., and a storage data area that may store data created according to the use of the electronic device, etc. In addition, memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 1102 may also include a memory controller to provide the processor 1101 with access to the memory 1102.
The electronic device also includes a power supply 1103 that provides power to the various components. The power supply 1103 may be logically connected to the processor 1101 through a power management system, so that functions such as charging, discharging and power consumption management are performed through the power management system. The power supply 1103 may also include one or more direct-current or alternating-current power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The electronic device may also include an input unit 1104, which input unit 1104 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 1101 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 1102 according to the following instructions, and the processor 1101 executes the application programs stored in the memory 1102, so as to implement the steps in any teleconference method provided by the embodiment of the present application.
The embodiment of the application receives the position selection image for the conference scene sent by the conference access device and displays it on the virtual display screen of the augmented reality device; in response to the selection instruction of the target user acting on the position selection image, determines the target position corresponding to the selection instruction in the conference scene; sends the target position to the conference access device; and receives and plays, on the virtual display screen, the play content generated by the conference access device according to the target position. By returning the target position corresponding to the user's selection instruction to the conference access device and then receiving and playing the play content generated from it, the audio-visual experience of the conference scene is brought to a target user participating remotely with an augmented reality device, improving the remote user's participation experience.
For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer readable storage medium having stored thereon a computer program that can be loaded by a processor to perform the steps of any of the teleconferencing methods provided by the present application.
For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.
The computer-readable storage medium may include, among others, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any teleconference method provided by the present application, they can achieve the beneficial effects achievable by any teleconference method provided by the present application; for details, refer to the foregoing embodiments, which are not repeated here.
The teleconference method, apparatus, electronic device and computer-readable storage medium provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application and to assist in understanding its method and core ideas; meanwhile, for those skilled in the art, the specific implementation and application scope may vary according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A teleconference method, applied to an extended reality device, the method comprising: receiving a position selection image for a conference scene sent by a conference access device, the position selection image being displayed on a virtual display screen of the extended reality device; in response to a selection instruction of a target user, determining a target position corresponding to the selection instruction in the position selection image; sending the target position to the conference access device, the conference access device generating play content according to the target position and posture information, the posture information being acquired by a posture acquisition module of the extended reality device; and receiving the play content sent by the conference access device and playing the play content on the virtual display screen.

2. The method according to claim 1, wherein determining, in response to a selection instruction of a target user, the target position corresponding to the selection instruction in the conference scene comprises: determining coordinate information of the selection instruction on the position selection image; and determining the target position in the conference scene according to the coordinate information.

3. The method according to claim 1, wherein the extended reality device comprises a first shooting module for acquiring an image in front of the extended reality device, and before responding to the selection instruction of the target user, the method further comprises: acquiring a gesture image of the target user through the first shooting module; determining selection information corresponding to the gesture image in the position selection image; and generating the selection instruction according to the selection information.

4. The method according to claim 1, wherein the extended reality device comprises the posture acquisition module, and before receiving the play content sent by the conference access device, the method further comprises: acquiring the posture information of the target user through the posture acquisition module; and sending the posture information to the conference access device.

5. A teleconference method, applied to a conference access device, the conference access device comprising a plurality of second shooting modules, the method comprising: determining, through the plurality of second shooting modules, a position selection image corresponding to a conference scene, the plurality of second shooting modules being disposed at different positions of the conference scene; sending the position selection image to an extended reality device; receiving a target position sent by the extended reality device, the target position being generated in the position selection image according to a selection instruction of a target user; and generating play content according to the target position and sending the play content to the extended reality device, the play content being played on the extended reality device.

6. The method according to claim 5, wherein determining, through the plurality of second shooting modules, the position selection image corresponding to the conference scene comprises: acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules; and determining the position selection image according to the plurality of scene images.

7. The method according to claim 6, wherein determining the position selection image according to the plurality of scene images comprises: determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is a position area that is already occupied by a participating user and cannot be selected, and the second position area is a position area where no participating user exists; and determining the position selection image according to the first position area and the second position area.

8. The method according to claim 5, wherein the conference access device comprises a plurality of audio acquisition modules, and generating play content according to the target position comprises: acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules; acquiring a plurality of audio data at different angles in the conference scene through the plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are disposed at different positions of the conference scene; determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position; and generating the play content according to the target picture and the audio content.

9. The method according to claim 8, wherein determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position, comprises: receiving posture information of the target user sent by the extended reality device; and determining the target picture according to the posture information and the plurality of scene images, and determining the audio content corresponding to the target picture according to the posture information and the plurality of audio data.

10. A teleconference apparatus, applied to an extended reality device, the apparatus comprising: a first processing module, configured to receive a position selection image for a conference scene sent by a conference access device, the position selection image being displayed on a virtual display screen of the extended reality device; the first processing module being further configured to determine, in response to a selection instruction of a target user, a target position corresponding to the selection instruction in the position selection image; a first sending module, configured to send the target position to the conference access device, the conference access device generating play content according to the target position and posture information, the posture information being acquired by a posture acquisition module of the extended reality device; and the first processing module being further configured to receive the play content sent by the conference access device and play the play content on the virtual display screen.

11. A teleconference apparatus, applied to a conference access device, the conference access device comprising a plurality of second shooting modules, the apparatus comprising: a second processing module, configured to determine, through the plurality of second shooting modules, a position selection image corresponding to a conference scene, the plurality of second shooting modules being disposed at different positions of the conference scene; a second sending module, configured to send the position selection image to an extended reality device; the second processing module being further configured to receive a target position sent by the extended reality device, the target position being generated in the position selection image according to a selection instruction of a target user; and the second processing module being further configured to generate play content according to the target position and send the play content to the extended reality device, the play content being played on the extended reality device.

12. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps in the teleconference method according to any one of claims 1 to 9.

13. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps in the teleconference method according to any one of claims 1 to 9.
CN202411569805.8A 2024-11-04 2024-11-04 Teleconference method, teleconference device, electronic equipment and computer-readable storage medium Pending CN119583744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411569805.8A CN119583744A (en) 2024-11-04 2024-11-04 Teleconference method, teleconference device, electronic equipment and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN119583744A true CN119583744A (en) 2025-03-07

Family

ID=94799033



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination