CN119583744A

CN119583744A - Teleconference method, teleconference device, electronic equipment and computer-readable storage medium

Info

Publication number: CN119583744A
Application number: CN202411569805.8A
Authority: CN
Inventors: 秦仙魁; 欧阳琼林
Original assignee: Thunderbird Innovation Technology Shenzhen Co ltd
Current assignee: Thunderbird Innovation Technology Shenzhen Co ltd
Priority date: 2024-11-04
Filing date: 2024-11-04
Publication date: 2025-03-07

Abstract

The embodiment of the present application discloses a remote conference method, device, electronic device and computer-readable storage medium, the method comprising: receiving a location selection image for a conference scene sent by a conference access device, the location selection image being displayed on a virtual display screen of an extended reality device; responding to a selection instruction of a target user, determining a target location corresponding to the selection instruction in the location selection image; sending the target location to the conference access device, the conference access device generating playback content based on the target location and posture information, the posture information being acquired by a posture acquisition module of the extended reality device; receiving the playback content sent by the conference access device, and playing the playback content on a virtual display screen. The remote conference method provided by the embodiment of the present application can bring the audio-visual experience of the conference site to the target user who uses an extended reality device to participate in a remote meeting, thereby improving the remote user's experience of participating in the meeting.

Description

Teleconference method, teleconference device, electronic equipment and computer-readable storage medium

Technical Field

The embodiment of the application relates to the technical field of augmented reality, in particular to a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium.

Background

With the continuous development of the augmented reality device, a user can perform a virtual reality conference through the VR device, a simulated real conference scene is displayed in a display screen of the VR device by constructing a virtual space and a character image, or an augmented reality conference can be performed through the AR device, and a conference screen is displayed in a real front view of the user by superposing a video conference interface in the display screen of the AR device.

In the prior art, when users use the existing augmented reality equipment to conduct teleconference, most of the users adopt the method that the live conference recorded on the conference site is presented to participants who conduct teleconference from the view angle of the recorder.

Therefore, even if the existing teleconference participants wear the augmented reality equipment, the existing teleconference participants do not have the experience of personally meeting the scene, and the use requirement of personally meeting the scene of the user cannot be met, so that the user experience is poor.

Disclosure of Invention

The embodiment of the application provides a remote conference method, a remote conference device, electronic equipment and a computer readable storage medium, which can bring audiovisual feeling of a conference site to a target user using an augmented reality device for remote conference and promote the conference experience of the remote user.

In a first aspect, an embodiment of the present application provides a teleconferencing method, applied to an augmented reality device, including:

receiving a position selection image for a conference scene sent by conference access equipment, wherein the position selection image is displayed on a virtual display screen of the augmented reality equipment;

Responding to a selection instruction of a target user, and determining a corresponding target position of the selection instruction in the position selection image;

The target position is sent to the conference access equipment, the conference access equipment generates playing content according to the target position and gesture information, and the gesture information is acquired through a gesture acquisition module of the augmented reality equipment;

and receiving the play content sent by the conference access equipment and playing the play content on the virtual display screen.

Optionally, in some embodiments of the present application, the determining, in response to a selection instruction of a target user, a corresponding target position of the selection instruction in the conference scene includes:

Determining coordinate information of the selection instruction on the position selection image;

And determining the target position in the conference scene according to the coordinate information.

Optionally, in some embodiments of the present application, the augmented reality device includes a first shooting module for acquiring an image in front of the augmented reality device, and before the responding to the selection instruction of the target user, the method further includes:

Acquiring a gesture image of the target user through the first shooting;

determining corresponding selection information of the gesture image in the position selection image;

and generating the selection instruction according to the selection information.

Optionally, in some embodiments of the present application, the augmented reality device includes a gesture acquisition module, and before the receiving the play content sent by the conference access device, the method further includes:

acquiring the gesture information of the target user through the gesture acquisition module;

and sending the gesture information to the conference access equipment.

In a second aspect, an embodiment of the present application provides a teleconference method, which is applied to a conference access device, where the conference access device includes a plurality of second shooting modules, and the method includes:

determining position selection images corresponding to a conference scene through the plurality of second shooting modules, wherein the plurality of second shooting modules are arranged at different positions of the conference scene;

Transmitting the position selection image to an augmented reality device;

receiving a target position sent by the augmented reality equipment, wherein the target position is generated in the position selection image according to a selection instruction of a target user;

and generating playing content according to the target position, and sending the playing content to the augmented reality equipment, wherein the playing content is played on the augmented reality equipment.

Optionally, in some embodiments of the present application, the determining, by the plurality of second shooting modules, a position selection image corresponding to a conference scene includes:

Acquiring a plurality of scene images of the conference scene through the plurality of second shooting modules;

The position selection image is determined from the plurality of scene images.

Optionally, in some embodiments of the present application, the determining the position selection image according to the plurality of scene images includes:

determining a first position area and a second position area in the conference scene according to the scene images, wherein the first position area is an unselected position area of the existing participating user, and the second position area is a position area where the participating user does not exist;

and determining the position selection image according to the first position area and the second position area.

Optionally, in some embodiments of the present application, the conference access device includes a plurality of audio collection modules, and the generating playing content according to the target location includes:

acquiring a plurality of audio data of different angles in the conference scene through the plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are arranged at different positions of the conference scene;

determining a target picture according to the plurality of scene images and the target position, and determining audio content corresponding to the target picture according to the plurality of audio data and the target position;

and generating the playing content according to the target picture and the audio content.

Optionally, in some embodiments of the present application, generating the play content according to the plurality of scene images, the plurality of audio data, and the target location information includes:

receiving gesture information of the target user sent by the augmented reality equipment;

and determining the target picture according to the gesture information and the plurality of scene images, and determining the audio content corresponding to the target picture according to the gesture information and the plurality of audio data.

In a third aspect, an embodiment of the present application further provides a teleconferencing apparatus, applied to an augmented reality device, including:

the first processing module is used for receiving a position selection image for a conference scene sent by the conference access equipment, and the position selection image is displayed on a virtual display screen of the augmented reality equipment;

The first processing module is further used for responding to a selection instruction of a target user and determining a corresponding target position of the selection instruction in the position selection image;

the first sending module is used for sending the target position to the conference access equipment, the conference access equipment generates playing content according to the target position and the gesture information, and the gesture information is acquired through a gesture acquisition module of the augmented reality equipment;

the first processing module is further configured to receive the play content sent by the conference access device, and play the play content on the virtual display screen.

Optionally, in some embodiments of the present application, the first processing module is configured to:

Optionally, in some embodiments of the present application, the first processing module is further configured to:

Acquiring a gesture image of the target user through the first shooting;

Optionally, in some embodiments of the present application, the first sending module is configured to:

and sending the gesture information to the conference access equipment.

In a fourth aspect, an embodiment of the present application further provides a teleconference device, applied to a conference access apparatus, where the teleconference device includes:

The second processing module is used for determining position selection images corresponding to the conference scene through the plurality of second shooting modules, and the plurality of second shooting modules are arranged at different positions of the conference scene;

the second sending module is used for sending the position selection image to the augmented reality equipment;

The second processing module is further configured to receive a target position sent by the augmented reality device, where the target position is generated in the position selection image according to a selection instruction of a target user;

The second processing module is further configured to generate a play content according to the target position, and send the play content to the augmented reality device, where the play content is played on the augmented reality device.

Optionally, in some embodiments of the application, the second processing module is configured to:

The position selection image is determined from the plurality of scene images.

Optionally, in some embodiments of the present application, the second processing module is further configured to:

In a fifth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the computer program when executed by the processor implements the steps in the teleconferencing method described above.

In a sixth aspect, embodiments of the present application further provide a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps of the teleconferencing method described above.

In a seventh aspect, embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations described in the embodiments of the present application.

In summary, the embodiment of the application receives a position selection image for a conference scene sent by conference access equipment, wherein the position selection image is displayed on a virtual display screen of the augmented reality equipment, responds to a selection instruction of a target user, determines a corresponding target position of the selection instruction in the position selection image, sends the target position to the conference access equipment, generates playing contents according to the target position and gesture information, the gesture information is acquired through a gesture acquisition module of the augmented reality equipment, receives the playing contents sent by the conference access equipment, plays the playing contents on the virtual display screen, returns the corresponding target position to the conference access equipment according to the selection instruction of the user, and then receives and plays the playing contents generated by the conference access equipment according to the target position, thereby realizing the technical effects of bringing the audiovisual experience of the conference scene to the target user using the augmented reality equipment to remotely participate and improving the participation experience of the remote user.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic view of a scenario in which an augmented reality device according to an embodiment of the present application performs a teleconferencing method;

Fig. 2 is a schematic view of a scenario in which an augmented reality device according to an embodiment of the present application performs the teleconferencing method;

fig. 3 is a schematic structural diagram of XR glasses corresponding to a teleconferencing method according to an embodiment of the present application;

Fig. 4 is a schematic view of a scenario in which a conference access device provided in an embodiment of the present application performs a teleconference method;

Fig. 5 is a schematic view of a scenario in which a conference access device provided in an embodiment of the present application performs the teleconference method;

fig. 6 is a schematic view of a conference scenario provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a target location selection result according to an embodiment of the present application;

Fig. 8 is a schematic diagram of the overall structure of a teleconferencing method according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a teleconference device of an augmented reality apparatus according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of a teleconference device of a conference access apparatus according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made more apparent and fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the application are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.

In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the drawings are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the features described above. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

In the present application, the term "exemplary" is used to mean "serving as an example, instance, or illustration. Any embodiment described as "exemplary" in this disclosure is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known structures and processes have not been described in detail so as not to obscure the description of the application with unnecessary detail. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

First, the terms related to the present application will be explained:

Augmented reality, augmented Reality, AR, is a technique that creates an enhanced perceived environment by combining virtual information with a real world scene.

An augmented reality device for fusing virtual content with the real world to provide an enhanced visual experience. These devices typically employ Head-Mounted displays (HMDs), smart glasses (SMART GLASSES), or other forms of wearable devices.

The embodiment of the application provides a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium. Specifically, the embodiment of the application provides a remote conference device suitable for a remote conference method, which comprises the remote conference device and a main control device of the remote conference device.

In the prior art, with the rapid development and the gradual maturity of the augmented reality display technology, the augmented reality display and the artificial intelligence large model make the augmented reality equipment have rich and colorful functional application, more and more wearable augmented reality equipment (such as AR, VR, XR glasses and the like) are sequentially marketed, and more remote conference scenes are carried out by using the wearable augmented reality equipment.

However, when a user uses an existing augmented reality device to conduct a teleconference, a method is mostly adopted in which a live conference recorded on a conference site is presented to participants who conduct the teleconference from the perspective of a recorder.

However, even if the augmented reality device is worn, the teleconference mode does not have the experience of being in the meeting place, and cannot meet the use requirement of being in the scene of the user, so that the user experience is poor.

Therefore, the existing remote conference method of the augmented reality equipment has the defect of insufficient real experience effect, and is not used for presenting play content to the remote online user at the view angle of the user in the conference site, so that the remote online user cannot be placed in the site conference, and the immersive conference feeling of the user cannot be brought.

The embodiment of the application provides a teleconference method, a teleconference device, electronic equipment and a computer-readable storage medium. The method comprises the steps of receiving a position selection image which is sent by conference access equipment and is aimed at a conference scene, displaying the position selection image on a virtual display screen of the augmented reality equipment, responding to a selection instruction of a target user, determining a corresponding target position of the selection instruction in the conference scene, enabling the selection instruction to act on the position selection image, sending the target position to the conference access equipment, receiving playing content sent by the conference access equipment, playing the playing content on the virtual display screen, and generating the playing content according to the target position by the conference access equipment.

In summary, the remote conference method in the embodiment of the application can return the corresponding target position to the conference access device by the selection instruction of the user, and then receive and play the play content generated by the conference access device according to the target position, thereby realizing the audiovisual feeling of the conference site for the target user using the augmented reality device to carry out remote conference and improving the technical effect of the conference experience of the remote user.

The following will describe in detail. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.

Referring to fig. 1, fig. 1 is a schematic view of a scenario of a teleconference method provided by an embodiment of the present application, where the teleconference system may include an augmented reality device 100 and a master device 200, and the augmented reality device 100 and the master device 200 may be connected by any communication method, including but not limited to signal communication through an electronic circuit, communication through a wireless signal, and the wireless signal may be a computer network communication of a TCP/IP protocol family (TCP/IP Protocol Suite, TCP/IP), a user datagram protocol (User Datagram Protocol, UDP), and so on. The augmented reality device 100 may receive a control signal from a remote controller or a control panel, and the augmented reality device 100 may also receive instruction information sent by the main control apparatus 200, and the augmented reality device 100 may perform a corresponding operation, such as a teleconferencing method according to the present application, according to the corresponding instruction information.

In an embodiment of the present application, the augmented reality device 100 includes, but is not limited to, a Head-Mounted Display (HMD), smart glasses (SMART GLASSES), or other forms of wearable devices, etc.

It will be appreciated by those skilled in the art that the application environment shown in fig. 1 is merely an application scenario of the present application, and is not limited to the application scenario of the present application, and other application environments may also include more or fewer augmented reality devices than those shown in fig. 1, for example, only 1 augmented reality device is shown in fig. 1, and is not limited thereto.

As shown in fig. 1, the main control device 200 may include any hardware device capable of performing data processing and instruction transmission, such as a CPU or a single chip microcomputer embedded in the augmented reality device, and is not limited in this particular regard.

It should be noted that, the schematic view of the teleconference system shown in fig. 1 is only an example, and the teleconference system and the scene described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of the teleconference system and the appearance of a new service scenario, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

Specifically, referring to fig. 2, fig. 2 is a schematic view of a scenario in which an augmented reality device executes the teleconference method according to an embodiment of the present application, where a specific execution procedure of the augmented reality device executing the teleconference method is as follows:

and S201, receiving a position selection image for a conference scene sent by the conference access device.

In the embodiment of the application, the position selection image is displayed on a virtual display screen of the augmented reality device, and the virtual display screen is generated after the physical display screen is subjected to multiple propagation and conversion of a plurality of optical components.

As an alternative embodiment, the position selection image may be a plurality of different view position selection images for the user to select his or her desired position within a selectable area within the image.

The selection condition may include that the seat is selected with a gap, that the seat is added without a gap, and that the seat is added with a gap.

In the embodiment of the application, the conference scene is an off-line conference scene, such as an off-line conference room scene and the like.

As an alternative embodiment, after the conference access device at the site of the offline conference room acquires the site image, the generated position selection image is sent to the augmented reality device, and the augmented reality device receives the position selection image and displays the position selection image on the virtual display screen.

In the embodiment of the application, the extended reality device can be various wearable extended reality devices such as XR glasses, etc., and the XR terminal electronic device refers to human body wearable devices applied to the XR technology (including VR, AR and MR), namely XR intelligent glasses and XR head-mounted displays.

Optionally, as shown in the schematic XR glasses structure of fig. 3, the XR smart glasses have a natural form of ordinary glasses, and the glasses body is composed of a frame and legs, and further includes a computing processor (i.e. the above-mentioned main control device 200), a sensor, a display, a shooting module, an optical component, a speaker, a communication component, and the like, in addition to the components included in the ordinary glasses.

The computing processor is used for information data processing, logic operation and the like of the XR intelligent glasses and comprises CPU, GPU, NPU units and the like, and can be a general chip system or a chip system specially designed for XR business, such as Gao Tongxiao dragon AR1 chip system and Gao Tongxiao dragon AR2 chip system.

The sensor is used for the intelligent XR glasses to collect and observe the surrounding environment information of glasses including the user, and comprises a camera for collecting video image information, a gyroscope for measuring the gesture action of the user, an accelerometer for measuring the position of the user and the equipment, a magnetic force sensor for measuring the magnetic declination of the position of the equipment, an eye tracker for tracking the point of sight falling of the user and a microphone for collecting the sound information of the user.

The display is used for displaying and presenting pictures generated by rendering of the computing processor, and comprises self-luminous active devices micro-led and micro-oled, liquid crystal display LCD and LCOS which need external light source illumination, micro electro mechanical systems MEMS and digital mirror array DMD and the like.

The optical component is used for image imaging and transmitting image light into human eyes and comprises a lens, an optical film material, an optical waveguide and the like.

The speaker is used for playing sound.

The communication component is used for communication connection between the XR glasses and external other devices such as a platform cloud server, a conference site controller, XR glasses used by other participants, and the like, and can be a cellular communication module, a WiFi module, a Bluetooth module, and the like.

In addition, XR intelligent glasses still include the power, and the power can set up in XR glasses, also can separate and the electricity is connected with the glasses body. XR glasses may also include ring, watch, wristband, console, headset, etc. for use with devices that are optionally electrically connected to the glasses body when desired. It will be appreciated that the foregoing computing processor, sensors, speakers, communication components may also be disposed outside the eyeglass body in electrical communication therewith.

In the embodiment of the application, the virtual meeting position of the user on-line off-site meeting room, namely the target position, is determined, the on-site information of the off-line on-site meeting room observed by the virtual meeting position is generated, and the on-site information of the meeting room is presented to the user.

Optionally, in a meeting preparation stage before the meeting formally starts, the XR intelligent glasses are in communication connection with a meeting access device arranged on the on-line meeting room site, the meeting access device is an electronic device integrated with environment sensing and communication functions, and possibly also integrated with part of computing processing functions, the XR intelligent glasses send instructions to the meeting access device to instruct the meeting access device to shoot and collect meeting room site picture pictures before the meeting starts through a camera, and the meeting room site picture at least comprises information such as meeting room site arrangement, participants seat and the like. The intelligent glasses present the conference room live picture photos in a virtual display mode in front of the eyes of the user.

Alternatively, the conference room live picture photograph, i.e., the location selection image, may be a photograph to which additional information is added, such as personal information, location information, etc., of other participants of the conference in a remote on-line form. The user can select his own conference location directly on the location selection image.

Note that, the setting position of the conference access device is not particularly limited, and in the embodiment of the present application, the conference access device is exemplified as being installed in a conference scene.

S202, responding to a selection instruction of a target user, determining a corresponding target position of the selection instruction in a conference scene, wherein the selection instruction acts on a position selection image;

In the embodiment of the application, after the position selection image appears on the virtual display screen of the expansion display device, the user can select the position wanted by himself in the selectable area of the position selection image, and the expansion display device determines the corresponding target position in the conference scene according to the selection instruction.

As an alternative embodiment, the location selection situation may include a free available option, a free self-addition seat, and a free or self-addition seat. For example, the conference room has an empty seat, the user selects a vacancy as the position of the conference, for example, the conference room has no empty seat, the user adds a seat as the position of the conference, for example, the conference room has an empty seat, but the vacancy is the seat of the conference speaker, and the user can add a seat as the position of the conference participant.

Alternatively, the eyeglass end obtains data (scene, person, seat, etc.) transmitted from the server, and the user selects the location based on the scene and seat data.

In some embodiments of the present application, the augmented reality device includes a first shooting module, and before responding to a selection instruction of a target user, optionally, the teleconference method provided in the embodiment of the present application further includes acquiring a gesture image of the target user through first shooting, determining selection information corresponding to the gesture image in the position selection image, and generating the selection instruction according to the selection information.

In the embodiment of the application, the XR intelligent glasses can receive the input instruction of a user taking gestures and hand actions as the selection reference positions.

As an alternative embodiment, when the virtual display screen of the XR smart glasses is a reference position selection interface, the camera of the XR smart glasses takes a gesture action picture of the user to determine a selection instruction of the user.

Optionally, the remote conference method further comprises the steps of determining coordinate information of the selection instruction on the position selection image, and determining the target position in the conference scene according to the coordinate information.

In the embodiment of the application, the XR intelligent glasses receive and analyze the instruction of selecting the meeting position by the user, calculate the meeting position coordinates corresponding to the meeting position in the meeting place in the virtual picture, and send the meeting position corresponding to the meeting place to the meeting access device.

As an optional embodiment, after the camera of the XR smart glasses takes a picture of gesture actions of the user, the main control device interprets the gesture actions to determine a position selected by the user on the virtual screen (i.e. a position selection image), inquires specific position coordinates of the selected position in the virtual screen, and determines specific position coordinates of a selected reference position in the real conference room on the virtual screen through a coordinate conversion relationship between the virtual screen and the real conference room.

Optionally, the XR smart glasses generate an initial view of the live conference room after selecting the virtual meeting location of the user in the real conference room.

S203, sending the target position to conference access equipment;

In the embodiment of the application, after the target position of the user is determined, the target position is sent to the conference access equipment, so that the conference access equipment generates the playing content according to the target position.

As an alternative embodiment, the target position may also be submitted to the conference access device for processing confirmation, e.g. after obtaining a user selection instruction, the selection instruction and the corresponding image are sent to the conference access device, so that the conference access device determines the target position.

In some embodiments of the present application, the augmented reality device includes a gesture acquisition module, and optionally, before receiving the play content sent by the conference access device, the teleconference method provided in the embodiment of the present application further includes acquiring gesture information of the target user by the gesture acquisition module, and sending the gesture information to the conference access device.

In the embodiment of the application, after the teleconference user formally enters the conference, the on-site information of the on-site conference room is synchronously presented to the user in real time, so that the gesture acquisition module can acquire the gesture information of the user in real time after confirming the target position and send the gesture information to the conference access equipment.

Optionally, after the user debugs the confirmation effect on the initial picture of the conference room, the user formally joins the conference. The XR intelligent glasses calculate the head gesture of the user through measurement of a gyroscope, an accelerometer and the like, and the eye movement instrument measures the eye drop point of the user.

Optionally, the method for generating updated conference room pictures according to the head gesture parameters and the line-of-sight falling point parameters of the user can be that the conference access device is provided with more than 3 cameras at different positions of a conference site, and the cameras shoot pictures of the conference room site pictures in real time. The conference access device sends the position coordinates of the cameras and the photos shot at the same time to the XR intelligent glasses, the XR intelligent glasses acquire the current sight falling point of the user through the eye movement instrument and calculate to obtain the observation view angle of the user, and according to the pictures of the conference room, the participation positions of the user and the sight falling point shot at the same time, the XR intelligent glasses generate the conference room scene pictures shot and observed at the attention view angle of the participation positions of the user according to the information.

Optionally, the method for generating and updating the conference room picture according to the head gesture parameter and the user sight line drop point parameter can also be that the conference access device is provided with more than 3 cameras at different positions of the conference site, the cameras shoot pictures of the conference room site in real time, a 3D scene is built according to the pictures, and after the position and the head gesture of an observer are determined, the picture under the view angle of the observer is generated. The XR smart glasses control the micro-display optical assembly to be presented in an augmented reality manner in front of the user's eye.

In the embodiment of the application, the XR intelligent glasses receive the conference site sound collected by the conference access device through the microphone, and convert the conference site sound collected by the microphone into the conference site sound heard at the virtual participation position of the user.

Optionally, the sound conversion generating method may include determining a position of a sound source, determining a position of an acquisition microphone, determining a virtual reference position of a user, and converting a characteristic parameter of sound acquired by the microphone into a characteristic parameter of sound received at the virtual reference position of the user through an algorithm based on the relative position parameter. The computing processor of the XR smart glasses controls the acoustic modules such as the speakers of the XR smart glasses to present conference room site sounds around the user. During the user speaking phase, the microphone of the XR intelligent glasses collects user sound and transmits the user sound to the conference access device at the conference site.

S204, receiving the playing content sent by the conference access equipment, playing the playing content on the virtual display screen, and generating the playing content according to the target position by the conference access equipment.

In the embodiment of the application, after the conference access device generates the play content according to the target position, the play content is sent to the augmented reality device, and the augmented reality device receives the play content sent by the conference access device and plays the play content on the virtual display screen.

As an alternative embodiment, the XR smart glasses send instructions to the conference access device, the conference access device can move to a meeting position selected by a user, and a camera and a microphone of the conference access device observe and receive meeting room field picture information and sound information at the current moment at the meeting position.

Optionally, another method for generating meeting room site information by the XR intelligent glasses is that for sound information, a microphone of the meeting access device receives and records the sound information of the meeting room site at an original position, the recorded sound information comprises sound sources, tone, loudness and the like, the position of the meeting access device and the position of a user's participation are sent to the XR intelligent glasses, the XR intelligent glasses generate meeting site sound information which is observed at the position of the user's participation according to the information, and a loudspeaker is controlled by the XR intelligent glasses to play sound according to the meeting site sound information and present the sound to the user.

Optionally, for visual information, the conference access device is provided with more than 3 cameras at different positions of the conference site, and the cameras take pictures of the conference room site in real time. The conference access device sends the position coordinates of the cameras and the photos shot at the same time to the XR intelligent glasses, the XR intelligent glasses acquire the current sight falling point of the user through the eye movement instrument and calculate to obtain the observation view angle of the user, the XR intelligent glasses generate the conference room scene pictures shot and observed at the attention view angle according to the pictures of the conference room pictures shot at the same time, the participation positions of the user and the sight falling point of the user, and the XR intelligent glasses control the display screen and the optical components to display the conference room scene pictures to the user in a virtual display mode.

In this scenario, visual picture information received by the user through the XR smart glasses will switch as the user's attention viewing angle switches, which typically follows the conference speaker switch. In this scene, XR intelligent glasses can also accept the sound information that the record user sent to the meeting access ware, meeting access ware control speaker makes the sound in meeting place user's meeting position, builds the effect that the user just talks in the scene.

Referring to fig. 4, fig. 4 is a schematic view of a scenario of a teleconference method according to an embodiment of the present application, where the teleconference system may include a conference access device 400 and a processing apparatus 500, where the conference access device 400 and the processing apparatus 500 may be connected by any communication means, including but not limited to signal communication through electronic circuits, communication through wireless signals, and the wireless signals may be computer network communication of TCP/IP protocol (TCP/IP Protocol Suite, TCP/IP), user datagram protocol (User Datagram Protocol, UDP), and so on. The conference access device 400 may receive a control signal on a remote controller or a control panel, and the conference access device 400 may also receive instruction information sent by the processing apparatus 500, and the conference access device 400 may perform a corresponding operation, such as a remote conference method in the present application, according to the corresponding instruction information.

In an embodiment of the present application, conference access device 400 includes, but is not limited to, a remote conference access device or other form of conference access device on-line to a conference room site, and the like.

It will be appreciated by those skilled in the art that the application environment shown in fig. 4 is merely an application scenario of the present application, and is not limited to the application scenario of the present application, and other application environments may also include more or fewer conference access devices than those shown in fig. 4, for example, only 1 conference access device is shown in fig. 4, and in particular, the application is not limited thereto.

As shown in fig. 4, the processing apparatus 500 may include any hardware device capable of performing data processing and instruction transmission, such as a CPU or a single chip microcomputer embedded in the conference access device, and is not limited in this particular embodiment.

It should be noted that, the schematic view of the teleconference system shown in fig. 4 is only an example, and the teleconference system and the scene described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of the teleconference system and the appearance of the new service scenario, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

Specifically, referring to fig. 5, fig. 5 is a schematic view of a scenario in which a conference access device executes the teleconference method according to an embodiment of the present application, where a specific execution procedure of the conference access device executing the teleconference method is as follows:

S501, determining a position selection image corresponding to a conference scene through a plurality of second shooting modules, wherein the plurality of second shooting modules are arranged at different positions of the conference scene.

In the embodiment of the application, the scene is actually tested according to the number and the installation positions of the second shooting modules of the scene, and the effect achieved by the three-dimensional reconstruction technology can be achieved.

Optionally, the three-dimensional reconstruction technology is performed through the second shooting module, and the conference site is constructed one by one. The second shooting module acquires information and transmits the information to the processing device for unified processing to obtain related information such as conference scenes, personnel, seats and the like, and the processing device transmits the processed information to the augmented reality equipment.

Optionally, the teleconference method further comprises the steps of obtaining a plurality of scene images of the conference scene through a plurality of second shooting modules, and determining a position selection image according to the plurality of scene images.

In the embodiment of the application, as shown in a conference scene schematic diagram in fig. 6, a plurality of second shooting modules (cameras) in a conference room are arranged in advance to comprehensively capture scenes of a conference place, and corresponding pictures are transmitted to a server, wherein the server can be a processing device of equipment such as a site conference access device, intelligent glasses, a cloud server and the like.

It should be noted that the large sharing screen shown in fig. 6 is merely an example, and may be a main screen in a conference room, etc., and the positions selected by the user A, B, C, D, E, F and the installation positions of the plurality of second shooting modules are also exemplary, and may be adjusted according to actual application situations.

As an alternative embodiment, the server may use three-dimensional Reconstruction (3D Reconstruction) techniques to generate a three-dimensional model of the same conference scene as the current conference scene.

It should be noted that the three-dimensional reconstruction technique involves the steps of finally generating a three-dimensional model like a picture capturing scene 1. Data acquisition, taking pictures of the room from different angles using a plurality of cameras. These pictures need to cover every corner of the room in order to be able to capture the complete appearance of the room. 2. And extracting key feature points such as corner points, edges and the like from each picture. 3. And (3) feature matching, namely finding the same feature points among different pictures, and determining the corresponding relation among the different pictures. 4. Camera calibration-determining the internal parameters (focal length, principal point, etc.) and external parameters (position and orientation of the camera) of the camera is typically done by taking pictures of known patterns. 5. And generating a three-dimensional point cloud, namely converting points on the two-dimensional image into points in a three-dimensional space by utilizing feature matching and camera parameters to form the point cloud. 6. Surface reconstruction-recovering the surface of an object from a point cloud, which may involve techniques such as mesh generation, surface fitting, etc. 7. Texture mapping, namely mapping the texture of an original picture onto a reconstructed three-dimensional model to obtain a realistic appearance. 8. Post-processing, which may include the steps of smoothing, removing noise, optimizing the model, etc.

Optionally, the teleconference method further comprises the steps of determining a first position area and a second position area in the conference scene according to the plurality of scene images, wherein the first position area is an unselected position area of the existing participating user, the second position area is an unselected position area, and determining a position selection image according to the first position area and the second position area.

As an alternative embodiment, a selectable position area and a non-selectable position area in the conference scene are determined from the plurality of scene images, and a position selection image is generated from the selectable position area and the non-selectable position area.

S502, sending a position selection image to the augmented reality device.

In the embodiment of the application, a seat selection interface is generated according to the position data of the site conference, and the position selection image corresponding to the seat selection interface is sent to the augmented reality equipment for display, so that a user can select a seat.

S503, receiving the target position sent by the augmented reality device.

In the embodiment of the present application, as shown in the target position selection result schematic diagram in fig. 7, a user selection result, that is, a target position, sent by the augmented reality device is received, and corresponding play content is generated based on the target position.

It should be noted that the rectangle at the selected position of the glasses end shown in fig. 7 may be the orientation of the field of view of the user of the augmented reality device, that is, the field of view seen by the remote user in the augmented reality device is the direction of the field of view facing the shared large screen according to the selected position.

And S504, generating play content according to the target position, and sending the play content to the augmented reality equipment.

In the embodiment of the application, according to the seat selected by the user and the site conference three-dimensional model three-dimensionally reconstructed by the processing device, the corresponding system automatically moves, rotates, scales and the like the conference site three-dimensional model, accords with the position of a remote conference site as much as possible, and presents the conference site three-dimensional model at the glasses end.

Optionally, the corresponding three-dimensional character model is correspondingly generated and placed into the three-dimensional model of the conference site, and the three-dimensional model is bound with some conventional basic animation states (for example, idle, speaking, standing up, sitting down, turning around and clapping hands), and furthermore, the presentation of higher-level and complex actions can be realized by using the action capturing technology to drive the corresponding model.

In the embodiment of the application, according to the selected position of the glasses end, a three-dimensional model scene generated on the conference site, such as a graph, moves, rotates and places the scene, so that the participants on the glasses end can be in an on-site conference, the glasses end can be combined with 3dof (or 6 dof) of the glasses, the scene can be rotationally observed, or the scene roaming can be carried out.

The server acquires the site information in real time, transmits the site information to the glasses end, and drives the character expression, action and the like of the three-dimensional model scene according to different conference site conditions in real time.

As an alternative embodiment, a conference room picture that a user sees at a conference virtual meeting position (target position) may be obtained by inputting the virtual meeting position and an initial viewing position such as a front view conference room display large screen. As an example, the process of generating a conference scene three-dimensional model by three-dimensional reconstruction to regenerate a scene picture at a specified viewing angle at a specified location may be a NERF rendering process.

It should be noted that NERF (neural radiation field) is an emerging rendering technology, and the core objective is to solve how to generate images from some captured images under new viewing angles. Unlike conventional three-dimensional reconstruction methods, NERF does not represent the scene as an explicit form of point cloud, grid, voxel, etc., but rather models the scene as a continuous 5D radiation field that is implicitly stored in the neural network. NERF is trained by inputting sparse multi-angle images to obtain a neural radiation field model, and then a clear picture under any view angle can be rendered according to the model. The rendering process is to input the position and direction of the light rays emitted under a certain visual angle and the corresponding coordinates, then send the light rays into NERF to obtain volume density and color, and finally obtain a final image through volume rendering.

Optionally, the remote conference method further comprises the steps of receiving gesture information of a target user sent by the augmented reality device, determining a target picture according to the gesture information and the plurality of scene images, and determining audio content corresponding to the target picture according to the gesture information and the plurality of audio data.

In the embodiment of the application, after the teleconference user formally enters the conference, the on-site information of the on-site conference room can be synchronously presented to the user in real time, so that after the target position is confirmed, the gesture acquisition module can acquire the gesture information of the user in real time and send the gesture information to the conference access equipment, and the conference access equipment receives the gesture information sent by the augmented reality equipment in real time.

As an alternative embodiment, conference room presence information mainly refers to visual picture information and auditory sound information, and may include information such as smell if possible. Visual picture information of the live conference room observed by the virtual reference position (namely, the target position) is generated.

In some embodiments of the present application, the conference access device includes a plurality of audio acquisition modules, optionally, the teleconference method provided in the embodiment of the present application further includes acquiring a plurality of scene images of the conference scene through a plurality of second shooting modules, acquiring a plurality of audio data of different angles in the conference scene through a plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are disposed at different positions of the conference scene, determining a target picture according to the plurality of scene images and the target position, determining audio content corresponding to the target picture according to the plurality of audio data and the target position, and generating play content according to the target picture and the audio content.

In the embodiment of the application, the surrounding environment data can be acquired through a gyroscope, a magnetic sensor, an IMU and the like carried by the intelligent glasses, the head gesture parameter of the user at the current moment is obtained by utilizing a 6Dof tracking technology, the view angle of the user at the current observation meeting room is further obtained, the virtual meeting position of the user at the scene meeting room is determined in response to the instruction of the user to select the meeting position at the current moment in the virtual picture, the view angle of the user in the three-dimensional model of the front meeting scene is updated, and then the picture of the meeting scene is also updated.

It should be noted that 6DOF (six degrees of freedom) provides a more complete description of spatial motion, including three translational degrees of freedom and three rotational degrees of freedom, X-axis translation, movement in the X-axis direction. And the Y-axis translation is movement along the Y-axis direction. Z-axis translation, movement in the Z-axis direction. Rotation about the X-axis, commonly referred to as Pitch (Pitch), rotation of the object about the X-axis. Rotating about the Y-axis, commonly referred to as Yaw (Yaw), rotation of the object about the Y-axis. Rotation about the Z-axis, commonly referred to as tumbling (Roll), rotation of the object about the Z-axis. The 6DOF allows more complex movements of objects in space, including the precise operation of the aircraft, robots in flight simulators, and the omnidirectional movements of the user's head and hands in virtual reality.

Optionally, the eye vision of the user can be tracked through an eye tracker carried by the intelligent glasses, so that the view angle of the user observing the conference room at the current moment can be obtained, the view angle of the user in the three-dimensional model of the conference scene is updated, and meanwhile, the conference scene picture is updated. The computing processor controls the display screen to display the generated picture, and the virtual picture of the conference room is presented to the user through imaging and transmission of the optical component.

By the steps, the virtual picture presented to the user changes along with the observation sight and the participating position of the remote participant user, and the virtual picture is equivalent to the picture of the conference room.

In the embodiment of the application, for the generation of the audible sound information of the live conference room observed by the virtual meeting position, the conference site sound collected by the conference access device through the microphone array (namely the audio collection module) arranged in the live conference room can be received through the XR intelligent glasses, and the conference site sound collected by the microphone is converted into the conference site sound heard by the user at the virtual meeting position.

Optionally, the sound conversion generation method may be that the position of the sound source is determined, the position of the collecting microphone is determined, the virtual reference position of the user is determined, and the characteristic parameters of the sound collected by the microphone are converted into the characteristic parameters of the sound received at the virtual reference position of the user through an algorithm based on the relative position parameters. By using the spatial audio technology, a computing processor of the XR intelligent glasses controls an acoustic module such as an XR loudspeaker to present conference room site sounds around a user.

The server can capture and analyze information such as speaking, actions and the like in real time, and then the identity, the position and the information corresponding to the three-dimensional model of the conference site are generated into data packets correspondingly and are transmitted to the eyeglass end.

After the data are acquired and analyzed by the glasses end, the corresponding character model in the corresponding conference site three-dimensional model of the driving glasses end is subjected to mouth shape and action driving, and meanwhile, the position of the sound source is moved to the position of the corresponding character model in the corresponding conference site three-dimensional model.

The user at the end of the glasses can also see the speaking person by rotating.

Optionally, the above steps are continuously repeated, so that the glasses end can see the actual conference effect similar to the scene, the action and the sound position of the person on the conference site.

With the embodiment of the application, the sound heard by the remote participant user varies with the location selected by the user, and the sound is equivalent to the live sound. The remote user can randomly switch the observation and listening positions and angles in the field conference room through the XR intelligent glasses in the form of virtual participation. The intelligent glasses control the camera to shoot the image photo of the user, the microphone is controlled to collect the sound of the user and transmit the sound to the conference access device on site, the conference access device can control the optical instrument on site to display the image of the user in a holographic manner at the virtual meeting position selected by the user, and the loudspeaker array on site can be controlled to play the sound equivalent to the sound emitted by the user at the meeting position in the conference room.

In addition, if the on-site participants use the PC client to participate in the conference, the PC client can directly acquire the identity and the voice of the speaker. If the video conference is started, the video information of the speaker can be directly obtained.

In the embodiment of the present application, as shown in the overall structure schematic diagram of the teleconference method in fig. 8, the main terminal may have a server terminal (processing device of the conference access device) for providing related services, and is responsible for management of the conference on the XR glasses line, different server services may be obtained by logging in the client under different scenes, and one or more second shooting modules on the conference site are controlled to obtain the situation of the current environmental site, so as to analyze the position layout of the conference seat, whether personnel exist on the position, and the like, and transmit the position layout to the server in real time. If the meeting participants participate in the meeting, the computer end (PC end) synchronizes the sharing information of the meeting, the camera module of the computer end can acquire and capture the head-in-body image information and transmit the information to the server in real time, and meanwhile, the virtual image of the participant at the glasses end can be projected in a holographic mode.

Specifically, the on-site participants speak through the PC client and transmit the corresponding identity, sound and video information to the server. The server acquires the three-dimensional model and character model information of the conference site according to the corresponding identity, and then the information is transmitted to the glasses terminal after the video information step to the action and expression of the character, and the information is packed into an information data packet. And after the data are acquired and analyzed by the glasses end, the corresponding character model in the corresponding conference site three-dimensional model of the driving glasses end is subjected to mouth shape and action driving, and meanwhile, the position of the sound source is moved to the position of the corresponding character model in the corresponding conference site three-dimensional model.

In the embodiment of the application, XR glasses log in a conference, firstly, a corresponding conference scene is generated on the glasses according to the site environment scanned by a server and the actual conference site layout, then, a position is preliminarily allocated to a glasses end according to the site vacant position condition, the display scene on the glasses, the position and the site conference scene are basically consistent, and a client at the glasses end can roam in the scene or sit and twist to watch a person or speak by combining Slam of the glasses. And the client at the glasses end hears the speaking in different conferences, the hearing sound is similar to the real scene according to different positions, and is the sound effect of the 3D scene, and the server end transmits related information according to the character information of the scene, and builds a corresponding scene character model in the 3D scene of the glasses end, so that the information of the actions, the expressions and the like of the scene character can be synchronized in real time.

According to the embodiment of the application, the position selection image corresponding to the conference scene is determined through the plurality of second shooting modules, the plurality of second shooting modules are arranged at different positions of the conference scene, the position selection image is sent to the augmented reality equipment, the target position sent by the augmented reality equipment is received, the playing content is generated according to the target position, the playing content is sent to the augmented reality equipment, the corresponding target position is returned to the conference access equipment according to the selection instruction of the user, and then the playing content generated by the conference access equipment according to the target position is received and played, so that the audio-visual feeling of the conference scene is brought to the target user remotely participating in by using the augmented reality equipment, and the technical effect of the participation experience of the remote user is improved.

In order to facilitate better implementation of the teleconference method, the application also provides a teleconference device based on the teleconference method. Where nouns have the same meaning as in the teleconferencing method described above, specific implementation details may be referred to in the description of the method embodiments.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a teleconference device provided in an embodiment of the present application, where the teleconference device 900 is applied to an augmented reality apparatus, and may specifically be as follows:

The first processing module 901 is configured to receive a position selection image for a conference scene sent by a conference access device, where the position selection image is displayed on a virtual display screen of an augmented reality device;

The first processing module 901 is further configured to determine a target position corresponding to the selection instruction in the position selection image in response to a selection instruction of the target user;

A first sending module 902, configured to send the target position to a conference access device, where the conference access device generates play content according to the target position and gesture information, and the gesture information is acquired by a gesture acquisition module of the augmented reality device;

the first processing module 901 is further configured to receive a play content sent by the conference access device, and play the play content on the virtual display screen.

Optionally, in some embodiments of the present application, the first processing module 901 is configured to:

determining coordinate information of a selection instruction on a position selection image;

Optionally, in some embodiments of the present application, the first processing module 901 is further configured to:

Acquiring a gesture image of a target user through first shooting;

And generating a selection instruction according to the selection information.

Optionally, in some embodiments of the present application, the first sending module 902 is configured to:

acquiring gesture information of a target user through a gesture acquisition module;

and sending the gesture information to the conference access equipment.

The method comprises the steps that firstly, a first processing module 901 receives a position selection image which is sent by conference access equipment and is aimed at a conference scene, the position selection image is displayed on a virtual display screen of the augmented reality equipment, then, the first processing module 901 responds to a selection instruction of a target user, a corresponding target position of the selection instruction in the position selection image is determined, then, the first sending module 902 sends the target position to the conference access equipment, the conference access equipment generates playing contents according to the target position and gesture information, the gesture information is acquired through a gesture acquisition module of the augmented reality equipment, and then, the first processing module 901 receives the playing contents sent by the conference access equipment and plays the playing contents on the virtual display screen.

The method comprises the steps of receiving a position selection image which is sent by a conference access device and is specific to a conference scene, displaying the position selection image on a virtual display screen of an augmented reality device, responding to a selection instruction of a target user, determining a corresponding target position of the selection instruction in the position selection image, sending the target position to the conference access device, generating playing contents according to the target position and gesture information by the conference access device, acquiring the gesture information through a gesture acquisition module of the augmented reality device, receiving the playing contents sent by the conference access device, playing the playing contents on the virtual display screen, returning the corresponding target position to the conference access device according to the selection instruction of the user, and then receiving and playing the playing contents generated by the conference access device according to the target position, so that the target user using the augmented reality device for remote conference brings audio-visual feeling of the conference scene, and the technical effect of the remote user is improved.

Referring to fig. 10, fig. 10 is a schematic structural diagram of a teleconference device provided in an embodiment of the present application, where the teleconference device 1000 is applied to a conference access apparatus, and may specifically be as follows:

the second processing module 1001 is configured to determine a position selection image corresponding to the conference scene through a plurality of second shooting modules, where the plurality of second shooting modules are disposed at different positions of the conference scene;

a second transmitting module 1002, configured to transmit the position selection image to the augmented reality device;

The second processing module 1001 is further configured to receive a target position sent by the augmented reality device, where the target position is generated in the position selection image according to a selection instruction of the target user;

The second processing module 1001 is further configured to generate a play content according to the target location, and send the play content to the augmented reality device, where the play content is played on the augmented reality device.

Optionally, in some embodiments of the present application, the second processing module 1001 is configured to:

acquiring a plurality of scene images of the conference scene through a plurality of second shooting modules;

A position selection image is determined from the plurality of scene images.

Optionally, in some embodiments of the present application, the second processing module 1001 is further configured to:

determining a first position area and a second position area in the conference scene according to the scene images, wherein the first position area is an unselected position area of the existing participating user, and the second position area is a position area of the non-participating user;

A position selection image is determined based on the first position area and the second position area.

acquiring a plurality of audio data of different angles in a conference scene through a plurality of audio acquisition modules, wherein the plurality of audio acquisition modules are arranged at different positions of the conference scene;

and generating playing content according to the target picture and the audio content.

Receiving gesture information of a target user sent by an augmented reality device;

and determining a target picture according to the gesture information and the plurality of scene images, and determining audio content corresponding to the target picture according to the gesture information and the plurality of audio data.

In the embodiment of the application, first, the second processing module 1001 determines the position selection image corresponding to the conference scene through the plurality of second shooting modules, the plurality of second shooting modules are arranged at different positions of the conference scene, then the second sending module 1002 sends the position selection image to the augmented reality device, then the second processing module 1001 receives the target position sent by the augmented reality device, then the second processing module 1001 generates the play content according to the target position, and sends the play content to the augmented reality device, and the play content is played on the augmented reality device.

The embodiment of the application determines position selection images corresponding to a conference scene through a plurality of second shooting modules, the plurality of second shooting modules are arranged at different positions of the conference scene, sends the position selection images to the augmented reality equipment, receives target positions sent by the augmented reality equipment, generates play contents according to the target positions and sends the play contents to the augmented reality equipment, the corresponding target positions are returned to the conference access equipment according to the selection instructions of the users, and then receives and plays the play contents generated by the conference access equipment according to the target positions, thereby realizing the audiovisual feeling of a conference scene for target users using the augmented reality equipment to remotely participate and improving the technical effect of the participation experience of the remote users.

In addition, the present application further provides an electronic device, as shown in fig. 11, which shows a schematic structural diagram of the electronic device according to the present application, specifically:

The electronic device may include one or more processors 1101 of a processing core, memory 1102 of one or more computer readable storage media, a power supply 1103, an input unit 1104, a camera module 1105, and the like. Those skilled in the art will appreciate that the electronic device structure shown in fig. 11 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or may be arranged in different components. Wherein:

The processor 1101 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 1102 and invoking data stored in the memory 1102, thereby performing overall monitoring of the electronic device. Optionally, the processor 1101 may include one or more processing cores, and preferably the processor 1101 may integrate an application processor and a modem processor, wherein the application processor primarily processes operating systems, user interfaces, application programs, etc., and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1101.

The memory 1102 may be used to store software programs and modules, and the processor 1101 executes various functional applications and data processing by executing the software programs and modules stored in the memory 1102. The memory 1102 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), etc., and a storage data area that may store data created according to the use of the electronic device, etc. In addition, memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 1102 may also include a memory controller to provide the processor 1101 with access to the memory 1102.

The electronic device also includes a power supply 1103 that provides power to the various components, and the power supply 1103 may be logically connected to the processor 1101 by a power management system, such that the functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 1103 may also include one or more of any of a direct current or alternating current power supply, recharging system, power device commissioning circuitry, power converter or inverter, power status indicator, etc.

The electronic device may also include an input unit 1104, which input unit 1104 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 1101 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 1102 according to the following instructions, and the processor 1101 executes the application programs stored in the memory 1102, so as to implement the steps in any teleconference method provided by the embodiment of the present application.

The embodiment of the application receives the position selection image sent by the conference access equipment aiming at the conference scene, the position selection image is displayed on the virtual display screen of the augmented reality equipment, the corresponding target position of the selection instruction in the conference scene is determined in response to the selection instruction of the target user, the selection instruction acts on the position selection image, the target position is sent to the conference access equipment, the playing content sent by the conference access equipment is received, the playing content is played on the virtual display screen, the corresponding target position is returned to the conference access equipment according to the technical scheme generated by the conference access equipment according to the target position, and then the playing content generated by the conference access equipment according to the target position is received and played, so that the audio-visual feeling of the conference scene is brought to the target user using the augmented reality equipment for remote participation, and the technical effect of the participation experience of the remote user is improved.

The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.

To this end, the present application provides a computer readable storage medium having stored thereon a computer program that can be loaded by a processor to perform the steps of any of the teleconferencing methods provided by the present application.

The computer readable storage medium may include, among others, read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disks, and the like.

Because the instructions stored in the computer readable storage medium can execute the steps in any teleconferencing method provided by the present application, the beneficial effects that any teleconferencing method provided by the present application can achieve can be achieved, and detailed descriptions of the foregoing embodiments are omitted herein.

The foregoing describes in detail a teleconferencing method, apparatus, electronic device and computer readable storage medium, and specific examples are provided herein to illustrate the principles and embodiments of the present application and to assist in understanding the method and core ideas thereof, and meanwhile, the present disclosure should not be construed as limiting the application to any extent by those skilled in the art, depending on the ideas of the present application.

Claims

1. A remote conference method, characterized in that it is applied to an extended reality device, and the method comprises:

Receiving a location selection image for a conference scene sent by a conference access device, wherein the location selection image is displayed on a virtual display screen of the extended reality device;

In response to a selection instruction of a target user, determining a target position corresponding to the selection instruction in the position selection image;

Sending the target position to the conference access device, the conference access device generating playback content according to the target position and posture information, the posture information being acquired by a posture acquisition module of the extended reality device;

The playback content sent by the conference access device is received, and the playback content is played on the virtual display screen.

2. The method according to claim 1, wherein, in response to a selection instruction of a target user, determining a target position corresponding to the selection instruction in the conference scene comprises:

Determine the coordinate information of the selection instruction on the position selection image;

The target position in the conference scene is determined according to the coordinate information.

3. The method according to claim 1, wherein the extended reality device comprises: a first shooting module, the first shooting module is used to obtain an image in front of the extended reality device, and before responding to the selection instruction of the target user, the method further comprises:

Acquire a gesture image of the target user through the first shooting;

Determine selection information corresponding to the gesture image in the position selection image;

The selection instruction is generated according to the selection information.

4. The method according to claim 1, characterized in that the extended reality device comprises: the posture acquisition module, and before receiving the playback content sent by the conference access device, the method further comprises:

Acquiring the posture information of the target user through the posture acquisition module;

The posture information is sent to the conference access device.

5. A remote conference method, characterized in that it is applied to a conference access device, the conference access device includes a plurality of second shooting modules, and the method includes:

Determine a position selection image corresponding to a conference scene by using the multiple second shooting modules, wherein the multiple second shooting modules are arranged at different positions of the conference scene;

Sending the location selection image to an extended reality device;

Receiving a target position sent by the extended reality device, wherein the target position is generated in the position selection image according to a selection instruction of a target user;

The playback content is generated according to the target position, and the playback content is sent to the extended reality device, and the playback content is played on the extended reality device.

6. The method according to claim 5, characterized in that the step of determining the position selection image corresponding to the conference scene by using the plurality of second shooting modules comprises:

Acquire multiple scene images of the conference scene through the multiple second shooting modules;

The position selection image is determined based on the plurality of scene images.

7. The method according to claim 6, wherein determining the position selection image according to the plurality of scene images comprises:

Determine a first location area and a second location area in the conference scene according to the multiple scene images, wherein the first location area is a location area where conference users already exist and cannot be selected, and the second location area is a location area where no conference users exist;

The position selection image is determined according to the first position area and the second position area.

8. The method according to claim 5, wherein the conference access device comprises a plurality of audio acquisition modules, and the generating of playback content according to the target location comprises:

Acquire multiple audio data at different angles in the conference scene through the multiple audio acquisition modules, wherein the multiple audio acquisition modules are arranged at different positions of the conference scene;

Determining a target picture according to the multiple scene images and the target position, and determining audio content corresponding to the target picture according to the multiple audio data and the target position;

The playback content is generated according to the target picture and the audio content.

9. The method according to claim 8, characterized in that determining a target picture according to the multiple scene images and the target position, and determining audio content corresponding to the target picture according to the multiple audio data and the target position, comprises:

Receiving the posture information of the target user sent by the extended reality device;

The target picture is determined according to the posture information and the multiple scene images, and the audio content corresponding to the target picture is determined according to the posture information and the multiple audio data.

10. A remote conference device, characterized in that it is applied to an extended reality device, and the device comprises:

A first processing module, configured to receive a location selection image for a conference scene sent by a conference access device, wherein the location selection image is displayed on a virtual display screen of the extended reality device;

The first processing module is further configured to determine, in response to a selection instruction of a target user, a target position corresponding to the selection instruction in the position selection image;

A first sending module, configured to send the target position to the conference access device, wherein the conference access device generates playback content according to the target position and posture information, wherein the posture information is acquired by a posture acquisition module of the extended reality device;

The first processing module is further configured to receive the playback content sent by the conference access device and play the playback content on the virtual display screen.

11. A remote conference device, characterized in that it is applied to a conference access device, the conference access device includes a plurality of second shooting modules, and the method includes:

A second processing module, configured to determine a position selection image corresponding to a conference scene through the plurality of second shooting modules, wherein the plurality of second shooting modules are arranged at different positions of the conference scene;

A second sending module, used for sending the position selection image to an extended reality device;

The second processing module is further used to receive a target position sent by the extended reality device, wherein the target position is generated in the position selection image according to a selection instruction of a target user;

The second processing module is further used to generate playback content according to the target position, and send the playback content to the extended reality device, and the playback content is played on the extended reality device.

12. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps in the remote conferencing method according to any one of claims 1 to 9 are implemented.

13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the remote conference method according to any one of claims 1 to 9 are implemented.