CN116866336A - Method and equipment for performing remote assistance
- Publication number
- CN116866336A (application number CN202310863598.6A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/08—Protocols specially adapted for terminal emulation, e.g. Telnet
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Abstract
The application aims to provide a method and equipment for performing remote assistance, the method specifically comprising: if the communication connection between a first user equipment and a second user equipment meets a predetermined weak network triggering condition, acquiring image information about a scene to be assisted; transmitting the image information to the second user equipment; receiving remote assistance information related to the image information returned by the second user equipment; and presenting the remote assistance information through the first user equipment. The application can achieve effective remote assistance under a weak network, improves assistance efficiency while saving cost, greatly saves bandwidth/traffic, can complete remote assistance well even on a 2G network, and improves user experience.
Description
This scheme is a divisional application of "Method and equipment for performing remote assistance" (application number: 201910284878.5, filing date: 2019.04.10)
Priority: CN201910250594.4 (filing date: 2019.03.29)
Technical Field
The application relates to the field of communication, in particular to a technology for performing remote assistance.
Background
Augmented Reality (AR) is a technology that calculates the position and angle of the camera image in real time and superimposes a corresponding virtual image; its goal is to overlay a virtual world on the real world shown on a screen and to interact with it. AR remote assistance is the process of providing remote operation guidance to a local party by means of audio and video communication combined with an AR display. In existing remote assistance methods, when the network is poor, the video frame rate, resolution and video quality are reduced, or the video is switched off entirely and only audio is used for remote assistance. Such a remote assistance mode cannot accurately capture the on-site environment, creates obstacles for remote assistance, reduces its efficiency, and may even fail to achieve the intended assistance effect.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for remote assistance.
According to one aspect of the present application, there is provided a method for performing remote assistance at a first user equipment end, the method comprising:
if the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition, acquiring image information about a scene to be assisted;
Transmitting the image information to the second user equipment;
receiving remote assistance information which is returned by the second user equipment and related to the image information;
and presenting the remote assistance information through the first user equipment.
According to still another aspect of the present application, there is provided a method for performing remote assistance at a second user equipment end, the method comprising:
receiving and presenting image information about a scene to be assisted, which is sent by first user equipment;
acquiring remote assistance information for guiding a user about the image information;
and sending the remote assistance information to the first user equipment.
According to one aspect of the present application, there is provided a method of remote assistance, the method comprising:
if the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition, the first user equipment acquires image information about a scene to be assisted and sends the image information to the second user equipment;
the second user equipment receives and presents the image information, acquires remote assistance information for guiding a user about the image information, and sends the remote assistance information to the first user equipment;
The first user equipment receives the remote assistance information and presents the remote assistance information through the first user equipment.
According to one aspect of the present application there is provided a first user equipment for remote assistance, the apparatus comprising:
a module 1-1 configured to acquire image information about a scene to be assisted if the communication connection between the first user equipment and the second user equipment meets a predetermined weak network triggering condition;
a module 1-2 configured to transmit the image information to the second user equipment;
a module 1-3 configured to receive remote assistance information related to the image information returned by the second user equipment;
and a module 1-4 configured to present the remote assistance information through the first user equipment.
According to another aspect of the present application there is provided a second user equipment for remote assistance, the apparatus comprising:
a module 2-1 configured to receive and present the image information about the site to be assisted sent by the first user equipment;
a module 2-2 configured to acquire remote assistance information of the guiding user about the image information;
and a module 2-3 configured to send the remote assistance information to the first user equipment.
According to one aspect of the present application there is provided a system for remote assistance comprising a first user equipment as described above and a second user equipment as described above.
According to one aspect of the present application there is provided an apparatus for remote assistance, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the operations of any of the methods described above.
According to one aspect of the present application there is provided a computer readable medium storing instructions that, when executed, cause a system to perform the operations of any of the methods described above.
Compared with the prior art, in the present application, when the communication connection between the first user equipment and the second user equipment meets the preset weak network triggering condition, image information about the site to be assisted is acquired, corresponding remote assistance information is obtained based on the image information, and the remote assistance information is presented. The application can achieve effective remote assistance under a weak network, improves assistance efficiency while saving cost, greatly saves bandwidth/traffic, can complete remote assistance well even on a 2G network, and improves user experience.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a system topology for remote assistance in accordance with one embodiment of the present application;
FIG. 2 illustrates a flow chart of a method of remote assistance in accordance with an aspect of the subject application;
FIG. 3 illustrates a flow chart of a method of remote assistance at a first user equipment end in accordance with an aspect of the application;
FIG. 4 illustrates an exemplary diagram of interactions based on interaction type information, according to one embodiment of the application;
FIG. 5 illustrates an example of interactions based on interaction type information, according to one embodiment of the application;
FIG. 6 illustrates an example of interactions based on interaction type information, according to one embodiment of the application;
FIG. 7 illustrates an example of interactions based on interaction type information, according to one embodiment of the application;
FIG. 8 is a flow chart of a method of remote assistance at a second user equipment end in accordance with another aspect of the present application;
FIG. 9 illustrates functional modules of a system for remote assistance in accordance with an aspect of the present application;
FIG. 10 illustrates an exemplary system that can be used to implement various embodiments described in the present application.
The same or similar reference numbers in the drawings refer to the same or similar parts.
Detailed Description
The application is described in further detail below with reference to the accompanying drawings.
In one exemplary configuration of the application, the terminal, the device of the service network, and the trusted party each include one or more processors (e.g., Central Processing Unit (CPU)), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, Random Access Memory (RAM) and/or non-volatile memory in computer-readable media, such as Read-Only Memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PCM), Programmable Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device includes, but is not limited to, a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example, through a touch pad), such as a smart phone or a tablet computer, and the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing, namely a virtual supercomputer consisting of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPN networks, wireless ad hoc networks, and the like. Preferably, the device may also be a program running on the user equipment, the network device, or a device formed by integrating the user equipment with the network device, a touch terminal, or the network device with a touch terminal through a network.
Of course, those skilled in the art will appreciate that the above devices are merely examples; other devices that exist now or may appear in the future, if applicable to the present application, are also intended to fall within the scope of protection of the present application and are incorporated herein by reference.
In the description of the present application, the meaning of "a plurality" is two or more unless explicitly defined otherwise.
Fig. 1 shows a typical scenario of the present application, in which a communication connection is established between a first user device 100 and a second user device 200. The communication connection may be established directly in a wired or wireless manner, or a corresponding communication connection may be established through a cloud. The first user device 100 includes a mobile terminal such as a mobile phone, a tablet or augmented reality glasses, on which a corresponding augmented reality application is installed for remote assistance; the mobile terminal can display corresponding augmented reality content in a superimposed manner, where the augmented reality content includes content presented on a display device (such as a screen) and superimposed on the current scene image captured by the camera device. The second user device includes, but is not limited to, any mobile electronic product that can interact with the user (e.g., via a touch pad), such as a smart phone, a tablet or augmented reality glasses. In the following, the embodiments are described by taking direct communication between the first user equipment and the second user equipment as an example; those skilled in the art should understand that the embodiments are equally applicable to the case where the first user equipment and the second user equipment communicate through the cloud, which is also included in the scope of the present application.
Referring to the system topology shown in fig. 1, fig. 2 illustrates a method of remote assistance according to one aspect of the application, wherein the method comprises:
if the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition, the first user equipment acquires image information about a scene to be assisted and sends the image information to the second user equipment; the second user equipment receives and presents the image information, acquires remote assistance information for guiding a user about the image information, and sends the remote assistance information to the first user equipment; the first user equipment receives the remote assistance information and presents the remote assistance information through the first user equipment.
The following describes embodiments of the present application from two angles of the first user equipment and the second user equipment, respectively.
Fig. 3 illustrates a method for performing remote assistance at a first user equipment end according to an aspect of the present application, where the method includes steps S101, S102, S103, and S104. In step S101, if the communication connection between the first user equipment and the second user equipment meets a predetermined weak network triggering condition, the first user equipment acquires image information about a site to be assisted; in step S102, the first user equipment sends the image information to the second user equipment; in step S103, the first user equipment receives remote assistance information related to the image information returned by the second user equipment; in step S104, the first user equipment presents the remote assistance information through the first user equipment.
Specifically, in step S101, if the communication connection between the first user equipment and the second user equipment satisfies a predetermined weak network triggering condition, the first user equipment acquires image information about the site to be assisted. For example, a user to be guided holds the first user device and a guiding user holds the corresponding second user device; the first user device and the second user device establish a communication connection in a wired or wireless manner. If the communication connection meets the predetermined weak network triggering condition, the first user device captures image information of the current site to be assisted through a corresponding camera device (such as a camera), where the image information includes static image frames (such as pictures) of the current site to be assisted. In some embodiments, the weak network triggering condition includes, but is not limited to: the current available bandwidth information of the communication connection is lower than or equal to predetermined bandwidth threshold information; the current packet loss rate information of the communication connection is greater than or equal to predetermined packet loss rate threshold information; the current video frame rate information of the communication connection is lower than or equal to predetermined video frame rate threshold information; information sent by the second user equipment indicating that the communication connection meets the weak network triggering condition is received; or a weak network triggering operation submitted by the user at the first user equipment is received. For example, the available bandwidth information of the communication connection includes the available throughput determined by the transmission rate of the current bandwidth and the corresponding coding scheme; the packet loss rate information of the communication connection includes the ratio of the number of lost data packets to the number of transmitted data packets; the video frame rate of the communication connection includes the number of video frames displayed per second, and so on.
For example, the first user equipment presets corresponding bandwidth threshold information (e.g., 100 kb/s); when the bandwidth rate of the communication connection between the first user equipment and the second user equipment (e.g., currently 80 kb/s) falls below the bandwidth threshold information, the corresponding weak network triggering condition is triggered. For another example, the first user equipment presets corresponding packet loss rate threshold information (e.g., a packet loss rate of 10%); when the current packet loss rate information of the first user equipment or the second user equipment (e.g., 15%) exceeds the preset packet loss rate threshold information, for example because router resources are overly occupied, the corresponding weak network triggering condition is triggered. Similarly, if the first user equipment presets corresponding video frame rate threshold information (e.g., a video frame rate of 30 frames/second), and the current video frame rate of the first user equipment or the second user equipment (e.g., 20 frames/second) is lower than the preset frame rate due to the network speed, the corresponding weak network triggering condition is triggered. As another example, if the user to be guided wants to save the traffic consumed by remote assistance, the first user equipment may directly activate the weak network triggering condition based on an operation instruction of the user to be guided.
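A minimal sketch of this weak-network check in Python; the `ConnectionStats` structure, function name and thresholds (100 kb/s, 10%, 30 frames/s) are illustrative assumptions taken from the examples above, not part of the claimed method:

```python
from dataclasses import dataclass

@dataclass
class ConnectionStats:
    bandwidth_kbps: float      # measured available bandwidth
    packet_loss_rate: float    # 0.0 - 1.0
    video_frame_rate: float    # frames per second

# Illustrative thresholds taken from the examples above.
BANDWIDTH_THRESHOLD_KBPS = 100.0
PACKET_LOSS_THRESHOLD = 0.10
FRAME_RATE_THRESHOLD = 30.0

def weak_network_triggered(stats: ConnectionStats,
                           peer_reported_weak: bool = False,
                           user_requested_weak: bool = False) -> bool:
    """Return True if any of the weak-network trigger conditions is met."""
    return (stats.bandwidth_kbps <= BANDWIDTH_THRESHOLD_KBPS
            or stats.packet_loss_rate >= PACKET_LOSS_THRESHOLD
            or stats.video_frame_rate <= FRAME_RATE_THRESHOLD
            or peer_reported_weak       # notification from the second user equipment
            or user_requested_weak)     # manual trigger by the user to be guided
```

Any single condition being true is enough to switch from real-time video to picture-based assistance.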
In step S102, the first user device sends the image information to the second user device. For example, after the first user equipment acquires the image information of the site to be assisted, the image information is sent to the second user equipment corresponding to the guiding personnel through the weak network communication connection, for example, a picture of the site to be assisted is sent to the second user equipment. Subsequently, the second user equipment receives the picture sent by the first user equipment, acquires corresponding remote assistance information (such as marking, graffiti, tracking target selection and the like) based on the picture information, and returns the remote assistance information to the first user equipment.
In step S103, the first user equipment receives remote assistance information about the image information returned by the second user equipment. For example, the first user device receives remote assistance information which is returned by the second user device and is acquired based on the image information, wherein the remote assistance information comprises operation information (such as annotation, graffiti or tracking target selection, etc.) which is acquired by the second user device based on operation of guiding a user and related to the image information, and/or position information (such as image coordinates marked in the image information, etc.) corresponding to the operation information.
In step S104, the first user equipment presents the remote assistance information through the first user equipment. For example, after receiving the corresponding remote assistance information, the first user equipment presents it through the corresponding display device, for example, by presenting the annotation or graffiti at the position given by its image coordinates, or by extracting the tracking target based on the tracking target selection and performing target tracking in subsequent video frames. These three kinds of remote assistance information may be used individually or in any combination.
In the existing remote assistance mode, if the video is transmitted at 15 frames per second with a resolution of 720p, the network speed must reach at least 2 Mbps to perform remote assistance based on real-time video. If resolution, frame rate and bit rate are reduced in a weak network environment, the real-time video quality becomes poor and the remote guidance effect suffers. The present application does not need to send video in a weak network environment; only pictures, audio, annotations and the like are sent each time, so remote assistance can be performed with very low bandwidth (for example, within 10 KB), which greatly saves bandwidth/traffic and allows good remote assistance to be completed even on a 2G network.
In some cases, the remote assistance between the first user device and the second user device includes, but is not limited to, three modes: graffiti labeling based on the image information, tracking of annotated content in the image information, and 3D labeling based on the image information. These three remote assistance modes may be used individually or in any combination; the interactive content obtained in this way is called remote assistance information, and the manner of obtaining the corresponding remote assistance information is called interaction type information. As in some embodiments, the interaction type information includes, but is not limited to: performing graffiti labeling based on the image information; tracking the annotated content in the image information; and performing 3D labeling based on the image information. The three modes are described in detail below:
1) Graffiti labeling based on the image information. The second user device obtains corresponding remote assistance information based on operations of the guiding user such as doodling, writing text or numbers, or placing pictures on the picture (video frame); the remote assistance information (the content and position of the graffiti) is then synchronized to the first user device, which receives it and superimposes the same annotation at the corresponding position of the picture, so that the user to be guided can operate under the guidance of the annotation information. The content of the graffiti may be the graffiti itself, such as a picture, or may be an ID of the graffiti. If an ID is used, both the first user device and the second user device store the picture locally; after the second user device synchronizes the remote assistance information (the ID and position of the graffiti) to the first user device, the first user device retrieves the corresponding graffiti according to its ID and displays it at the corresponding position of the picture.
The second user equipment obtains the coordinates of the marking points of the operation information (such as the graffiti, written text, numbers or placed markers) on the picture based on the guiding user's instructions, converts them into coordinates in a pixel coordinate system, normalizes them, and sends them to the first user equipment. After obtaining the normalized pixel coordinates, the first user equipment converts them into its own screen coordinate system and draws the annotation at the corresponding position on the screen. Specifically, the receiving end calculates the corresponding position as follows:
For example, Fig. 4 shows an example of interaction based on interaction type information, in which device A establishes a communication connection with devices B and C. After the connection succeeds, A encodes the picture captured by its camera using a media engine codec (such as H.264 or VP8) and transmits it to a server via a transport protocol (such as RTP), and the server then forwards it to the other participants (the B and C ends). The A end also renders the picture onto its interface control. Alternatively, the A end may send the encoded picture directly to B and C for decoding.
The B and C ends receive the data and decode it with the same codec in their media engines to obtain the decoded picture, which is rendered onto the interface control. Because the screen sizes and aspect ratios of the devices differ, the video rendering also differs; at this point all devices render at the aspect ratio of the sending end's picture. For example, the C terminal in the example diagram has a 16:10 screen, yet the video is still rendered at the center of the screen in the sending end's 9:16 aspect ratio.
Fig. 5 shows an example of interaction based on interaction type information: when labeling starts, the B end clicks the picture frame on the interface control to add an annotation. The unified coordinate point position is computed from the click coordinates reported by the device (the coordinate point on the device's screen), the screen DPI (pixels per inch, i.e., pixel density), the picture aspect ratio and the display position (the position of the video on the screen, generally the top-left vertex and the width and height of the video display); the position information is then encrypted, packaged and sent to the A and C ends. The annotation may also be sent to the server first and forwarded by the server to the A and C ends.
After receiving the labeling information, the A and C ends decrypt it and calculate the display coordinates of the annotation point on their own devices according to the DPI of the current device screen, the picture aspect ratio and the display position (generally the top-left vertex and the width and height of the video display), so that the annotation coincides with that on the B end.
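As a rough illustration of the normalized-coordinate exchange described above, the Python sketch below converts a click into picture-normalized coordinates on the sender and back into screen coordinates on the receiver; the `Rect` layout, field names and the example message are assumptions, and DPI handling and encryption are omitted:

```python
from typing import Tuple

# (left, top, width, height) of the rendered picture on the device's screen
Rect = Tuple[float, float, float, float]

def screen_to_normalized(click_xy: Tuple[float, float], display_rect: Rect) -> Tuple[float, float]:
    """Sender side: convert a click on the screen into coordinates normalized
    to the rendered picture (0..1 in both axes)."""
    x, y = click_xy
    left, top, width, height = display_rect
    return (x - left) / width, (y - top) / height

def normalized_to_screen(norm_xy: Tuple[float, float], display_rect: Rect) -> Tuple[float, float]:
    """Receiver side: map the normalized coordinates onto this device's own
    rendering rectangle so the annotation coincides with the sender's."""
    u, v = norm_xy
    left, top, width, height = display_rect
    return left + u * width, top + v * height

# Example annotation message (fields are illustrative):
annotation_msg = {
    "type": "graffiti",
    "content_id": "arrow_01",   # or the graffiti content itself
    "position": screen_to_normalized((540.0, 960.0), (0.0, 120.0, 1080.0, 1920.0)),
}
```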
2) Tracking the annotated content in the image information. Fig. 6 shows an example of interaction based on interaction type information. The second user device marks the content to be noted based on the guiding user drawing a selection box or placing images, words, numbers and so on over the picture, for example drawing a box around an engine. The coordinates of the box and/or the selected content are synchronized to the first user device; after receiving the data, the first user device displays the box drawn by the remote guiding user in the current video (the box enclosing the engine), and then tracks it in the subsequently captured real-time video using a tracking algorithm, so that the box remains superimposed on the engine as the camera moves. As before, the position of the guiding box is converted into normalized pixel coordinates, which the guided side converts into the corresponding position on its screen, and the tracked result is displayed in the video. In the case of a binocular optical see-through (OST) display device (for example, binocular AR glasses), the tracking process is specifically as follows:
One is to estimate world coordinates corresponding to 2D pixel coordinates of a marker frame by using a SLAM (simultaneous localization and mapping) method, and then obtain real-time marker frames on left and right screens of a display device according to real-time poses, so as to realize corresponding target tracking.
The other is to track the position on the screen with a 2D tracking method based on the 2D pixel coordinates of the annotation box, and then obtain the final display positions of the marker on the left and right screens using a depth camera and the camera intrinsics, thereby realizing the corresponding target tracking.
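The application does not prescribe a particular 2D tracker; as one possible sketch of the second option, the following Python snippet keeps the received annotation box attached to the target with an off-the-shelf tracker (assumes `opencv-contrib-python`; the tracker factory name differs between OpenCV versions):

```python
import cv2  # assumes opencv-contrib-python is installed

def track_annotation(capture, init_frame, box_xywh):
    """Keep the remotely drawn box superimposed on the target as the camera moves.

    box_xywh: (x, y, w, h) in pixels of init_frame, converted from the
    normalized coordinates received from the second user equipment.
    """
    tracker = cv2.TrackerCSRT_create()      # cv2.legacy.* on some OpenCV versions
    tracker.init(init_frame, box_xywh)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        found, box = tracker.update(frame)
        if found:
            x, y, w, h = (int(v) for v in box)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("remote assistance", frame)
        if cv2.waitKey(1) == 27:            # Esc to stop
            break
```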
3) 3D labeling based on the image information. Fig. 7 shows an example of interaction based on interaction type information. The second user device sends the 2D pixel coordinates and the 3D labeling content (or the ID of the 3D labeling content) placed on the picture to the first user device. The first user device runs SLAM on the scene, calculates the world coordinates corresponding to the 2D pixel coordinates, and then displays the 3D annotation on the display device so that it is superimposed at the correct position of the real scene. Subsequently, SLAM calculates the camera pose of each frame in real time, and the 3D annotation is displayed on the screen according to the 3D world coordinates of the marked point. 3D annotations include, but are not limited to, pictures, numbers, text, 3D models, and the like. Specifically, when the first user equipment sends an on-site picture (video frame) to the second user equipment, it uses a SLAM algorithm to obtain the device position, attitude and 3D point cloud of the currently captured frame. When the guiding user annotates the picture on the second user equipment, the two-dimensional pixel coordinates of the marked point are obtained and transmitted to the first user equipment; the first user equipment derives the position of this point in the real-world coordinate system (its 3D world coordinates) from the pose and the 3D point cloud, and the 3D annotation can then be displayed on a display device (including but not limited to a mobile device or an AR/VR device). SLAM is computed in real time to obtain the camera pose of each frame, and the 3D annotation is displayed on the screen according to the 3D world coordinates of the marked point, so that it remains superimposed at the correct position of the real scene.
For example, taking a binocular OST device: when the guiding user clicks a point in the 2D image on the second user equipment, the 3D world coordinates of the marked point are obtained through a series of transformations, which in turn correspond to a 3D point in the camera coordinate system of the AR glasses worn by the user to be guided at the first user equipment. Since the annotation is ultimately presented on two OST lenses, this coordinate must be converted into the left-eye OST lens coordinate system and the right-eye OST lens coordinate system respectively, and the 3D annotation is then rendered on each of the two OST lenses in its corresponding coordinate system. Thanks to binocular parallax, when the user to be guided wears the AR glasses and observes the real world, a virtual 3D annotation (e.g., an arrow) appears attached to the object marked by the remote instructor. For a monocular or binocular video see-through (VST) presentation, after obtaining the annotation position in the camera coordinate system, it only needs to be projected down to a 2D picture, and the resulting 2D picture as a whole is displayed on the lenses of the AR glasses. The 3D spatial coordinate estimation of a specific point in the given 2D video can be implemented by the following existing schemes:
1) A 3D map of the whole space is reconstructed using a monocular SLAM algorithm (with only one RGB camera), and within the reconstructed map the pose of the camera of the first user device in 3D space is determined. For example, the camera of the first user device continuously collects RGB images, which serve two purposes: first, they are fed to the SLAM algorithm for 3D point cloud modeling and for determining the camera pose consistent with the world coordinate system; second, they are transmitted over the network to the remote second user device, so that the guiding user can view the on-site situation in real time on a PC, tablet, mobile phone or similar device. The specific calculation is as follows: when the coordinates P2d of a pixel in the 2D picture are given, mapping that 2D point into 3D space (a low-dimensional to high-dimensional mapping) means that, according to the principles of projective mapping, the single 2D coordinate corresponds to a straight line (ray) L3d in 3D space. The 3D point cloud in the world coordinate system is likewise mapped into the camera coordinate system, where each point is still a 3D point. The system then selects a point P3dC' from the 3D point cloud built by the SLAM algorithm (for example, the point with the smallest perpendicular distance to the mapping ray), and uses its depth value (in the camera coordinate system) to obtain a point P3dC on the mapping ray; this point, expressed in the camera coordinate system, is then converted into a point P3d in the 3D point cloud (world) coordinate system. P3d is taken as the estimate of where the point P2d maps to in the 3D point cloud coordinate system, which yields the 3D world coordinates corresponding to the 2D point.
2) After the 3D point cloud is obtained through SLAM algorithm, a plane under a world coordinate system is fitted by the reconstructed 3D point cloud. When the coordinates of a specific 2D point in the video are input, the specific 2D point is mapped to a specific point on a plane under the fitted world coordinate system through the mapping relation of the coordinate system. And then the expression of the plane under the world coordinate system is used for reversely deducing the space 3D coordinate of the specific point under the world coordinate system.
3) Two different sensors, namely an RGB camera and a depth camera, are installed on the device, a 2D image and a depth image are acquired simultaneously, when the coordinates of one 2D specific point in a video are input, an algorithm combines the depth images recorded simultaneously when the 2D image is acquired, the image coordinates of the marked 2D specific point are calculated to correspond to the pixel coordinates in the depth image, and then depth information is acquired from the pixel coordinates. The marked depth information is obtained through the steps, and then the 3D space position coordinates under the world coordinate system can be deduced.
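The following Python sketch illustrates two of the estimation schemes above and the re-projection used for display: lifting the marked pixel into 3D either from a depth image (scheme 3) or from the SLAM point cloud along the pixel ray (a simplified form of scheme 1), and projecting the stored world coordinate back into later frames. The pinhole model, the camera-to-world pose convention (R_cw, t_cw) and all names are assumptions for illustration:

```python
import numpy as np

def backproject_with_depth(u, v, depth, K):
    """Scheme 3: lift a marked pixel (u, v) with known depth into camera
    coordinates using the pinhole intrinsics K (3x3)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def estimate_depth_from_cloud(u, v, K, cloud_cam):
    """Scheme 1 (simplified): take the depth of the SLAM cloud point (N x 3,
    camera frame) that lies closest to the ray through pixel (u, v); feed the
    result back into backproject_with_depth to get the point on the ray."""
    ray = backproject_with_depth(u, v, 1.0, K)
    ray = ray / np.linalg.norm(ray)
    dists = np.linalg.norm(np.cross(cloud_cam, ray), axis=1)  # point-to-ray distance
    return cloud_cam[np.argmin(dists)][2]

def camera_to_world(p_cam, R_cw, t_cw):
    """Transform a camera-frame point into world coordinates using the
    camera-to-world pose (R_cw, t_cw) estimated by SLAM."""
    return R_cw @ p_cam + t_cw

def project_to_pixel(p_world, R_cw, t_cw, K):
    """Re-project the stored 3D annotation into the current frame so it stays
    superimposed at the correct position as the camera pose changes."""
    p_cam = R_cw.T @ (p_world - t_cw)
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```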
In some cases, the interaction type information may be selected by default, may be selected based on a user to be guided corresponding to the first user equipment, or may be selected based on a guiding user corresponding to the second user equipment, for example, before the first user equipment presents the corresponding remote assistance information, the first user equipment or the second user equipment selects the corresponding interaction type information based on the user to be guided or an operation instruction of the guiding user, and the subsequent first user equipment presents the remote assistance information based on the acquired remote assistance information and the interaction type information. In some embodiments, in step S104, the first user device presents the remote assistance information through the first user device in combination with preset interaction type information.
In some cases, the interaction type may be set by the first user equipment based on the operation of the user to be guided, or may be interaction type information, which is determined by the first user equipment based on the current image information or network bandwidth, and is suitable for the current interaction state, then the first user equipment sends the interaction type information to the second user equipment, and the second user equipment receives the interaction type information and obtains remote assistance information of the guiding user based on the interaction type information and the image information. As in some embodiments, in step S102, the first user device sends the image information and interaction type information about the image information to the second user device. In other cases, the interaction type may be set by the second user device based on the operation of the guiding user, or may be interaction type information, which is determined by the second user device based on the current image information or network bandwidth, and is suitable for the current interaction state, and then the second user device obtains remote assistance information of the guiding user based on the interaction type information and the image information, and returns the interaction type information and the remote assistance information to the corresponding first user device. As in some embodiments, in step S103, the first user equipment receives the remote assistance information and the interaction type information about the image information returned by the second user equipment. And after the interaction type information determined by the first user equipment and the second user equipment is based on the interaction type information, the first user equipment superimposes and presents corresponding remote assistance information on the current display device based on the interaction type information. As in some embodiments, in step S104, a first user device presents the remote assistance information through the first user device in combination with the interaction type information.
In some cases, the first user device receives the corresponding remote assistance information and determines the corresponding interaction type information based on its content; for example, the remote assistance information may carry the interaction type information, from which the first user equipment obtains it directly. Alternatively, the interaction type may be inferred from the payload: graffiti annotation information indicates the graffiti labeling mode, a target tracking box indicates annotation tracking, and 3D annotation information indicates 3D labeling, and so on. The first user equipment then presents the corresponding remote assistance information superimposed according to the interaction type information, as illustrated by the sketch below. As in some embodiments, the method further comprises step S105 (not shown), in which the first user equipment determines the interaction type information corresponding to the image information according to the remote assistance information; and in step S104, the first user equipment presents the remote assistance information through the first user equipment in combination with the interaction type information.
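A sketch of the inference in step S105, with hypothetical payload field names (the actual message format is not specified by the application):

```python
def infer_interaction_type(remote_assistance_info: dict) -> str:
    """Infer the interaction type from the fields carried by the remote
    assistance information (field names are illustrative)."""
    if "interaction_type" in remote_assistance_info:   # type sent explicitly
        return remote_assistance_info["interaction_type"]
    if "world_anchor" in remote_assistance_info:       # 3D annotation payload
        return "3d_annotation"
    if "tracking_box" in remote_assistance_info:       # target box to track
        return "annotation_tracking"
    return "graffiti"                                  # default: 2D graffiti labeling
```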
In some embodiments, the method further includes step S106 (not shown), in which the first user equipment determines the corresponding interaction type information based on the weak network state information corresponding to the communication connection; subsequently, in step S104, the first user equipment presents the remote assistance information through the first user equipment in combination with the interaction type information. For example, the weak network state information includes the available bandwidth information, current packet loss rate information and current video frame rate information of the network connection between the first user equipment and the second user equipment. A suitable interaction type is selected for the current image information according to the traffic consumed by transmitting data of each interaction type: when the current available bandwidth is smaller than a first bandwidth threshold (such as 20 kb/s), the first user equipment selects the interaction type of tracking the annotated content; if the current available bandwidth is larger than the first bandwidth threshold and smaller than a second bandwidth threshold (such as 50 kb/s), the first user equipment selects the interaction type of graffiti labeling; and if the current available bandwidth is larger than the second bandwidth threshold while the weak network triggering condition is still met, the first user equipment selects the interaction type of 3D labeling. Similarly, when the current packet loss rate is greater than a first packet loss rate threshold (e.g., 10%), the first user equipment selects the interaction type of tracking the annotated content; if the current packet loss rate is between the first packet loss rate threshold and a second packet loss rate threshold (such as 5%), the first user equipment selects the interaction type of graffiti labeling; and if the current packet loss rate is smaller than the second packet loss rate threshold while the weak network condition is met, the first user equipment selects the interaction type of 3D labeling. Likewise, when the current video frame rate is smaller than a first video frame rate threshold (e.g., 5 frames/s), the first user equipment selects the interaction type of tracking the annotated content; if the current video frame rate is greater than the first video frame rate threshold and less than a second video frame rate threshold (such as 10 frames/s), the first user equipment selects the interaction type of graffiti labeling; and if the current video frame rate is greater than the second video frame rate threshold while the weak network triggering condition is met, the first user equipment selects the interaction type of 3D labeling.
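A sketch of the bandwidth-based branch of step S106, using the illustrative 20 kb/s and 50 kb/s thresholds from the example; packet loss rate and frame rate could be handled analogously:

```python
def select_interaction_type(bandwidth_kbps: float) -> str:
    """Pick the interaction type that fits the current weak-network state."""
    FIRST_THRESHOLD_KBPS = 20.0    # below this: annotation tracking only
    SECOND_THRESHOLD_KBPS = 50.0   # below this: graffiti labeling
    if bandwidth_kbps < FIRST_THRESHOLD_KBPS:
        return "annotation_tracking"
    if bandwidth_kbps < SECOND_THRESHOLD_KBPS:
        return "graffiti"
    return "3d_annotation"         # still within the weak-network trigger condition
```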
In some embodiments, the method further comprises step S107 (not shown), in which the first user device performs compression processing on the image information; in step S102, the first user device then sends the compressed image information to the second user device. For example, when the current network condition is poor, the first user equipment compresses the image information of the site to be assisted (reducing the image quality) and then sends the compressed image information to the second user equipment; for instance, where details are not required, the picture may be compressed to within 100 KB before being sent, and the second user equipment presents the compressed image information after receiving it.
In some embodiments, in step S107, the first user equipment determines a compression rate for the image information based on the weak network state information corresponding to the communication connection, and compresses the image information according to that compression rate. For example, the weak network state information includes the available bandwidth information, current packet loss rate information and current video frame rate information of the network connection between the first user equipment and the second user equipment. The first user equipment adjusts the size of the compressed picture in real time according to the network state information of the communication connection, so that the picture information can be delivered to the guiding party in time and communication remains timely and effective. For example, the first user equipment detects the currently available bandwidth and adapts the picture compression rate accordingly: when the currently available bandwidth is 100 kb/s, the picture is compressed to about 100 KB; when the network speed drops to 50 kb/s, the picture is compressed to about 50 KB, and so on, as sketched below. The first user equipment then transmits the compressed picture to the corresponding second user equipment. The handling of the current packet loss rate information and the current video frame rate information is similar to that of the available bandwidth information described above and is not repeated here.
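A sketch of the adaptive compression in step S107, assuming the Pillow library; the quality-stepping strategy and the byte budget derived from the available bandwidth are illustrative, not mandated by the application:

```python
import io
from PIL import Image  # assumes Pillow is available

def compress_to_budget(image: Image.Image, target_kb: float) -> bytes:
    """Compress the captured frame so that it fits the byte budget derived
    from the currently available bandwidth (e.g., ~100 KB at 100 kb/s)."""
    image = image.convert("RGB")        # JPEG has no alpha channel
    quality = 90
    while quality >= 10:
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= target_kb * 1024:
            return buf.getvalue()
        quality -= 10
    return buf.getvalue()               # best effort at the lowest quality tried
```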
In some embodiments, the method further comprises step S108 (not shown), in which the first user equipment receives information sent by the second user equipment indicating that the communication connection satisfies the weak network triggering condition. For example, the second user equipment may directly activate the weak network triggering condition based on an operation instruction of the guiding user and notify the first user equipment, so that the first user equipment starts the image information transmission mode corresponding to the weak network triggering condition, and so on.
In some embodiments, the method further comprises step S109 (not shown), in which the first user equipment detects the current communication status information of the communication connection or receives the current communication status information of the communication connection sent by the second user equipment, where the current communication status information of the communication connection includes, but is not limited to: the current available bandwidth information of the communication connection; the current packet loss rate information of the communication connection; and the current video frame rate information of the communication connection. For example, the first user equipment detects the current communication status information of the communication connection in real time, or the second user equipment detects it in real time and sends it to the first user equipment; the first user equipment then judges whether it is in a weak network condition according to this communication status information. For another example, the first user equipment detects the current communication status information of the communication connection in real time, and if the current communication status is poor, the corresponding weak network triggering condition is satisfied; or the second user equipment detects the current communication status information of the communication connection in real time, determines that the weak network triggering condition is met if the current communication status is poor, and sends this to the first user equipment. The current communication status information includes, but is not limited to: the available bandwidth information, packet loss rate information and video frame rate information of the current communication connection. The determination of the network status based on the available bandwidth information, packet loss rate information and video frame rate information is similar to that described above and is not repeated here.
Fig. 8 illustrates a method for performing remote assistance at a second user equipment end according to another aspect of the present application, where the method includes steps S201, S202 and S203. In step S201, the second user device receives and presents the image information about the site to be assisted sent by the first user device; in step S202, the second user equipment acquires remote assistance information of the guiding user about the image information; in step S203, the second user equipment sends the remote assistance information to the first user equipment.
Specifically, in step S201, the second user device receives and presents the image information about the site to be assisted, which is sent by the first user device. For example, a user to be guided holds a first user device, the user is guided to hold a corresponding second user device, the first user device and the second user device establish communication connection in a wired or wireless mode, if the communication connection meets a preset weak network triggering condition, the first user device shoots image information of a current site to be assisted through a corresponding camera device (such as a camera) and the like, wherein the image information comprises static image frames (such as pictures and the like) related to the current site to be assisted. After the first user equipment acquires the image information of the site to be assisted, the image information is sent to the second user equipment corresponding to the guiding personnel through weak network communication connection, for example, a picture of the site to be assisted is sent to the second user equipment.
In step S202, the second user equipment acquires remote assistance information of the guiding user about the image information. For example, the second user device receives the picture sent by the first user device, obtains corresponding remote assistance information (such as annotation, graffiti or tracking target selection) based on the picture, and returns the remote assistance information to the first user device. The remote assistance information includes operation information about the image information (such as annotation, graffiti or tracking target selection) obtained by the second user device from the guiding user's operations, and/or the position information corresponding to the operation information (such as the image coordinates marked in the image information).
In step S203, the second user equipment sends the remote assistance information to the first user equipment. For example, the second user device returns corresponding remote assistance information to the first user device based on the communication connection of the first user.
In some embodiments, in step S201, the second user device receives and presents the image information about the site to be assisted and the interaction type information about the image information sent by the first user device; wherein in step S202, the second user equipment acquires remote assistance information guiding the user about the image information based on the interaction type information. For example, remote assistance between the first user device and the second user device includes, but is not limited to: the three remote assistance modes can be alternatively or in any combination, the interactive content obtained by the method is called remote assistance information, and the mode of obtaining corresponding remote assistance information is called interactive type information. In some cases, the interaction type may be set by the first user equipment based on the operation of the user to be guided, or may be interaction type information, which is determined by the first user equipment based on the current image information or network bandwidth, and is suitable for the current interaction state, then the first user equipment sends the interaction type information to the second user equipment, and the second user equipment receives the interaction type information and obtains remote assistance information of the guiding user based on the interaction type information and the image information.
In some embodiments, the method of fig. 8 further includes step S204 (not shown), in which the second user device acquires interaction type information about the image information set by the guiding user; and in step S203, the second user equipment sends the remote assistance information and the interaction type information about the image information to the first user equipment. For example, the interaction type may be set by the second user equipment based on the operation of the guiding user, or may be interaction type information that the second user equipment determines, based on the current image information or the network bandwidth, to be suitable for the current interaction state; the second user equipment then acquires the remote assistance information of the guiding user based on the interaction type information and the image information, and returns the interaction type information and the remote assistance information to the corresponding first user equipment.
In some embodiments, the method of fig. 8 further comprises step S205 (not shown), in which the second user equipment detects whether the communication connection between the first user equipment and the second user equipment satisfies a predetermined weak network triggering condition; if so, it sends information that the communication connection satisfies the weak network triggering condition to the first user equipment. For example, the second user equipment may directly trigger the weak network condition based on an operation instruction of the guiding user and send the corresponding information to the first user equipment, so that the first user equipment starts the image information transmission mode corresponding to the weak network condition, and so on.
In some embodiments, the method of fig. 8 further includes step S206 (not shown), in which the second user equipment detects current communication status information of the communication connection between the first user equipment and the second user equipment, and sends the current communication status information of the communication connection to the first user equipment. For example, the second user equipment detects the current communication status information of the communication connection in real time and sends it to the first user equipment, and the first user equipment judges whether it is under a weak network condition according to the communication status information. For another example, the second user equipment detects the current communication status information of the communication connection in real time, determines that the weak network triggering condition is satisfied if the communication status is poor, and sends this information to the first user equipment. The current communication status information includes, but is not limited to: available bandwidth information, packet loss rate information, video frame rate information, etc. of the current communication connection. The determination of the network status based on the available bandwidth information, the packet loss rate information and the video frame rate information is similar to that described above and is not repeated here.
Referring to the system topology shown in fig. 1, fig. 9 shows a remote assistance system for performing remote assistance according to an aspect of the present application, wherein the system includes a first user equipment 100 and a second user equipment 200, and specifically includes:
If the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition, the first user equipment acquires image information about a scene to be assisted and sends the image information to the second user equipment; the second user equipment receives and presents the image information, acquires remote assistance information for guiding a user about the image information, and sends the remote assistance information to the first user equipment; the first user equipment receives the remote assistance information and presents the remote assistance information through the first user equipment.
The following describes the apparatus for implementing the method of the foregoing embodiments of the present application, from the perspective of the first user equipment and the perspective of the second user equipment respectively.
A first user device 100 for remote assistance is shown in fig. 9, wherein the device 100 comprises a first module 101, a second module 102, a third module 103 and a fourth module 104. The first module 101 is configured to acquire image information about a site to be assisted if the communication connection between the first user equipment and the second user equipment satisfies a predetermined weak network triggering condition; the second module 102 is configured to send the image information to the second user equipment; the third module 103 is configured to receive remote assistance information about the image information returned by the second user equipment; and the fourth module 104 is configured to present the remote assistance information through the first user equipment.
Specifically, the first module 101 is configured to acquire image information about a site to be assisted if the communication connection between the first user equipment and the second user equipment satisfies a predetermined weak network triggering condition. For example, the user to be guided holds a first user device and the guiding user holds a corresponding second user device; the first user device and the second user device establish a communication connection in a wired or wireless manner. If the communication connection satisfies the predetermined weak network triggering condition, the first user device captures image information of the current site to be assisted through a corresponding camera device (such as a camera), where the image information comprises static image frames (such as pictures) of the current site to be assisted. In some embodiments, the weak network triggering condition includes, but is not limited to: the current available bandwidth information of the communication connection is lower than or equal to predetermined bandwidth threshold information; the current packet loss rate information of the communication connection is greater than or equal to predetermined packet loss rate threshold information; the current video frame rate information of the communication connection is lower than or equal to predetermined video frame rate threshold information; information, sent by the second user equipment, that the communication connection satisfies the weak network triggering condition is received; or a weak network triggering operation submitted by a user at the first user equipment is received. For example, the available bandwidth information of the communication connection includes the throughput of the available bandwidth determined by the transmission rate of the current bandwidth and the corresponding coding scheme; the packet loss rate information of the communication connection includes the ratio of the number of lost data packets to the number of transmitted data packets; and the video frame rate of the communication connection includes the number of video frames displayed per second, etc.
For example, the first user equipment presets corresponding bandwidth threshold information (e.g., 100 kb/s); when the bandwidth rate of the communication connection between the first user equipment and the second user equipment (e.g., currently 80 kb/s) is lower than the bandwidth threshold information, the corresponding weak network triggering condition is triggered. For another example, the first user equipment presets corresponding packet loss rate threshold information (e.g., a packet loss rate of 10%); when the current packet loss rate information of the first user equipment or the second user equipment (e.g., 15%) is greater than the preset packet loss rate threshold information, for example because router resources are excessively occupied, the corresponding weak network triggering condition is triggered. Or, the first user equipment presets corresponding video frame rate threshold information (e.g., a video frame rate of 30 frames/second); when the current video frame rate information of the first user equipment or the second user equipment (e.g., 20 frames/second) is lower than the preset frame rate information, for example because of a low network speed, the corresponding weak network triggering condition is triggered. In addition, if the user to be guided wants to save the traffic consumed by the remote assistance, the first user equipment may directly trigger the weak network condition based on an operation instruction of the user to be guided.
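As a minimal illustration of how these trigger conditions could be combined in practice, the following Python sketch checks the three thresholds quoted above together with the peer-reported and user-requested triggers; the data structure, function name and default threshold values are assumptions for illustration, not part of the described implementation.

```python
# Minimal sketch of the weak-network trigger check (assumed names and thresholds).
from dataclasses import dataclass

@dataclass
class LinkStatus:
    bandwidth_kbps: float   # current available bandwidth of the connection
    packet_loss: float      # packet loss ratio, 0.0 - 1.0
    video_fps: float        # current video frame rate

# Example threshold values taken from the figures quoted in the text.
BANDWIDTH_THRESHOLD_KBPS = 100.0
PACKET_LOSS_THRESHOLD = 0.10
VIDEO_FPS_THRESHOLD = 30.0

def weak_network_triggered(status: LinkStatus,
                           peer_reported: bool = False,
                           user_requested: bool = False) -> bool:
    """Return True if any of the weak-network trigger conditions holds."""
    return (status.bandwidth_kbps <= BANDWIDTH_THRESHOLD_KBPS
            or status.packet_loss >= PACKET_LOSS_THRESHOLD
            or status.video_fps <= VIDEO_FPS_THRESHOLD
            or peer_reported        # second user equipment reported a weak network
            or user_requested)      # user manually switched to the picture mode

# e.g. weak_network_triggered(LinkStatus(80.0, 0.02, 25.0)) -> True (bandwidth below threshold)
```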
A second module 102, configured to send the image information to the second user equipment. For example, after the first user equipment acquires the image information of the site to be assisted, the image information is sent to the second user equipment corresponding to the guiding personnel through the weak network communication connection, for example, a picture of the site to be assisted is sent to the second user equipment. Subsequently, the second user equipment receives the picture sent by the first user equipment, acquires corresponding remote assistance information (such as marking, graffiti, tracking target selection and the like) based on the picture information, and returns the remote assistance information to the first user equipment.
The third module 103 is configured to receive remote assistance information about the image information returned by the second user equipment. For example, the first user device receives the remote assistance information returned by the second user device and acquired based on the image information, wherein the remote assistance information comprises operation information about the image information (such as annotation, graffiti or tracking target selection) acquired by the second user device based on the guiding user's operation, and/or position information corresponding to the operation information (such as the image coordinates of the annotation in the image information).
The fourth module 104 is configured to present the remote assistance information through the first user equipment. For example, after receiving the corresponding remote assistance information, the first user equipment presents the remote assistance information through a corresponding display device, e.g., presents the annotation or graffiti at the position given by its image coordinates, or extracts the tracking target information based on the tracking target selection and performs target tracking in subsequent video frames; the three kinds of remote assistance information may be presented alternatively or in any combination.
In the existing remote assistance mode, if the video is transmitted at 15 frames per second with an image resolution of 720p, the network speed needs to reach at least about 2M to complete remote assistance based on real-time video. If the resolution, frame rate and code stream are reduced in a weak network environment, the real-time video quality deteriorates and the remote guidance effect is poor. The present application does not need to send video in a weak network environment; only pictures, audio, annotations and the like are sent each time, so that remote assistance can be performed with very low bandwidth (for example, within 10K), bandwidth/traffic is greatly saved, and good remote assistance can be completed even on a 2G network.
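These figures can be loosely sanity-checked with a back-of-envelope estimate; the bits-per-pixel value and the compressed picture size below are assumed typical values for illustration, not measurements from the present application.

```python
# Rough, assumption-laden comparison of real-time video versus single-picture transfer.
fps = 15                    # frames per second quoted in the text
width, height = 1280, 720   # 720p
bits_per_pixel = 0.1        # assumed for a typical H.264 stream at this quality

video_bitrate_bps = width * height * bits_per_pixel * fps
print(f"video: ~{video_bitrate_bps / 1e6:.1f} Mbit/s sustained")  # ~1.4 Mbit/s, same order as the 2M figure

picture_size_bytes = 100 * 1024   # one JPEG compressed to ~100 KB (assumed)
annotation_bytes = 200            # a few normalized coordinates plus an ID (assumed)
print(f"one picture + annotation: ~{(picture_size_bytes + annotation_bytes) / 1024:.0f} KB, sent once per interaction")
```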
In some cases, the remote assistance between the first user device and the second user device includes, but is not limited to, three remote assistance modes, which may be used alternatively or in any combination; the interaction content obtained thereby is referred to as remote assistance information, and the manner of obtaining the corresponding remote assistance information is referred to as interaction type information. In some embodiments, the interaction type information includes, but is not limited to: graffiti labeling based on the image information; tracking of labeled content in the image information; and 3D labeling based on the image information. The three modes are described below:
1) Graffiti labeling based on the image information. The second user equipment acquires corresponding remote assistance information based on the guiding user's operations of doodling, writing text or numbers, placing pictures and the like on the picture (video frame), and the remote assistance information (the content and position of the graffiti) is then synchronized to the first user equipment; the first user equipment receives the remote assistance information and superimposes the same annotation at the corresponding position of the picture, so that the user to be guided operates under the guidance of the annotation information. The content of the graffiti may be the graffiti itself, such as a picture, or may be the ID of the graffiti. If only the ID of the graffiti is sent, both the first user equipment and the second user equipment store the picture locally; after the second user equipment synchronizes the remote assistance information (the ID and position of the graffiti) to the first user equipment, the first user equipment acquires the corresponding graffiti according to its ID and displays it at the corresponding position of the picture.
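A possible shape for such a synchronization message is sketched below; the field names and the JSON encoding are illustrative assumptions, chosen only to show how either the graffiti ID or the graffiti content itself can travel together with its normalized position.

```python
# Illustrative payload for synchronizing a graffiti annotation (field names are assumptions).
import json
from typing import Optional

def graffiti_message(x_norm: float, y_norm: float,
                     graffiti_id: Optional[str] = None,
                     graffiti_png: Optional[bytes] = None) -> bytes:
    """Exactly one of graffiti_id / graffiti_png is expected: either the ID
    (when both sides cache the asset locally) or the content itself is sent,
    together with its normalized position on the picture."""
    msg = {
        "type": "graffiti",
        "position": {"x": x_norm, "y": y_norm},  # normalized pixel coordinates, 0.0 - 1.0
    }
    if graffiti_id is not None:
        msg["graffiti_id"] = graffiti_id              # receiver looks the content up locally
    else:
        msg["graffiti_png_hex"] = graffiti_png.hex()  # send the content itself
    return json.dumps(msg).encode("utf-8")

# e.g. graffiti_message(0.42, 0.67, graffiti_id="arrow_03")
```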
Based on the guiding user's instruction on the picture, the second user equipment obtains the coordinates on the picture of the annotation points of the operation information (such as the graffiti, written text, numbers, placed markers and the like), converts them into coordinates in the pixel coordinate system, normalizes them and sends them to the first user equipment; after obtaining the normalized coordinates based on the pixel coordinate system, the first user equipment converts them into the screen coordinate system of its own device and draws the annotation at the corresponding position on the screen. Specifically, the corresponding position at the receiver is calculated as follows:
For example, fig. 4 shows an example diagram of interaction based on interaction type information, in which end A establishes a communication connection with ends B and C. After the connection succeeds, end A encodes the picture captured by its camera using a media engine codec protocol (such as H.264 or VP8) and transmits it to a server through a transmission protocol (such as RTP); the server then forwards it to the other participants (ends B and C). End A simultaneously renders the picture onto its interface control. Alternatively, end A may send the encoded picture directly to B and C, which decode it themselves.
Ends B and C receive the data and decode it using the same codec protocol in the media engine to obtain the decoded picture, which is rendered onto the interface control. Because the screen sizes and proportions of the devices differ, the video rendering also differs; at this point all devices render uniformly at the aspect ratio of the sender's picture. For example, end C in the example diagram has a 16:10 screen, but the video is still rendered at the center of the screen at the sender's 9:16 ratio.
Fig. 5 shows an example of interaction based on interaction type information. When labeling starts, end B clicks on the picture frame in the interface control to place an annotation. A unified coordinate point position is computed from the click coordinate reported by the device (a coordinate point on the device screen), the screen DPI (pixels per inch, i.e., pixel density), the picture aspect ratio and the display position (the display position of the video on the screen, generally the top-left vertex and the width and height of the video display); the position information is then encrypted, packaged and sent to ends A and C. The annotation may also be sent to the server first and forwarded to ends A and C by the server.
After receiving the annotation information, ends A and C decrypt it and calculate the display coordinates of the annotation point on their own devices according to the current device's screen DPI, picture aspect ratio and display position (generally the top-left vertex and the width and height of the video display), so as to ensure that the annotation coincides with that on end B.
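The calculation described in the last few paragraphs can be sketched as follows; the helper names are illustrative, and the DPI conversion mentioned above is folded into screen-pixel coordinates for brevity. Each end derives its own display rectangle from the sender's aspect ratio before mapping the normalized point, which is why the same pair of normalize/denormalize helpers works for any receiver screen.

```python
# Sketch of the annotation coordinate mapping (names and structure are illustrative).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DisplayRect:
    left: float    # top-left vertex of the rendered video on the screen, in pixels
    top: float
    width: float
    height: float

def fit_display_rect(screen_w: float, screen_h: float, sender_aspect: float) -> DisplayRect:
    """Center the video on the screen while keeping the sender's aspect ratio (e.g. 9:16)."""
    if screen_w / screen_h > sender_aspect:   # screen is wider: pillar-box
        h = screen_h
        w = h * sender_aspect
    else:                                     # screen is taller: letter-box
        w = screen_w
        h = w / sender_aspect
    return DisplayRect((screen_w - w) / 2, (screen_h - h) / 2, w, h)

def normalize_click(x: float, y: float, rect: DisplayRect) -> Tuple[float, float]:
    """Sender side: convert a screen click into normalized picture coordinates."""
    return (x - rect.left) / rect.width, (y - rect.top) / rect.height

def denormalize(u: float, v: float, rect: DisplayRect) -> Tuple[float, float]:
    """Receiver side: convert normalized picture coordinates back to its own screen."""
    return rect.left + u * rect.width, rect.top + v * rect.height

# Example: B clicks on a 1080x1920 screen, C redraws on a 1600x1000 (16:10) screen.
sender_rect = fit_display_rect(1080, 1920, 9 / 16)
u, v = normalize_click(540, 960, sender_rect)    # center of the picture -> (0.5, 0.5)
receiver_rect = fit_display_rect(1600, 1000, 9 / 16)
print(denormalize(u, v, receiver_rect))          # center of C's rendered video
```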
2) Tracking of labeled content in the image information. Fig. 6 shows an example of interaction based on interaction type information. The second user device marks the content to be annotated based on the guiding user's operations of frame selection, placing an image, text, numbers and the like on the picture information, for example framing an engine; the coordinate points of the frame and/or the framed content are then sent to the first user device. After the first user device receives the data, the frame selected by the remote guiding user is displayed in the current video (the engine is framed), and the first user device then tracks it in the subsequently captured real-time video using a tracking algorithm, so that the frame remains superimposed on the engine as the camera moves. Similarly, the position of the guiding frame is converted into normalized coordinates based on the pixel coordinate system; the guided side obtains these coordinates, converts them into the corresponding position on its screen, and then displays the tracking in the video. In the case of a binocular optical see-through (OST) display device (for example, binocular AR glasses), the tracking process is specifically as follows:
One is to estimate world coordinates corresponding to 2D pixel coordinates of a marker frame by using a SLAM (simultaneous localization and mapping) method, and then obtain real-time marker frames on left and right screens of a display device according to real-time poses, so as to realize corresponding target tracking.
The other is to track the position in the image using a 2D tracking method according to the 2D pixel coordinates of the marker frame, and then obtain the final display position of the marker on the left and right screens using a depth camera and the camera intrinsics, so as to realize the corresponding target tracking.
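As a sketch of the second variant, the marked frame can be followed in subsequent frames with an off-the-shelf 2D tracker; the example below uses OpenCV's CSRT tracker as an assumed stand-in for whatever tracking algorithm the first user equipment actually employs (it requires opencv-contrib-python, and on some versions the factory lives under cv2.legacy instead).

```python
# Sketch of 2D tracking of a guided frame selection using OpenCV.
import cv2

def track_marked_region(capture: cv2.VideoCapture, first_frame, bbox_norm):
    """bbox_norm = (u, v, w, h) normalized against the picture that was annotated remotely."""
    h_img, w_img = first_frame.shape[:2]
    # Denormalize the guided frame selection into pixel coordinates of the local video.
    x, y = int(bbox_norm[0] * w_img), int(bbox_norm[1] * h_img)
    w, h = int(bbox_norm[2] * w_img), int(bbox_norm[3] * h_img)

    tracker = cv2.TrackerCSRT_create()
    tracker.init(first_frame, (x, y, w, h))

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        found, box = tracker.update(frame)   # follow the marked object as the camera moves
        if found:
            x, y, w, h = map(int, box)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("remote assistance", frame)
        if cv2.waitKey(1) == 27:             # Esc to quit
            break
```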
3) 3D labeling based on the image information. Fig. 7 shows an example of interaction based on interaction type information. The second user device sends the 2D pixel coordinates and the 3D annotation content (or the ID of the 3D annotation content) of the 3D annotation placed on the picture to the first user device; the first user device performs SLAM on the scene, calculates the world coordinates corresponding to the 2D pixel coordinates, and then displays the 3D annotation on the display device so that it is superimposed at the correct position of the real scene. SLAM subsequently calculates the camera pose of each frame in real time, and the 3D annotation is displayed on the screen according to the 3D world coordinates of the marked point. 3D annotations include, but are not limited to, pictures, text, numbers, 3D models and the like. Specifically, when the first user equipment sends a picture of the site (a video frame) to the second user equipment, the device position, attitude and 3D point cloud of the currently captured frame are obtained using a SLAM algorithm. When the second user equipment annotates the picture based on the guiding user's operation, the two-dimensional pixel coordinates of the marked point are obtained and transmitted to the first user equipment; the first user equipment obtains the position of the point in the real world coordinate system (its 3D coordinates in the world coordinate system) according to the pose and the 3D point cloud, and the 3D annotation can then be displayed on a display device (including but not limited to a mobile device or an AR/VR device). SLAM is computed in real time to obtain the camera pose of each frame, and the 3D annotation is displayed on the screen according to the 3D world coordinates of the marked point, so that the 3D annotation is superimposed at the correct position of the real scene.
For example, taking a binocular OST device as an example: when the second user equipment marks a point based on the guiding user's click in the 2D image, the 3D world coordinate of the marked point is obtained through a series of transformations and corresponds to a 3D coordinate point in the camera coordinate system of the AR glasses worn by the user to be guided at the first user equipment. Since the annotation is finally presented on two OST lenses, the coordinates need to be converted into the left-eye OST lens coordinate system and the right-eye OST lens coordinate system respectively, and the 3D annotation is then rendered on the two OST lenses in the corresponding coordinate systems. Thus, owing to binocular disparity, after the user to be guided puts on the AR glasses, he observes the real world and sees a virtual 3D annotation (e.g., an arrow annotation) indicated on the object marked by the remote instructor. If a monocular or binocular video see-through (VST) presentation method is used, after the annotation position in the camera coordinate system is obtained, it only needs to be projected down to a 2D picture, and the resulting 2D picture is displayed in full on the lenses of the AR glasses. Estimating the 3D spatial coordinates of a specific point in the given 2D video can further be implemented with the following existing schemes:
1) A 3D map of the whole space is reconstructed using a monocular SLAM algorithm (with only one RGB camera), and the pose of the camera of the first user device in the 3D space is determined within the reconstructed map. For example, the camera of the first user device continuously collects RGB images, and the collected images serve two purposes: the first is to feed the SLAM algorithm for 3D point cloud modeling and to determine the camera pose consistent with the world coordinate system; the second is to transmit the images over the network to the remote second user device, so that the guiding user can check the on-site situation in real time through a carrier such as a PC, a tablet computer or a mobile phone. The specific calculation is as follows: when the coordinates P2d of a certain pixel in the 2D picture are input and the 2D point P2d is mapped to the camera coordinate system, then, according to the dimension mapping principle (mapping from a lower dimension to a higher dimension), the 2D coordinate (a single point) corresponds to a straight line L3d in 3D space. The 3D point cloud in the world coordinate system is likewise mapped into the camera coordinate system, where it remains a set of 3D points. The system then selects a point P3dC' from the 3D point cloud established by the SLAM algorithm (for example, the point with the smallest perpendicular distance to the mapping ray), uses its depth value (in the camera coordinate system) to obtain a point P3dC on the mapping ray, and converts P3dC into a point P3d in the 3D point cloud coordinate system (the world coordinate system). P3d is taken as the estimate of the mapping of the point P2d in the coordinate system of the 3D point cloud, which yields the 3D world coordinates corresponding to the 2D point in the world coordinate system.
2) After the 3D point cloud is obtained through SLAM algorithm, a plane under a world coordinate system is fitted by the reconstructed 3D point cloud. When the coordinates of a specific 2D point in the video are input, the specific 2D point is mapped to a specific point on a plane under the fitted world coordinate system through the mapping relation of the coordinate system. And then the expression of the plane under the world coordinate system is used for reversely deducing the space 3D coordinate of the specific point under the world coordinate system.
3) Two different sensors, an RGB camera and a depth camera, are installed on the device, and a 2D image and a depth image are acquired simultaneously. When the coordinates of a specific 2D point in the video are input, the algorithm uses the depth image recorded at the same time as the 2D image, maps the image coordinates of the marked 2D point to the corresponding pixel coordinates in the depth image, and reads the depth information at those pixel coordinates. With the depth of the mark obtained in this way, the 3D spatial position coordinates in the world coordinate system can then be derived.
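Scheme 3) is the most direct to illustrate: the annotated pixel is back-projected with the camera intrinsics and the aligned depth value, and the SLAM pose lifts the result into world coordinates. The sketch below makes these steps explicit; all matrices and names are assumptions for illustration. Schemes 1) and 2) differ mainly in how the depth along the back-projection ray is obtained (nearest point-cloud point or fitted plane, respectively).

```python
# Sketch of scheme 3): back-projecting an annotated 2D pixel to world coordinates
# using an aligned depth image and the camera pose from SLAM (all names illustrative).
import numpy as np

def pixel_to_world(u: int, v: int,
                   depth_image: np.ndarray,   # depth in meters, aligned with the RGB image
                   K: np.ndarray,             # 3x3 camera intrinsic matrix
                   T_world_cam: np.ndarray    # 4x4 camera-to-world pose from SLAM
                   ) -> np.ndarray:
    z = float(depth_image[v, u])                            # depth at the annotated pixel
    # Back-project into the camera coordinate system: p_cam = z * K^-1 * [u, v, 1]
    p_cam = z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Transform into the world coordinate system using the current pose.
    p_world = T_world_cam @ np.append(p_cam, 1.0)
    return p_world[:3]

# Example with assumed intrinsics for a 1280x720 camera.
K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)                                  # identity pose for illustration
depth = np.full((720, 1280), 2.0)              # flat scene 2 m away
print(pixel_to_world(700, 400, depth, K, T))   # 3D world point of the marked pixel
```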
In some cases, the interaction type information may be selected by default, may be selected by the user to be guided corresponding to the first user equipment, or may be selected by the guiding user corresponding to the second user equipment. For example, before the first user equipment presents the corresponding remote assistance information, the first user equipment or the second user equipment selects the corresponding interaction type information based on an operation instruction of the user to be guided or of the guiding user, and the first user equipment subsequently presents the remote assistance information based on the acquired remote assistance information and the interaction type information. In some embodiments, the fourth module 104 is configured to present the remote assistance information through the first user equipment in combination with preset interaction type information.
In some cases, the interaction type may be set by the first user equipment based on the operation of the user to be guided, or may be interaction type information that the first user equipment determines, based on the current image information or the network bandwidth, to be suitable for the current interaction state; the first user equipment then sends the interaction type information to the second user equipment, and the second user equipment receives the interaction type information and acquires the remote assistance information of the guiding user based on the interaction type information and the image information. Accordingly, in some implementations, the second module 102 is configured to send the image information and the interaction type information about the image information to the second user device. In other cases, the interaction type may be set by the second user device based on the operation of the guiding user, or may be interaction type information that the second user device determines, based on the current image information or the network bandwidth, to be suitable for the current interaction state; the second user device then acquires the remote assistance information of the guiding user based on the interaction type information and the image information, and returns the interaction type information and the remote assistance information to the corresponding first user device. In some embodiments, the third module 103 is configured to receive the remote assistance information and the interaction type information about the image information returned by the second user equipment. After the interaction type information is determined by the first user equipment or the second user equipment, the first user equipment superimposes and presents the corresponding remote assistance information on the current display device based on the interaction type information. In some embodiments, the fourth module 104 is configured to present the remote assistance information through the first user equipment in combination with the interaction type information.
In some cases, the first user device receives the corresponding remote assistance information and determines the corresponding interaction type information based on the content of the remote assistance information; for example, the remote assistance information contains interaction type information, from which the corresponding interaction type can be obtained after the first user equipment receives the remote assistance information. For instance, the corresponding interaction type information is determined to be the graffiti labeling mode from graffiti annotation information, to be label tracking from a target tracking frame, or to be 3D labeling from 3D annotation information, and so on; the first user equipment then superimposes and presents the corresponding remote assistance information based on the interaction type information. In some embodiments, the apparatus 100 further includes a fifth module 105 (not shown) configured to determine the interaction type information corresponding to the image information according to the remote assistance information; the fourth module 104 is configured to present the remote assistance information through the first user equipment in combination with the interaction type information.
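A minimal sketch of how the fifth module 105 might infer the interaction type from the payload is shown below; the dictionary keys are hypothetical and only illustrate the dispatch described above.

```python
# Sketch of inferring the interaction type from the received remote assistance
# information (dictionary keys are assumptions for illustration).
def infer_interaction_type(assist_info: dict) -> str:
    if "graffiti_id" in assist_info or "graffiti_png_hex" in assist_info:
        return "graffiti_labeling"
    if "tracking_bbox" in assist_info:
        return "label_tracking"
    if "annotation_3d" in assist_info or "annotation_3d_id" in assist_info:
        return "3d_labeling"
    # Fall back to an explicit field if the sender included one.
    return assist_info.get("interaction_type", "graffiti_labeling")
```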
In some embodiments, the apparatus 100 further includes a sixth module 106 (not shown) configured to determine corresponding interaction type information based on the weak network state information corresponding to the communication connection; the fourth module 104 is then configured to present the remote assistance information through the first user equipment in combination with the interaction type information. For example, the weak network state information includes the available bandwidth information, current packet loss rate information, current video frame rate information and the like of the network connection between the first user equipment and the second user equipment; suitable interaction type information is selected for the current image information according to, for example, the traffic consumed when transmitting data of each interaction type. When the current available bandwidth information is smaller than a first bandwidth threshold (e.g., 20 kb/s), the first user equipment selects the interaction type information for tracking labeled content; if the current available bandwidth information is greater than the first bandwidth threshold and smaller than a second bandwidth threshold (e.g., 50 kb/s), the first user equipment selects the interaction type information for graffiti labeling; and if the current available bandwidth information is greater than the second bandwidth threshold while the weak network triggering condition is still satisfied, the first user equipment selects the interaction type information for 3D labeling. Similarly, when the current packet loss rate information is greater than a first packet loss rate threshold (e.g., 10%), the first user equipment selects the interaction type information for tracking labeled content; if the current packet loss rate information is between the first packet loss rate threshold and a second packet loss rate threshold (e.g., 5%), the first user equipment selects the interaction type information for graffiti labeling; and if the current packet loss rate information is smaller than the second packet loss rate threshold while the weak network condition is still satisfied, the first user equipment selects the interaction type information for 3D labeling. Likewise, when the current video frame rate is smaller than a first video frame rate threshold (e.g., 5 frames/s), the first user equipment selects the interaction type information for tracking labeled content; if the current video frame rate is greater than the first video frame rate threshold and smaller than a second video frame rate threshold (e.g., 10 frames/s), the first user equipment selects the interaction type information for graffiti labeling; and if the current video frame rate is greater than the second video frame rate threshold while the weak network triggering condition is still satisfied, the first user equipment selects the interaction type information for 3D labeling.
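The bandwidth branch of this selection logic can be condensed into a small helper; the sketch below uses the example thresholds quoted above and assumed mode names, and the packet loss rate and video frame rate branches would follow the same pattern.

```python
# Sketch of the sixth module's selection logic for the bandwidth branch
# (thresholds are the example figures from the text; mode names are assumptions).
def select_interaction_type(bandwidth_kbps: float,
                            first_threshold: float = 20.0,
                            second_threshold: float = 50.0) -> str:
    if bandwidth_kbps < first_threshold:
        return "label_tracking"      # chosen at the lowest bandwidths in the example above
    if bandwidth_kbps < second_threshold:
        return "graffiti_labeling"
    return "3d_labeling"             # richest mode while still under weak-network conditions
```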
In some embodiments, the apparatus 100 further comprises a seventh module 107 (not shown) configured to compress the image information; the second module 102 is then configured to send the compressed image information to the second user equipment. For example, when the current network condition is poor, the first user equipment compresses the image information of the site to be assisted (reducing the image quality) and then sends the compressed image information to the second user equipment; for instance, where details are not required, the picture is compressed to within 100k, and the second user equipment receives the compressed image information and presents it.
In some embodiments, the seventh module 107 is configured to determine a compression rate of the image information based on the weak network state information corresponding to the communication connection, and to compress the image information according to the compression rate. For example, the weak network state information includes the available bandwidth information, current packet loss rate information, current video frame rate information and the like of the network connection between the first user equipment and the second user equipment. The first user equipment adjusts the size of the compressed picture in real time according to the network state information of the communication connection, so that the picture information can be transmitted to the guiding party in time and the communication is more timely and effective. For example, the first user equipment detects the current available bandwidth and adapts the picture compression rate accordingly: when the current available bandwidth is 100k/s, the picture is compressed to 100k; when the network speed drops to 50k/s, the picture is compressed to 50k, and so on. The first user equipment then transmits the compressed picture to the corresponding second user equipment. Similarly, the current packet loss rate information and the current video frame rate information are handled in a manner similar to the available bandwidth information described above and are not repeated here.
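One possible way to realize this bandwidth-adaptive compression is to lower the JPEG quality until the picture fits the current bandwidth figure; the sketch below uses Pillow and follows the "100k/s → 100k" example above, with the quality search step being an assumption rather than part of the described implementation.

```python
# Sketch of bandwidth-adaptive picture compression with Pillow (assumed approach).
import io
from PIL import Image

def compress_for_bandwidth(img: Image.Image, bandwidth_k: float) -> bytes:
    """Compress the picture so that its size roughly matches the current bandwidth
    figure (e.g. ~100k at 100k/s, ~50k at 50k/s, as in the example above)."""
    target_bytes = int(bandwidth_k * 1024)
    img = img.convert("RGB")                   # JPEG has no alpha channel
    buf = io.BytesIO()
    for quality in range(85, 10, -10):         # lower JPEG quality until it fits
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= target_bytes:
            break
    return buf.getvalue()                      # smallest attempt if the target was not reached
```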
In some embodiments, the apparatus 100 further includes an eighth module 108 (not shown) configured to receive information, sent by the second user equipment, that the communication connection satisfies the weak network triggering condition. For example, the second user equipment may directly trigger the weak network condition based on an operation instruction of the guiding user and send the corresponding information to the first user equipment, so that the first user equipment starts the image information transmission mode corresponding to the weak network condition, and so on.
In some embodiments, the apparatus 100 further includes a ninth module 109 (not shown) configured to detect current communication status information of the communication connection, or to receive current communication status information of the communication connection sent by the second user equipment; the current communication status information of the communication connection includes, but is not limited to: the current available bandwidth information of the communication connection; the current packet loss rate information of the communication connection; and the current video frame rate information of the communication connection. For example, the first user equipment detects the current communication status information of the communication connection in real time, or the second user equipment detects it in real time and sends it to the first user equipment; the first user equipment then judges whether it is under a weak network condition according to the communication status information. For another example, the first user equipment detects the current communication status information of the communication connection in real time, and if the communication status is poor, the corresponding weak network triggering condition is satisfied; or the second user equipment detects the current communication status information of the communication connection in real time, determines that the weak network triggering condition is satisfied if the communication status is poor, and sends this information to the first user equipment. The current communication status information includes, but is not limited to: available bandwidth information, packet loss rate information, video frame rate information, etc. of the current communication connection. The determination of the network status based on the available bandwidth information, the packet loss rate information and the video frame rate information is similar to that described above and is not repeated here.
Fig. 9 also shows a second user equipment 200 for remote assistance according to another aspect of the present application, wherein the device 200 comprises a first module 201, a second module 202 and a third module 203. The first module 201 is configured to receive and present image information about a site to be assisted sent by the first user equipment; the second module 202 is configured to acquire remote assistance information of the guiding user about the image information; and the third module 203 is configured to send the remote assistance information to the first user equipment.
Specifically, the first module 201 is configured to receive and present the image information about the site to be assisted sent by the first user equipment. For example, the user to be guided holds a first user device and the guiding user holds a corresponding second user device; the first user device and the second user device establish a communication connection in a wired or wireless manner. If the communication connection satisfies a predetermined weak network triggering condition, the first user device captures image information of the current site to be assisted through a corresponding camera device (such as a camera), where the image information comprises static image frames (such as pictures) of the current site to be assisted. After acquiring the image information of the site to be assisted, the first user device sends it to the second user device of the guiding personnel over the weak-network communication connection, for example, sends a picture of the site to be assisted to the second user device.
The second module 202 is configured to acquire remote assistance information of the guiding user about the image information. For example, the second user device receives the picture sent by the first user device, acquires corresponding remote assistance information (such as annotation, graffiti, tracking target selection, etc.) based on the picture information, and returns the remote assistance information to the first user device, wherein the remote assistance information includes operation information about the image information (such as annotation, graffiti or tracking target selection) acquired by the second user device based on the guiding user's operation, and/or position information corresponding to the operation information (such as the image coordinates of the annotation in the image information).
The third module 203 is configured to send the remote assistance information to the first user equipment. For example, the second user device returns the corresponding remote assistance information to the first user device over the communication connection with the first user equipment.
In some embodiments, the first module 201 is configured to receive and present the image information about the site to be assisted and the interaction type information about the image information sent by the first user equipment; the second module 202 is configured to acquire the remote assistance information of the guiding user about the image information based on the interaction type information. For example, the remote assistance between the first user device and the second user device includes, but is not limited to, graffiti labeling based on the image information, tracking of labeled content in the image information, and 3D labeling based on the image information; the three remote assistance modes may be used alternatively or in any combination, the interaction content obtained thereby is referred to as remote assistance information, and the manner of obtaining the corresponding remote assistance information is referred to as interaction type information. In some cases, the interaction type may be set by the first user equipment based on the operation of the user to be guided, or may be interaction type information that the first user equipment determines, based on the current image information or the network bandwidth, to be suitable for the current interaction state; the first user equipment then sends the interaction type information to the second user equipment, and the second user equipment receives the interaction type information and acquires the remote assistance information of the guiding user based on the interaction type information and the image information.
In some embodiments, the second user device 200 further includes a fourth module 204 (not shown) configured to acquire interaction type information about the image information set by the guiding user; the third module 203 is configured to send the remote assistance information and the interaction type information about the image information to the first user equipment. For example, the interaction type may be set by the second user equipment based on the operation of the guiding user, or may be interaction type information that the second user equipment determines, based on the current image information or the network bandwidth, to be suitable for the current interaction state; the second user equipment then acquires the remote assistance information of the guiding user based on the interaction type information and the image information, and returns the interaction type information and the remote assistance information to the corresponding first user equipment.
In some embodiments, the second user equipment 200 further includes a fifth module 205 (not shown) configured to detect whether the communication connection between the first user equipment and the second user equipment satisfies a predetermined weak network triggering condition, and if so, to send information that the communication connection satisfies the weak network triggering condition to the first user equipment. For example, the second user equipment may directly trigger the weak network condition based on an operation instruction of the guiding user and send the corresponding information to the first user equipment, so that the first user equipment starts the image information transmission mode corresponding to the weak network condition, and so on.
In some embodiments, the second user equipment 200 further includes a sixth module 206 (not shown) configured to detect current communication status information of the communication connection between the first user equipment and the second user equipment, and to send the current communication status information of the communication connection to the first user equipment. For example, the second user equipment detects the current communication status information of the communication connection in real time and sends it to the first user equipment, and the first user equipment judges whether it is under a weak network condition according to the communication status information. For another example, the second user equipment detects the current communication status information of the communication connection in real time, determines that the weak network triggering condition is satisfied if the communication status is poor, and sends this information to the first user equipment. The current communication status information includes, but is not limited to: available bandwidth information, packet loss rate information, video frame rate information, etc. of the current communication connection. The determination of the network status based on the available bandwidth information, the packet loss rate information and the video frame rate information is similar to that described above and is not repeated here.
The application also provides a computer readable storage medium storing computer code which, when executed, performs a method as claimed in any preceding claim.
The application also provides a computer program product which, when executed by a computer device, performs a method as claimed in any preceding claim.
The present application also provides a computer device comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
FIG. 10 illustrates an exemplary system that may be used to implement various embodiments described in the present disclosure.
In some embodiments, as shown in fig. 10, the system 300 can function as any of the above-described devices of the various described embodiments. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement the modules to perform the actions described in the present application.
For one embodiment, the system control module 310 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or any suitable device or component in communication with the system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
The system memory 315 may be used, for example, to load and store data and/or instructions for the system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as, for example, a suitable DRAM. In some embodiments, the system memory 315 may comprise a double data rate type four synchronous dynamic random access memory (DDR 4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable nonvolatile memory (e.g., flash memory) and/or may include any suitable nonvolatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or which may be accessed by the device without being part of the device. For example, NVM/storage 320 may be accessed over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. The system 300 may wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic of one or more controllers of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die as logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic of one or more controllers of the system control module 310 to form a system on chip (SoC).
In various embodiments, the system 300 may be, but is not limited to being: a server, workstation, desktop computing device, or mobile computing device (e.g., laptop computing device, handheld computing device, tablet, netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, keyboards, liquid crystal display (LCD) screens (including touch screen displays), non-volatile memory ports, multiple antennas, graphics chips, application-specific integrated circuits (ASICs), and speakers.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to perform the steps or functions described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Furthermore, portions of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Communication media includes media whereby a communication signal containing, for example, computer readable instructions, data structures, program modules, or other data, is transferred from one system to another. Communication media may include conductive transmission media such as electrical cables and wires (e.g., optical fibers, coaxial, etc.) and wireless (non-conductive transmission) media capable of transmitting energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium, such as a carrier wave or similar mechanism, such as that embodied as part of spread spectrum technology. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory, such as random access memory (RAM, DRAM, SRAM); and nonvolatile memory such as flash memory, various read only memory (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memory (MRAM, feRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed computer-readable information/data that can be stored for use by a computer system.
An embodiment according to the application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to operate a method and/or a solution according to the embodiments of the application as described above.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Claims (21)
1. A method for performing remote assistance at a first user equipment, wherein the method comprises:
if the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition, acquiring image information about a scene to be assisted;
transmitting the image information to the second user equipment;
receiving remote assistance information about the image information returned by the second user equipment, wherein the remote assistance information is acquired based on corresponding interaction type information and the image information, and the interaction type information is selected by default by corresponding applications, or is selected based on a user to be guided corresponding to the first user equipment, or is selected based on a guiding user corresponding to the second user equipment, or is determined by the first user equipment based on weak network state information corresponding to the communication connection, or is determined by the second user equipment based on weak network state information corresponding to the communication connection;
and presenting the remote assistance information through the first user equipment.
2. The method of claim 1, wherein the sending the image information to the second user equipment comprises:
sending the image information and interaction type information about the image information to the second user equipment.
3. The method of claim 1, wherein the receiving the remote assistance information about the image information returned by the second user equipment comprises:
receiving remote assistance information and interaction type information about the image information returned by the second user equipment.
4. The method of claim 2 or 3, wherein the presenting, by the first user equipment, the remote assistance information comprises:
presenting the remote assistance information through the first user equipment in combination with the interaction type information.
5. The method of claim 1, wherein the method further comprises:
determining interaction type information corresponding to the image information according to the remote assistance information;
wherein the presenting, by the first user equipment, the remote assistance information comprises:
presenting the remote assistance information through the first user equipment in combination with the interaction type information.
6. The method of claim 1, wherein the presenting, by the first user equipment, the remote assistance information comprises:
presenting the remote assistance information through the first user equipment in combination with preset interaction type information.
7. The method of claim 1, wherein the method further comprises:
determining corresponding interaction type information based on the weak network state information corresponding to the communication connection;
wherein the presenting, by the first user equipment, the remote assistance information comprises:
presenting the remote assistance information through the first user equipment in combination with the interaction type information.
8. The method of any one of claims 2 to 7, wherein the interaction type information comprises at least any one of:
performing graffiti labeling based on the image information;
tracking the marked content in the image information;
and performing 3D labeling based on the image information.
9. The method of any one of claims 1 to 8, wherein the method further comprises:
compressing the image information;
wherein the sending the image information to the second user equipment comprises:
sending the compressed image information to the second user equipment.
10. The method of claim 9, wherein the compressing the image information comprises:
determining a compression rate for the image information based on the weak network state information corresponding to the communication connection;
and compressing the image information according to the compression rate.
11. The method of any one of claims 1 to 10, wherein the weak network triggering condition comprises at least any one of:
the current available bandwidth information of the communication connection is lower than or equal to preset bandwidth threshold information;
the current packet loss rate information of the communication connection is greater than or equal to preset packet loss rate threshold information;
the current video frame rate information of the communication connection is lower than or equal to the preset video frame rate threshold information;
receiving information, sent by the second user equipment, indicating that the communication connection meets the weak network triggering condition;
and receiving a weak network triggering operation submitted by a user at the first user equipment.
12. The method of claim 11, wherein the method further comprises:
receiving information, sent by the second user equipment, indicating that the communication connection meets the weak network triggering condition.
13. The method of claim 11, wherein the method further comprises:
detecting current communication status information of the communication connection, or receiving the current communication status information of the communication connection sent by the second user equipment;
wherein the current communication status information of the communication connection comprises at least any one of:
current available bandwidth information of the communication connection;
the current packet loss rate information of the communication connection;
current video frame rate information of the communication connection.
14. A method for performing remote assistance at a second user equipment, wherein the method comprises:
receiving and presenting image information about a scene to be assisted, which is sent by first user equipment;
obtaining remote assistance information of a guiding user about the image information, wherein the remote assistance information is obtained based on corresponding interaction type information and the image information, and the interaction type information is selected by default by a corresponding application, is selected by a user to be guided corresponding to the first user equipment, is selected by the guiding user corresponding to the second user equipment, is determined by the first user equipment based on weak network state information corresponding to a communication connection between the first user equipment and the second user equipment, or is determined by the second user equipment based on the weak network state information corresponding to the communication connection;
and sending the remote assistance information to the first user equipment.
15. The method of claim 14, wherein the receiving and presenting the image information about the scene to be assisted sent by the first user equipment comprises:
receiving and presenting image information about a scene to be assisted and interaction type information about the image information, which are sent by the first user equipment;
wherein the obtaining the remote assistance information of the guiding user about the image information comprises:
obtaining remote assistance information of the guiding user about the image information based on the interaction type information.
16. The method of claim 14, wherein the method further comprises:
acquiring interaction type information about the image information set by the guiding user;
wherein the sending the remote assistance information to the first user equipment comprises:
sending the remote assistance information and the interaction type information about the image information to the first user equipment.
17. The method of any of claims 14 to 16, wherein the method further comprises:
detecting whether the communication connection between the first user equipment and the second user equipment meets a preset weak network triggering condition;
and if so, sending, to the first user equipment, information that the communication connection meets the weak network triggering condition.
18. The method of any of claims 14 to 16, wherein the method further comprises:
detecting current communication condition information of communication connection between the first user equipment and the second user equipment;
and sending the current communication status information of the communication connection to the first user equipment.
19. A method of performing remote assistance, wherein the method comprises:
if a communication connection between a first user equipment and a second user equipment meets a preset weak network triggering condition, the first user equipment acquires image information about a scene to be assisted and sends the image information to the second user equipment;
the second user equipment receives and presents the image information, acquires remote assistance information of a guiding user about the image information, and sends the remote assistance information to the first user equipment, wherein the remote assistance information is acquired based on corresponding interaction type information and the image information, and the interaction type information is selected by default by a corresponding application, is selected by a user to be guided corresponding to the first user equipment, is selected by the guiding user corresponding to the second user equipment, is determined by the first user equipment based on weak network state information corresponding to the communication connection, or is determined by the second user equipment based on the weak network state information corresponding to the communication connection;
and the first user equipment receives and presents the remote assistance information.
20. An apparatus for remote assistance, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the operations of the method of any one of claims 1 to 18.
21. A computer readable medium storing instructions that, when executed, cause a system to perform the operations of the method of any one of claims 1 to 18.
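Claims 8, 10 and 11 tie the choice of compression rate and interaction type to the weak-network state of the communication connection. The sketch below, which reuses the same hypothetical WeakNetworkState structure as the earlier example, shows one possible mapping; the thresholds, the quality values and the interaction-type ordering are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class WeakNetworkState:
    bandwidth_kbps: float
    packet_loss_rate: float
    video_frame_rate: float


def choose_compression_quality(state: WeakNetworkState) -> int:
    """Stronger compression (lower quality) as the connection degrades."""
    if state.bandwidth_kbps < 100 or state.packet_loss_rate > 0.30:
        return 30          # very constrained link: compress aggressively
    if state.bandwidth_kbps < 300 or state.packet_loss_rate > 0.15:
        return 55
    return 80              # mildly degraded link: keep more image detail


def choose_interaction_type(state: WeakNetworkState) -> str:
    """Pick a lighter interaction type from claim 8 as the link gets worse:
    3D labeling needs the most data to synchronize, tracking needs periodic
    updates, and graffiti labeling is a one-shot 2D annotation on the image."""
    if state.bandwidth_kbps < 100:
        return "graffiti_labeling"
    if state.bandwidth_kbps < 300:
        return "marked_content_tracking"
    return "3d_labeling"


if __name__ == "__main__":
    degraded = WeakNetworkState(bandwidth_kbps=120.0,
                                packet_loss_rate=0.20,
                                video_frame_rate=8.0)
    print(choose_compression_quality(degraded), choose_interaction_type(degraded))
```

Grading the interaction type this way keeps the assistance usable even when only a single heavily compressed image can be exchanged, which is the situation the weak-network fallback is designed for.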
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019102505944 | 2019-03-29 | ||
CN201910250594 | 2019-03-29 | ||
CN201910284878.5A CN110138831A (en) | 2019-03-29 | 2019-04-10 | A kind of method and apparatus carrying out remote assistance |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910284878.5A Division CN110138831A (en) | 2019-03-29 | 2019-04-10 | A kind of method and apparatus carrying out remote assistance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116866336A true CN116866336A (en) | 2023-10-10 |
Family
ID=67569775
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310863598.6A Pending CN116866336A (en) | 2019-03-29 | 2019-04-10 | Method and equipment for performing remote assistance |
CN201910284878.5A Pending CN110138831A (en) | 2019-03-29 | 2019-04-10 | A kind of method and apparatus carrying out remote assistance |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910284878.5A Pending CN110138831A (en) | 2019-03-29 | 2019-04-10 | A kind of method and apparatus carrying out remote assistance |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN116866336A (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728756B (en) * | 2019-09-30 | 2024-02-09 | 亮风台(上海)信息科技有限公司 | Remote guidance method and device based on augmented reality |
CN110708384B (en) * | 2019-10-12 | 2020-12-15 | 西安维度视界科技有限公司 | Interaction method, system and storage medium of AR-based remote assistance system |
CN110661880A (en) * | 2019-10-12 | 2020-01-07 | 西安维度视界科技有限公司 | Remote assistance method, system and storage medium |
CN111050112A (en) * | 2020-01-10 | 2020-04-21 | 北京首翼弘泰科技有限公司 | Method for remote operation command or guidance by displaying mark on screen |
CN112187959B (en) * | 2020-11-27 | 2021-06-22 | 蘑菇车联信息科技有限公司 | Remote control method and system for vehicle-mounted computer, electronic equipment and storage medium |
CN112862973A (en) * | 2021-03-10 | 2021-05-28 | 广东电网有限责任公司 | Real-time remote training method and system based on fault site |
CN114070834A (en) * | 2021-10-26 | 2022-02-18 | 深圳市商汤科技有限公司 | Remote assistance method and device and related equipment and storage medium thereof |
CN114201645A (en) * | 2021-12-01 | 2022-03-18 | 北京百度网讯科技有限公司 | Object labeling method and device, electronic equipment and storage medium |
WO2023168836A1 (en) * | 2022-03-11 | 2023-09-14 | 亮风台(上海)信息科技有限公司 | Projection interaction method, and device, medium and program product |
CN115190361A (en) * | 2022-06-16 | 2022-10-14 | 深圳市定军山科技有限公司 | A kind of video data transmission method and device |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010050301A1 (en) * | 2008-10-27 | 2010-05-06 | コニカミノルタオプト株式会社 | Image display system |
CN103676922B (en) * | 2012-09-07 | 2016-12-21 | 博世汽车服务技术(苏州)有限公司 | A kind of method of long-range diagnosis |
CN105357240A (en) * | 2014-08-21 | 2016-02-24 | 中兴通讯股份有限公司 | Remote assistance control method and device |
CN107491174B (en) * | 2016-08-31 | 2021-12-24 | 中科云创(北京)科技有限公司 | Method, device and system for remote assistance and electronic equipment |
CN106339094B (en) * | 2016-09-05 | 2019-02-26 | 山东万腾电子科技有限公司 | Interactive remote expert cooperation examination and repair system and method based on augmented reality |
CN107566793A (en) * | 2017-08-31 | 2018-01-09 | 中科云创(北京)科技有限公司 | Method, apparatus, system and electronic equipment for remote assistance |
CN108769517B (en) * | 2018-05-29 | 2021-04-16 | 亮风台(上海)信息科技有限公司 | Method and equipment for remote assistance based on augmented reality |
- 2019-04-10: CN application 202310863598.6A filed (published as CN116866336A), status Pending
- 2019-04-10: CN application 201910284878.5A filed (published as CN110138831A), status Pending
Also Published As
Publication number | Publication date |
---|---|
CN110138831A (en) | 2019-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116866336A (en) | Method and equipment for performing remote assistance | |
US11024092B2 (en) | System and method for augmented reality content delivery in pre-captured environments | |
KR102243664B1 (en) | Method and apparatus for transceiving metadata for coordinate system of dynamic viewpoint | |
CN111837383B (en) | Method and apparatus for sending and receiving metadata about a coordinate system of a dynamic viewpoint | |
KR102154530B1 (en) | Method and apparatus for processing overaly meaid in 360 degree video system | |
CN108769517B (en) | Method and equipment for remote assistance based on augmented reality | |
US20190379856A1 (en) | Method for processing overlay in 360-degree video system and apparatus for the same | |
CN110751735B (en) | Remote guidance method and device based on augmented reality | |
US9773333B2 (en) | Information processing device, information processing method, and program | |
US9648346B2 (en) | Multi-view video compression and streaming based on viewpoints of remote viewer | |
EP3586315B1 (en) | Method and apparatus for displaying image based on user motion information | |
CN109478344B (en) | Method and apparatus for synthesizing image | |
CN110728756B (en) | Remote guidance method and device based on augmented reality | |
CN109997364A (en) | Method, apparatus and stream providing indication of mapping of omnidirectional images | |
CN113852829A (en) | Method and device for encapsulating and decapsulating point cloud media file and storage medium | |
WO2023051138A1 (en) | Immersive-media data processing method, apparatus, device, storage medium and program product | |
CN114143568A (en) | Method and equipment for determining augmented reality live image | |
WO2023098279A1 (en) | Video data processing method and apparatus, computer device, computer-readable storage medium and computer program product | |
WO2023024839A1 (en) | Media file encapsulation method and apparatus, media file decapsulation method and apparatus, device and storage medium | |
CN113965665B (en) | Method and equipment for determining virtual live image | |
US20210074025A1 (en) | A method and apparatus for encoding and decoding three-dimensional scenes in and from a data stream | |
CN112771878B (en) | Method, client and server for processing media data | |
CN111726598B (en) | Image processing method and device | |
CN116664806A (en) | Method, device and medium for presenting augmented reality data | |
CN115733576B (en) | Packaging and unpacking method and device for point cloud media file and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||