CN112509148B

CN112509148B - An interactive method, device and computer equipment based on multi-feature recognition

Info

Publication number: CN112509148B
Application number: CN202011416912.9A
Authority: CN
Inventors: 侯战胜; 宋金根; 孙世军
Original assignee: State Grid Corp of China SGCC; State Grid Zhejiang Electric Power Co Ltd; State Grid Shandong Electric Power Co Ltd; Global Energy Interconnection Research Institute
Current assignee: State Grid Corp of China SGCC; State Grid Zhejiang Electric Power Co Ltd; State Grid Shandong Electric Power Co Ltd; Global Energy Interconnection Research Institute
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2024-12-24
Anticipated expiration: 2040-12-04
Also published as: WO2022116545A1; CN112509148A

Abstract

The present invention discloses an interactive method, device and computer equipment based on multi-feature recognition, wherein the method comprises: obtaining target video stream data of a target device; calling a three-dimensional model of the target device according to the target video stream data; sending the data of the three-dimensional model to a remote device; receiving the change increment information of the three-dimensional model fed back by the remote device; and displaying the change of the three-dimensional model according to the change increment information. By implementing the present invention, the problem that the remote repair and maintenance operations existing in the related technology cannot be accurate and effective is solved. By combining the three-dimensional model generated according to the target video stream data and the change increment information fed back by the received remote device, the on-site equipment can obtain accurate guidance information, and realize the remote virtual-real fusion interaction of the augmented reality mode and the three-dimensional model of the power equipment.

Description

Interaction method and device based on multi-feature recognition and computer equipment

Technical Field

The present invention relates to the field of streaming media and information communication, and in particular, to an interaction method, apparatus and computer device based on multi-feature recognition.

Background

With the development of the internet, information communication and 5G communication technologies, the audio-video communication technology mainly goes through four development stages of a local analog signal audio-video system, a PC-based multimedia remote assistance system, a Web server-based remote audio-video collaboration system and a mobile terminal-based audio-video collaboration system. At present, the development of audio and video stream communication technology is in the stage of an audio and video collaboration system (audio and video communication) based on a mobile terminal.

Along with the development of technology, higher requirements are also put forward on the cooperation mode. In the related art, the collaboration mode of remote video call still has great limitation, specifically, the electric power enterprise is used as a public service enterprise of relation energy safety and national folk life, the electric power enterprise is provided with operation sites of power grid transmission and transformation inspection, overhaul, emergency repair and the like, the electric power equipment is multiple in types and complex in operation, new problems are continuously emerging and high in recognition processing difficulty, when the electric power equipment is processed, the electric power equipment needs to be subjected to collaborative operation across teams and across work areas and needs technical experts or equipment manufacturers to remotely support, in the related art, the remote support is provided by the common video call, and because the visual angle of video acquired by using video call equipment at the far end is relatively limited, the real-time guiding operation on the site with the same visual angle is likely not realized, so that the operation of remote maintenance is not very accurate and effective.

Disclosure of Invention

In view of the above, the embodiments of the present invention provide an interaction method, apparatus and computer device based on multi-feature recognition, so as to solve the problem that the remote maintenance operation in the related art cannot be very accurate and effective.

According to a first aspect, an embodiment of the invention provides an interaction method based on multi-feature recognition, which comprises the steps of obtaining target video stream data of target equipment, calling a three-dimensional model of the target equipment according to the target video stream data, sending the data of the three-dimensional model to remote equipment, and receiving change increment information of the three-dimensional model fed back by the remote equipment.

With reference to the first aspect, in a first implementation manner of the first aspect, the method further includes displaying a change of the three-dimensional model according to the change increment information, and controlling the target device according to the change of the three-dimensional model.

With reference to the first aspect, in a second implementation manner of the first aspect, the acquiring target video stream data of the target device includes acquiring and transmitting initial video stream data of the target device in a target area, receiving an initial key frame sent by the remote device, determining a target key frame according to the initial key frame, and acquiring target video stream data of the target device according to the target key frame.

With reference to the second embodiment of the first aspect, in a third embodiment of the first aspect, the determining a target keyframe according to the initial keyframe includes extracting color feature information, texture feature information and motion feature information in the initial video keyframe, fusing the color feature information, the texture feature information and the motion feature information, respectively calculating similarity of each initial video keyframe, determining candidate video keyframes according to the similarity of each initial video keyframe, and determining the target keyframe according to a preset adaptive algorithm.

With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the obtaining the target video stream data of the target device according to the target key frame includes obtaining first video stream data, identifying a first feature point in the first video stream data according to a preset optical flow method, identifying a second feature point in the target key frame, determining that the first video stream data is matched with the target key frame when the similarity between the first feature point and the second feature point is greater than a preset similarity threshold, and determining that the first video stream is the target video stream data of the target device when the first video stream data is matched with the target key frame.

With reference to the first aspect, in a fifth implementation manner of the first aspect, the method further includes determining a first center position of the target key frame according to the first feature point and a preset relative distance, determining a second center position of the first video stream data according to the second feature point and the preset relative distance, and tracking and acquiring target video stream data of a target device according to the first center position and the second center position.

According to a second aspect, the embodiment of the invention provides an interaction method based on multi-feature recognition, which comprises the steps of receiving data of a three-dimensional model sent by field equipment, generating the three-dimensional model according to the data of the three-dimensional model, determining change increment information according to the three-dimensional model and a preset database, and feeding the change increment information back to the field equipment.

With reference to the second aspect, in a first implementation manner of the second aspect, before receiving the data of the three-dimensional model sent by the field device, the method further includes receiving initial video stream data of a target area sent by the field device, determining a problem area according to the initial video stream data, generating an initial key frame according to the problem area, and sending the initial key frame to the field device.

According to a third aspect, the embodiment of the invention provides an interaction device based on multi-feature recognition, which comprises a target video stream data acquisition module, a calling module, a data sending module and a change increment information receiving module, wherein the target video stream data acquisition module is used for acquiring target video stream data of target equipment, the calling module is used for calling a three-dimensional model of the target equipment according to the target video stream data, the data sending module is used for sending data of the three-dimensional model to remote equipment, and the change increment information receiving module is used for receiving change increment information of the three-dimensional model fed back by the remote equipment.

According to a fourth aspect, the embodiment of the invention provides an interaction device based on multi-feature recognition, which comprises a data receiving module, a three-dimensional model generating module, a determining module and a change increment information sending module, wherein the data receiving module is used for receiving data of a three-dimensional model sent by field equipment, the three-dimensional model generating module is used for generating the three-dimensional model according to the data of the three-dimensional model, the determining module is used for determining change increment information according to the three-dimensional model and a preset database, and the change increment information sending module is used for feeding the change increment information back to the field equipment.

According to a fifth aspect, an embodiment of the present invention provides a computer device, including at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the multi-feature recognition based interaction method described in the first aspect or any implementation of the first aspect, or the steps of the multi-feature recognition based interaction method described in the second aspect or any implementation of the second aspect.

According to a sixth aspect, an embodiment of the present invention provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the multi-feature recognition based interaction method described in the first aspect or any implementation manner of the first aspect, or the steps of the multi-feature recognition based interaction method described in the second aspect or any implementation manner of the second aspect.

The technical scheme of the invention has the following advantages:

1. The embodiment of the invention provides an interaction method, an interaction device and computer equipment based on multi-feature recognition, wherein the method comprises the steps of obtaining target video stream data of target equipment, calling a three-dimensional model of the target equipment according to the target video stream data, sending the data of the three-dimensional model to remote equipment, receiving change increment information of the three-dimensional model fed back by the remote equipment, and controlling the target equipment according to the change increment information. By implementing the invention, the field device can acquire accurate guiding information by combining the three-dimensional model generated according to the target video stream data and the received change increment information fed back by the remote device, and the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power device is realized.

2. The embodiment of the invention provides an interaction method, an interaction device and computer equipment based on multi-feature recognition, wherein the method comprises the steps of receiving data of a three-dimensional model sent by field equipment, generating the three-dimensional model according to the data of the three-dimensional model, determining change increment information according to the three-dimensional model and a preset database, and feeding the change increment information back to the field equipment. By implementing the method and the device, the target equipment can be marked by the remote equipment in a first view angle mode by combining the generated three-dimensional model, so that on-site operators are guided to operate accurately, the operation is efficient and accurate, and the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power equipment is realized.

3. According to the interaction method based on multi-feature recognition, the target position is determined by combining the collaborative labeling and recognition matching between the remote equipment and the field equipment, and the features of the target equipment can be continuously detected through the relative distance of the feature points, so that the real-time position of the target equipment can be continuously obtained, accurate tracking of the target equipment is realized, that is, the multi-feature point information of the target equipment is combined with the real-time detection, and the matching is carried out with the temporarily stored video key frames, so that the monitoring, recognition, matching and tracking of the target equipment are realized.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings that are required to be used in the description of the embodiments or the related art will be briefly described, and it is apparent that the drawings in the description below are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.

FIG. 1 is a schematic diagram of a structure for communication between a field device and a remote device in an interaction method based on multi-feature recognition according to an embodiment of the present invention;

FIG. 2 is a flowchart of a specific example of a field device end in an interaction method based on multi-feature recognition in an embodiment of the present invention;

FIG. 3 is a flowchart of a specific example of acquiring target video stream data in an interaction method based on multi-feature recognition according to an embodiment of the present invention;

FIG. 4 is a flowchart of a specific example of a remote device in an interaction method based on multi-feature recognition according to an embodiment of the present invention;

FIG. 5 is a flowchart of another specific example of a remote device side in an interaction method based on multi-feature recognition according to an embodiment of the present invention;

FIG. 6 is a block diagram illustrating an exemplary embodiment of a multi-feature recognition based interaction method in accordance with an embodiment of the present invention;

FIG. 7 is a flow chart of one specific example of an interactive apparatus field device end based on multi-feature recognition in an embodiment of the present invention;

FIG. 8 is a schematic block diagram of a specific example of a remote device side in an interaction apparatus based on multi-feature recognition in an embodiment of the present invention;

fig. 9 is a diagram showing a specific example of a computer device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. The technical features of the different embodiments of the invention described below may be combined with one another as long as they do not conflict with one another.

Collaboration is a trend of development of modern society, and a traditional face-to-face collaboration mode (such as video conference) has great limitations in both time and space, so that the requirements of people on collaboration can not be met. In particular, remote collaboration refers to the process of helping geographically dispersed organizations and individuals to complete collaboration, with the support of computer and communication technologies. In order to support efficient collaboration, the platform must be able to support live streaming media, such as real-time video, audio, and other multimedia information, such as graphics annotations, still images, text, etc., as well as comprehensive processing of such multimedia information. In practical application scenarios, for example, in operation sites such as power grid transmission and transformation inspection, overhaul and emergency repair, the types of power equipment are multiple, the operation is complex, new problems continue to emerge, the recognition processing difficulty is high, the cooperative operation of the cross-class group and the cross-work area is needed, and technical support of a technical expert or equipment manufacturer is needed remotely, so that the current communication mode has low efficiency and high error probability, and needs to be solved urgently.

In order to solve the problems of low communication efficiency and high error probability in the related art, the embodiment of the invention provides an interaction method, an interaction device and computer equipment based on multi-feature recognition, and aims to efficiently and accurately guide field equipment to operate through real-time audio/video interaction between the field equipment and remote equipment.

As shown in FIG. 1, the field device and the remote expert device communicate through a wireless channel, wherein the field device may be provided with a wearable terminal device, a camera device or an image acquisition device such as a control ball, and a wireless communication device, and the remote device may be provided with a wireless communication module for communicating with other devices, and may be the remote expert device. Specifically, the field device may send the collected video stream data of the field to a remote expert device, and the remote expert device may receive the video stream data through the wireless communication module and send back corresponding feedback information.

The embodiment of the invention provides an interaction method based on multi-feature recognition, which is particularly applied to a field device end, as shown in fig. 2, and comprises the following steps:

In the embodiment, the target device may be any device set in an actual application scenario, for example, when the target device is applied to a power grid transportation scenario, the target device may be an electronic device, such as an oil temperature gauge, an electronic switch, etc. The video stream data may be a data format in which a series of consecutive image information is stored and recorded, the consecutive images recording a specific event within one or more consecutive periods of time, and when the consecutive images are sequentially played at a faster frame rate, consecutive pictures are displayed, that is, the video stream data. The target video stream data may be video stream data of a problem area marked by the remote device, and the video stream data of the corresponding problem area is acquired again.

Specifically, the field device acquires video stream data of the problem area, namely, target video stream data. For example, when a remote device (e.g., a technical support expert) marks a problematic device as an oil temperature gauge, when a camera of a field device is moved to point to the oil temperature gauge, video stream data including the oil temperature gauge, that is, target video stream data, may be acquired.

Specifically, the field device may acquire video stream data through a wearable terminal device, a camera device, a trackball, or the like.

Step S12, calling a three-dimensional model of the target equipment according to the target video stream data, wherein the three-dimensional model can be a three-dimensional model used for representing structural feature information and the like of the target equipment in the embodiment. Specifically, at the field device end, according to the obtained target video stream data containing the target device, determining the identification information of the target device in the target video stream data, determining the model of the target device according to the identification information, calling a corresponding three-dimensional model in a preset three-dimensional model database according to the model of the target device, for example, when the target device is determined to be an oil temperature gauge according to the target video stream data, firstly determining the device model of the oil temperature gauge, for example, xxx-1, calling a three-dimensional model of the oil temperature gauge with the device model xxx-1 in the preset three-dimensional model data, and displaying the three-dimensional model at the field device end.

In the embodiment, the field device end and the remote expert end communicate through a wireless channel to transmit data and the like, and the field device end transmits the generated data of the three-dimensional model to the remote device.

In the embodiment, the three-dimensional model of the target device is displayed on the remote device based on the data of the three-dimensional model through the step S13. The change increment information of the three-dimensional model may be information record of the operation performed on the three-dimensional model of the target device when the remote expert observes the target device at the first viewing angle, for example, when the remote expert confirms that the oil temperature gauge has a problem, the corresponding operation is performed on the three-dimensional model of the remote expert, for example, the oil temperature gauge is moved to the left by 0.6 cm, and at this time, the change increment information is "move the oil temperature gauge to the left by 0.6 cm". The delta change information is transmitted from the remote expert terminal to the field device via a wireless channel, wherein the delta change information is used for moving the oil temperature gauge to the left by 0.6 cm.

The interaction method based on multi-feature recognition comprises the steps of obtaining target video stream data of target equipment, calling a three-dimensional model of the target equipment according to the target video stream data, sending the data of the three-dimensional model to remote equipment, receiving change increment information of the three-dimensional model fed back by the remote equipment, and displaying change of the three-dimensional model according to the change increment information. By implementing the invention, the field device can acquire accurate guiding information by combining the three-dimensional model generated according to the target video stream data and the received change increment information fed back by the remote device, and the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power device is realized.

As an alternative embodiment of the invention, the method further comprises displaying the change of the three-dimensional model according to the change increment information and controlling the target device according to the change of the three-dimensional model. In this embodiment, the field device may display the delta information directly on the three-dimensional model, for example, the three-dimensional model is adjusted accordingly according to the received delta information, for example, when the received delta information is "move the oil temperature gauge to the left by 0.6 cm", the oil temperature gauge in the three-dimensional model at the field device side is moved to the left by 0.6 cm directly. Furthermore, the operation and maintenance personnel can control the target equipment according to the change of the three-dimensional model, for example, the operation and maintenance personnel can change the three-dimensional model, and then control the oil temperature gauge in the actual equipment, so that the oil temperature gauge moves leftwards by 0.6 cm.

As an optional embodiment of the present invention, the step S11, obtaining the target video stream data of the target device includes:

In the embodiment, the target area may be any area in an actual application scene, the initial video stream data may be a video stream initially acquired by the field device, and at this time, the video stream data of the field device may be acquired by the wearable terminal device, the camera device or the control ball, and the acquired initial video stream data may be transmitted to the remote device, that is, the remote expert end in real time.

In this embodiment, the remote expert terminal performs drawing annotation, for example, text annotation, image annotation, etc., on a region with a problem in the initial video stream data at a first view angle, and the video stream segment with the drawing annotation is the initial key frame, specifically, the field device may receive the initial key frame fed back by the remote expert terminal after sending the initial video stream data to the remote expert terminal.

In the embodiment, the target key frame can be a frame which is extracted and contains all characteristic information in the initial key frame after the field device side optimizes and aggregates the initial key frame, namely the target key frame, specifically, key information in the initial key frame is extracted, redundant information in the initial key frame is removed, and the frame with the key information is extracted as the target key frame through an image saliency detection mode, a candidate key frame extraction method, a self-adaptive hierarchical clustering method and the like, and the target key frame is structurally stored.

And step S24, acquiring target video stream data of target equipment according to the target key frame. In this embodiment, the target video stream data of the target device is determined according to the target key frame, and may be matched with the re-acquired video stream data according to the target key frame, and when the re-acquired video stream data is matched with the target key frame, the re-acquired video stream data may be determined to be the target video stream data, and at this time, text labels, image labels, three-dimensional model labels and the like of the remote expert terminal on the target key frame may be displayed on the re-acquired video stream data.

According to the interaction method based on multi-feature recognition, provided by the embodiment of the invention, the key information in the video is extracted by combining the video stream key frame technology, redundant information in the video is removed, the target key frame is determined and is structurally stored by the methods of image saliency detection, candidate key frame extraction, self-adaptive hierarchical clustering and the like, so that the target key frame and target video stream data can be efficiently and accurately determined, and the minimization of resource consumption and the maximization of key information storage are realized.

As an optional embodiment of the present invention, the step S23, determining the execution process of the target key frame according to the initial key frame, includes:

In this embodiment, the color feature information is the most significant feature in the image, is based on the feature of the pixel point, different electrical devices display different colors, the process of extracting the color feature information may include extracting the color feature information in the initial video key frame, and describing the color feature information with a histogram, and specifically, may generate a color histogram according to the different color feature information of each electrical device. The process for extracting texture feature information can comprise the steps of carrying out statistical calculation according to a plurality of areas comprising a plurality of pixel points to obtain texture feature information, specifically, segmenting an initial video key frame into a plurality of images through a Markov random field model to obtain texture feature information of different area numbers, pixel positions, pixel value sets and the like, and determining the texture feature information of the initial video key frame. The process of extracting motion feature information may include firstly extracting a saliency image of an initial video key frame, specifically, may determine a saliency target in the initial video key frame based on a saliency detection algorithm (SDSP) and a core rule of ciel×a×b×color feature, contrast principle, and saliency calculation, and may save most of information of the initial image. And secondly, performing motion estimation on the saliency image by a pyramid-based Lucas-Kanade optical flow method to generate motion characteristic information of the saliency image.

The interaction method based on multi-feature recognition combines an SDSP algorithm, namely three priori knowledge are combined, namely, the behavior of a person's vision on a scene always detecting a salient object can be simulated by using a log-Gabor filter, the person's vision tends to concentrate on the center of an image, the modeling is performed by using Gaussian mapping, and warm colors can attract visual attention more than cold colors. The algorithm can exclude the influences of colors, complex textures and changeable backgrounds through mathematical modeling, so that a salient image can be obtained rapidly and accurately.

In this embodiment, the similarity between the initial video key frames may be the same degree indicating the content contained in each initial video key frame, specifically, the color feature information, the texture feature information and the motion feature information are normalized to generate a fusion feature vector, and according to the fusion feature vector, the euclidean distance between two adjacent initial video key frames is calculated, and then the similarity between two adjacent initial video key frames is calculated. The smaller the euclidean distance, the higher the similarity between two adjacent frames.

And determining candidate video key frames according to the similarity of the initial video key frames, and determining target key frames according to a preset self-adaptive algorithm. In this embodiment, a clustering threshold is determined by an adaptive hierarchical clustering algorithm, and mutual information (Mutual Information, MI) between salient images of each initial video key frame is determined, which is used to characterize the correlation between two variables. The method comprises the steps of determining candidate video key frames, namely candidate key frame sequences, according to initial video key frames, calculating mutual information between salient images of adjacent candidate key frame sequences, calculating to determine the mutual information sequences, calculating joint probability according to normalized overlapping areas of adjacent images and histograms, determining a clustering threshold according to the joint probability, arranging the mutual information sequences in descending order according to mutual information values, then taking the first frame as a first cluster according to the original time sequence of the candidate key frames, and generating a new cluster if the mutual information value between the two subsequent frames is smaller than or equal to the threshold. In contrast, the subsequent frames are partitioned into the current cluster to determine the target key frames, and the target key frames are ordered clusters of clusters and the frames in each cluster are also ordered according to the relevance of the original video content.

According to the interaction method based on multi-feature recognition, provided by the embodiment of the invention, the rotation invariance of texture feature information is combined, so that the interaction method has stronger resistance to noise, and object information contained in an image is distinguished in a micro level. The method comprises the steps of combining image saliency detection, extracting candidate key frames and key frames determined through a clustering self-adaptive algorithm, wherein an SDSP algorithm can be applied to an original video sequence, saliency information which is noticed by eyes in a video can be extracted by combining three saliency detection methods with priori knowledge, data information contained in the video frames can be quantitatively described, then the candidate key frames with smaller redundancy are obtained, the problem of unstable clustering results caused by inaccurate initial boundary point selection is solved by adopting a clustering self-adaptive determination threshold, and the extracted key frames keep the time sequence of the original input video according to the time sequence of the original video content among final clusters obtained after self-adaptive hierarchical clustering.

As an optional embodiment of the present invention, step S24, an execution process of obtaining target video stream data of a target device according to a target key frame, includes:

In this embodiment, the video stream data may be acquired when the wearable terminal device, the camera device or the control ball moves to the target area again.

In the embodiment, the first characteristic point in the first video stream data and the second characteristic point in the target key frame are determined through a forward optical flow method or a backward optical flow method. Specifically, the first feature point may be a plurality of feature points including a plurality of features in the first video stream data, for example, may be target feature points in pixels, and then query the target key frame and extract a plurality of target feature points in the target key frame.

In the embodiment, the similarity degree of the first feature point in the first video stream data and the second feature point in the target key frame in the aspects of position, quantity and the like is calculated and compared with the preset similarity threshold, and when the calculated similarity degree is larger than the preset similarity threshold, the first video stream data and the target key frame are successfully matched.

Second, when the first video stream data matches the target key frame, the first video stream is determined to be target video stream data of the target device. In this embodiment, when the first video stream data is successfully matched with the target key frame data, it is indicated that the field device re-captures the video stream data including the target device marked with the problem at the remote expert end through the wearable terminal at this time, that is, the target video stream data.

The embodiment of the invention provides an interaction method based on multi-feature recognition, which is applied to an electric power field and an emergency repair operation scene, wherein a video stream of a field device can be acquired through a wearable terminal, a camera device or a control ball, then a plurality of frames in the video stream are read, a remote expert performs drawing annotation on a collaboration target of the video stream acquired by field operators at a first visual angle, a forward/backward optical flow method is used for recognizing a target feature point, the characteristics and feature description of a current frame are calculated, a temporary key frame storage set is inquired, the feature point in the current frame is matched with the feature point of the temporary key frame storage set, if the matching is successful, the remote collaboration annotation target recognition is successful, preparation is made for augmented reality information superposition interaction, and otherwise, the temporary key frame is updated and stored in the temporary key frame storage set in an increment mode. That is, when the first video stream data is successfully matched with the target key frame, a three-dimensional model can be generated according to the power equipment based on the augmented reality service platform and the augmented reality mode, characters and image labels are superimposed on the three-dimensional model, change increment information is transmitted through the position relation, the angle, the operation behavior and the model feedback result between the real-time collaboration personnel and the power equipment, the change increment information is encoded and decoded at the distributed terminals, and meanwhile, the synchronous change of the model and the information is ensured by adopting a checking mechanism, so that tracking interaction between the field equipment and the remote expert terminal based on multi-feature recognition is realized.

As an alternative embodiment of the present invention, the method further comprises:

The method comprises the steps of determining a first central position of a target key frame according to a first characteristic point and a preset relative distance, determining a second central position of first video stream data according to a second characteristic point and the preset relative distance, and tracking and obtaining target video stream data of target equipment according to the first central position and the second central position. In this embodiment, the preset relative distance is the distance between the target feature point and the center position, and because the relative distance between the feature point and the center position is unchanged during zooming and rotation of the same image, the first center position of the target key frame is determined according to the first feature point and the preset relative distance, the center position of the first video stream data is determined according to the second feature point, and the target video stream data of the target device is continuously acquired according to the detected center position, so as to realize continuous tracking of the target device.

The embodiment of the invention provides an interaction method based on multi-feature recognition, which combines clustering voting of centers to determine the center position and the relative distance of each feature point to determine the position of target equipment, and the distance of each feature point relative to the center position is determined under the scaling and rotation proportion, so that the real-time tracking of the object position can be realized through the continuous detection of the object features. The object monitoring, identifying, matching and tracking is realized by a method of detecting multi-feature point information of the object in real time and matching with a video stream structured key frame temporary storage set.

The embodiment of the invention provides an interaction method based on multi-feature recognition, which is particularly applied to a remote equipment end, as shown in fig. 4, and comprises the following steps:

in the embodiment, the remote expert receives the data of the three-dimensional model sent by the field device.

In the embodiment, the remote expert end constructs the three-dimensional model according to the received data of the three-dimensional model.

In this embodiment, the remote expert terminal adjusts the area with the problem in the three-dimensional model according to the preset database, for example, when the remote expert terminal determines that the oil temperature gauge in the three-dimensional model has the problem, the remote expert terminal adjusts the oil temperature gauge according to the preset database, for example, moves the oil temperature gauge to the left by 10 cm or to the left by 0.6 cm, and the adjustment information is the change increment information.

And step S34, feeding back the change increment information to the field device. In this embodiment, when the delta change signal is that the oil temperature gauge moves to the left by 0.6 cm, the adjustment information is transmitted to the field device end as delta change information.

The interaction method based on multi-feature recognition comprises the steps of receiving data of a three-dimensional model sent by field equipment, generating the three-dimensional model according to the data of the three-dimensional model, determining change increment information according to the three-dimensional model and a preset database, and feeding the change increment information back to the field equipment. By implementing the method and the device, the target equipment can be marked by the remote equipment in a first view angle mode by combining the generated three-dimensional model, so that on-site operators are guided to operate accurately, the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power equipment is realized efficiently and accurately.

As an alternative embodiment of the present invention, as shown in fig. 5, before receiving the data of the three-dimensional model transmitted by the field device in step S31, the communication method further includes:

step 301, receiving initial video stream data of a target area sent by a field device;

In this embodiment, the problem area may be an area where some devices and wiring methods are considered to be problematic by the remote expert, and specifically, the remote expert may mark the problem area in text form or image form, and then generate the initial keyframe.

Step S303, the initial key frame is sent to the field device. In this embodiment, after labeling the initial video stream data sent by the field device, the remote expert terminal generates an initial key frame, and then sends the generated initial key frame to the field device.

The embodiment of the invention provides an interaction method based on multi-feature recognition, which combines a remote expert to draw and mark video stream data acquired by field operators at a first visual angle, so as to generate an initial key video stream segment, and can efficiently and accurately acquire and store key frames of a video stream.

The interaction based on multi-feature recognition in the above embodiment is described in detail below with reference to a specific implementation manner, specifically, the video frames are the most basic components of the video stream, the video frames with the most abundant information are extracted, and the main content in the video frames is converted into advanced semantic information for structural information storage. Information contained in the video stream is classified into bottom layer feature information, key image frame information, and advanced semantic information. The bottom layer feature information refers to extracting global features, local features and structural features of an image. Global features, namely the basic features of the image such as shape, color, texture and the like, local features, feature point sets of the video image, which are used for feature matching, and structural features, which reflect the geometric and space-time domain relations between the image features. The key image frame information refers to extracting key frames according to the bottom layer characteristics and target information of the image, representing the information difference between frames or the information richness of video frames after fusing various bottom layer characteristic information, and then screening out representative video frames. The high-level semantic information refers to semantic logic description and feature expression according to targets and contents contained in the video. By adopting a deep learning technology, a targeted model is trained according to a proper amount of picture sets, target semantics, scene semantics, image semantics and the like are extracted, the extracted semantic information is synthesized, and text sentences are extracted to logically describe events reflected in videos, so that visual understanding, storage and retrieval of users are facilitated. And combining the extracted information such as the bottom layer features, the key image frames, the advanced semantics and the like to perform feature analysis and description, logic expression and structured storage, realizing the structured and digital storage of the video stream, and providing basic services for the extraction of the video key frames and the identification and matching of multiple feature points.

As shown in fig. 6, the mobile intelligent terminal and the background server communicate through a wireless network, and the background server can register information of a plurality of electric devices, wherein the electric devices can be associated with text labels, three-dimensional models and the like in advance, the background server can store the text labels and the three-dimensional models in advance in a classified manner, and the rendering parameters of the three-dimensional models are predetermined and the three-dimensional models are lightened in advance.

The mobile intelligent terminal can download text labels and three-dimensional models of a plurality of electric devices from a background server, render the three-dimensional models, fuse virtual scenes of the three-dimensional models and actual scenes of the electric devices after the downloading is completed, and then display the three-dimensional models with the text labels superimposed and continuously track the corresponding electric devices.

An embodiment of the present invention provides an interaction device based on multi-feature recognition, as shown in fig. 7, including:

the target video stream data obtaining module 41 is configured to obtain target video stream data of the target device, and details of implementation can be found in the description related to step S11 in the above method embodiment.

The invoking module 42 is configured to invoke the three-dimensional model of the target device according to the target video stream data, and details of implementation can be found in the description related to step S12 in the above method embodiment.

The data transmitting module 43 is configured to transmit the data of the three-dimensional model to the remote device, and details of implementation can be found in the description of step S13 in the above method embodiment.

The change increment information receiving module 44 is configured to receive change increment information of the three-dimensional model fed back by the remote device, and details of implementation may be found in the description related to step S14 in the above method embodiment.

The interactive device based on multi-feature recognition comprises a target video stream data acquisition module 41 for acquiring target video stream data of target equipment, a calling module 42 for calling a three-dimensional model of the target equipment according to the target video stream data, a data sending module 43 for sending data of the three-dimensional model to remote equipment, and a change increment information receiving module 44 for receiving change increment information of the three-dimensional model fed back by the remote equipment. By implementing the invention, the field device can acquire accurate guiding information by combining the three-dimensional model generated according to the target video stream data and the received change increment information fed back by the remote device, and the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power device is realized.

An embodiment of the present invention provides an interaction device based on multi-feature recognition, as shown in fig. 8, including:

The data receiving module 51 is configured to receive the data of the three-dimensional model sent by the field device, and details of implementation can be found in the description related to step S31 in the above method embodiment.

The three-dimensional model generating module 52 is configured to generate a three-dimensional model according to the data of the three-dimensional model, and details of implementation can be found in the description related to step S32 in the above method embodiment.

The determining module 53 is configured to determine the change increment information according to the three-dimensional model and the preset database, and details of implementation can be found in the description related to step S33 in the above method embodiment.

The delta information sending module 54 is configured to feed back delta information to the field device. For details, see the description of step S34 in the above method embodiment.

The interaction device based on multi-feature recognition comprises a data receiving module 51, a three-dimensional model generating module 52, a determining module 53 and a change increment information sending module 54, wherein the data receiving module is used for receiving data of a three-dimensional model sent by field equipment, the three-dimensional model generating module 52 is used for generating the three-dimensional model according to the data of the three-dimensional model, the determining module 53 is used for determining change increment information according to the three-dimensional model and a preset database, and the change increment information sending module 54 is used for feeding the change increment information back to the field equipment. By implementing the method and the device, the target equipment can be marked by the remote equipment in a first view angle mode by combining the generated three-dimensional model, so that on-site operators are guided to operate accurately, the operation is efficient and accurate, and the interaction of the augmented reality mode and the remote virtual-real fusion of the three-dimensional model of the power equipment is realized.

The present invention also provides a computer device, as shown in fig. 9, which may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be connected by a bus 60 or otherwise, and in fig. 9, the connection is exemplified by the bus 60.

The processor 61 may be a central processing unit (Central Processing Unit, CPU). The Processor 61 may also be other general purpose processors, digital signal processors (DIGITAL SIGNAL processors, DSPs), application SPECIFIC INTEGRATED Circuits (ASICs), field-Programmable gate arrays (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 62 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the multi-feature recognition-based interaction method in the embodiments of the present invention. The processor 61 executes various functional applications of the processor and data processing, i.e. implements the multi-feature recognition based interaction method in the above-described method embodiments by running non-transitory software programs, instructions and modules stored in the memory 62.

The memory 62 may include a storage program area that may store an operating system, application programs required for at least one function, and a storage data area that may store data created by the processor 61, etc. In addition, the memory 62 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 62 may optionally include memory located remotely from processor 61, which may be connected to processor 61 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The one or more modules are stored in the memory 62 and when executed by the processor 61 perform the multi-feature recognition based interaction method in the embodiment shown in fig. 2 and 4.

The details of the above-mentioned computer device may be understood correspondingly with reference to the corresponding relevant descriptions and effects in the embodiments shown in fig. 2 and fig. 4, which are not repeated here.

The embodiment of the invention also provides a non-transitory computer readable medium, which stores computer instructions for causing a computer to execute the interaction method based on multi-feature recognition as described in any of the above embodiments, wherein the storage medium may be a magnetic disk, a compact disc, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a hard disk (HARD DISK DRIVE, abbreviated as HDD), a Solid state disk (Solid-state-STATE DRIVE, SSD), or the like, and the storage medium may further include a combination of the above types of memories.

It is apparent that the above examples are given for clarity of illustration and are not limiting of the embodiments. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. While still being apparent from variations or modifications that may be made by those skilled in the art are within the scope of the invention.

Claims

1. An interaction method based on multi-feature recognition, comprising:

Acquiring target video stream data of target equipment; acquiring target video stream data of target equipment, wherein the acquisition of the target video stream data of the target equipment comprises the steps of acquiring and transmitting initial video stream data of the target equipment in a target area; receiving an initial key frame sent by a remote device; determining a target key frame according to the initial key frame, acquiring target video stream data of target equipment according to the target key frame, determining the target key frame according to the initial key frame, including extracting color characteristic information, texture characteristic information and motion characteristic information in the initial video key frame, respectively calculating similarity of the initial video key frames by fusing the color characteristic information, the texture characteristic information and the motion characteristic information, determining candidate video key frames according to the similarity of the initial video key frames, determining the target key frame according to a preset self-adaptive algorithm, determining the candidate video key frames according to the preset self-adaptive algorithm, namely, a candidate key frame sequence according to the initial video key frames, calculating interaction information between salient images of each adjacent candidate key frame sequence, calculating interaction information sequences, calculating joint probability according to normalized adjacent image overlapping areas and histograms, determining a clustering threshold value, arranging the interaction information sequences according to the interaction information value descending order, dividing the candidate video key frames into two clusters according to the first candidate key frames into two clusters, and generating a new cluster of candidate frames to be equal to the current cluster of the current cluster, if the current cluster is smaller than the current cluster of the current cluster is generated, the target key frames are orderly clustered, and the frames in each cluster are orderly arranged according to the relevance of the original video content;

Invoking a three-dimensional model of the target device according to the target video stream data;

transmitting data of the three-dimensional model to a remote device;

and receiving change increment information of the three-dimensional model fed back by the remote equipment.

2. The method as recited in claim 1, further comprising:

Displaying the change of the three-dimensional model according to the change increment information;

And controlling the target equipment according to the change of the three-dimensional model.

3. The method of claim 1, wherein the obtaining the target video stream data of the target device according to the target key frame comprises:

Acquiring first video stream data;

identifying a first characteristic point in the first video stream data according to a preset optical flow method, and identifying a second characteristic point in the target key frame;

when the similarity between the first feature point and the second feature point is larger than a preset similarity threshold, determining that the first video stream data is matched with the target key frame;

and determining the first video stream as target video stream data of a target device when the first video stream data is matched with the target key frame.

4. A method according to claim 3, further comprising:

Determining a first center position of the target key frame according to the first feature point and a preset relative distance;

determining a second center position of the first video stream data according to the second characteristic points and a preset relative distance;

and tracking and acquiring target video stream data of target equipment according to the first central position and the second central position.

5. An interaction method based on multi-feature recognition, comprising:

Receiving data of a three-dimensional model sent by field equipment; the data of the three-dimensional model is obtained by calling target video stream data in an interaction method based on multi-feature recognition as claimed in claim 1; the method comprises the steps of acquiring target video stream data of target equipment, acquiring and transmitting initial video stream data of the target equipment in a target area, receiving initial key frames transmitted by the remote equipment, determining target key frames according to the initial key frames, acquiring target video stream data of the target equipment according to the target key frames, determining target key frames according to the initial key frames, including extracting color feature information, texture feature information and motion feature information in the initial video key frames, fusing the color feature information, the texture feature information and the motion feature information, respectively calculating similarity of the initial video key frames, determining candidate video key frames according to the similarity of the initial video key frames, determining target key frames according to a preset adaptive algorithm, determining candidate video key frames according to the preset adaptive hierarchical clustering algorithm, specifically including determining candidate video key frames according to the initial video key frames, namely candidate key frame sequences, calculating interaction information between significant images of adjacent candidate key frame sequences, calculating and determining interaction information sequences according to a first clustering sequence, and then combining the first clustering probability sequence according to a first clustering sequence of the interaction probability, if the interaction information value between the two subsequent frames is smaller than or equal to the threshold value, a new cluster is generated, and conversely, the subsequent frames are divided into the current cluster, the target key frames are determined, the target key frames are orderly clustered, and the frames in each cluster are orderly arranged according to the relevance of the original video content;

generating a three-dimensional model according to the data of the three-dimensional model;

Determining change increment information according to the three-dimensional model and a preset database;

And feeding back the change increment information to the field device.

6. The method of claim 5, further comprising, prior to receiving the data of the three-dimensional model transmitted by the field device:

Receiving initial video stream data of a target area sent by the field device;

Determining a problem area according to the initial video stream data, and generating an initial key frame according to the problem area;

the initial keyframe is sent to a field device.

7. An interaction device based on multi-feature recognition, for performing the method of any of claims 1-4, comprising:

The target video stream data acquisition module is used for acquiring target video stream data of target equipment;

the calling module is used for calling the three-dimensional model of the target equipment according to the target video stream data;

the data transmission module is used for transmitting the data of the three-dimensional model to a remote device;

And the change increment information receiving module is used for receiving the change increment information of the three-dimensional model fed back by the remote equipment.

8. An interaction device based on multi-feature recognition, for performing the method of any of claims 5-6, comprising:

the data receiving module is used for receiving the data of the three-dimensional model sent by the field device;

The three-dimensional model generation module is used for generating a three-dimensional model according to the data of the three-dimensional model;

The determining module is used for determining change increment information according to the three-dimensional model and a preset database;

and the change increment information sending module is used for feeding back the change increment information to the field device.

9. A computer device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the multi-feature recognition based interaction method of any of claims 1-6.

10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the multi-feature recognition based interaction method as claimed in any one of claims 1-6.