
CN114565719B - Image data processing method, image platform, computer device, and storage medium - Google Patents


Info

Publication number
CN114565719B
CN114565719B (application CN202210182065.7A)
Authority
CN
China
Prior art keywords
image
target
target object
data processing
depth information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210182065.7A
Other languages
Chinese (zh)
Other versions
CN114565719A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Microport Medbot Group Co Ltd
Original Assignee
Shanghai Microport Medbot Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Microport Medbot Group Co Ltd filed Critical Shanghai Microport Medbot Group Co Ltd
Priority to CN202210182065.7A priority Critical patent/CN114565719B/en
Publication of CN114565719A publication Critical patent/CN114565719A/en
Application granted granted Critical
Publication of CN114565719B publication Critical patent/CN114565719B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present specification provides an image data processing method, an image platform, a computer device, and a storage medium. According to the method, after a target image is acquired, a target object in the target image is identified to obtain a corresponding identification result, and depth information of the target object is acquired; the identification result and the depth information are then used to generate and display, on the target image, a target label that includes at least the object category information of the target object. In this way, the related information of the target object of interest to the user can be automatically identified and marked with the target label, and the displayed target label accurately and intuitively conveys the three-dimensional depth information of the target object within the two-dimensional target image, so that the user obtains a better interactive experience.

Description

Image data processing method, image platform, computer device, and storage medium
Technical Field
The present disclosure relates to medical devices, and more particularly, to an image data processing method, an image platform, a computer device, and a storage medium.
Background
In performing a surgical operation (e.g., abdominal surgery, etc.), a healthcare worker typically uses devices such as an endoscope (e.g., abdominal endoscope) to assist in viewing organs, surgical instruments, etc. in the surgical environment to perform the particular surgical operation.
However, with existing medical equipment, medical staff can only observe images containing organs and surgical instruments and must themselves identify the specific organs and surgical instruments from the displayed images. Moreover, existing medical equipment does not allow medical staff to perceive the three-dimensional depth information of the organs and surgical instruments, which may affect the surgical operation.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The specification provides an image data processing method, an image platform, a computer device, and a storage medium that can automatically identify a target object of interest to a user in an image and mark its related information with a target label, while accurately and intuitively conveying the three-dimensional depth information of the target object to the user within the two-dimensional target image through the displayed target label, so that the user obtains a better interactive experience.
An embodiment of the specification provides an image data processing method, which includes: acquiring a target image; identifying a target object in the target image to obtain a corresponding identification result; acquiring depth information of the target object; and displaying a target label of the target object in the target image according to the identification result and the depth information of the target object, wherein the target label includes at least object category information of the target object.
An embodiment of the specification also provides an image platform, which includes a binocular endoscope and an image data processing device. The image data processing device processes the target image by identifying a target object in the target image to obtain a corresponding identification result, acquiring depth information of the target object, and displaying a target label of the target object in the target image according to the identification result and the depth information, wherein the target label includes at least object category information of the target object.
Embodiments of the present disclosure also provide a computer device comprising a processor and a memory for storing processor-executable instructions, which when executed by the processor implement the relevant steps of the image data processing method.
The present description also provides a computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps associated with the image data processing method.
With the image data processing method, image platform, computer device, and storage medium provided in this specification, after the target image is acquired, the target object in the target image is identified; while the corresponding identification result is obtained, the depth information of the target object can also be acquired, and the identification result and the depth information are then used together to generate and display, on the target image, a target label that includes at least the object category information of the target object. In this way, the related information of the target object of interest to the user can be automatically identified and marked with the target label, and the displayed target label accurately and intuitively conveys the three-dimensional depth information of the target object within the two-dimensional target image, so that the user obtains a better interactive experience.
A label size parameter matched with the target object can be determined based on the perspective principle according to the depth information of the target object, a target label capable of conveying the distance of the target object relative to the lens can then be generated according to the label size parameter, and the target label can be displayed in the target image, so that the user can intuitively and efficiently judge, from the target label in the image, how far the target object is from the lens.
Further, according to the depth information of the target object, display effect parameters of the target label, such as a shadow parameter, a deformation parameter, and a brightness parameter, can be adjusted in a targeted manner, so that the three-dimensional depth information of the target object is conveyed to the user more clearly and effectively through the target image and the target label.
In addition, an effect image layer can first be constructed, based on Augmented Reality (AR) technology, according to the differences between the depth information of different target objects in the target image; the target image and the effect image layer are then superimposed to obtain a superimposed target image with a stereoscopic visual effect, and the superimposed target image is displayed to the user, so that the user obtains a relatively better interactive experience.
Drawings
In order to describe the embodiments of the present disclosure more clearly, the drawings required for the embodiments are briefly introduced below. The drawings described below are only some of the embodiments described in the present disclosure; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of an image data processing method according to an embodiment of the present disclosure;
FIG. 2 is a physical schematic of a doctor console applying an image data processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic representation of a binocular endoscope employing an image data processing method according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image platform to which an image data processing method is applied according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 6 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 7 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 8 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 9 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 10 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 11 is a schematic diagram of an embodiment of an image data processing method to which the embodiments of the present specification are applied, in one example of a scene;
FIG. 12 is a schematic structural diagram of an image platform according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of the structural composition of a computer device provided in one embodiment of the present disclosure;
fig. 14 is a schematic structural composition diagram of an image data processing apparatus provided in one embodiment of the present specification;
fig. 15 is a physical schematic of a surgical platform provided in one embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
Referring to fig. 1, an embodiment of the present disclosure provides an image data processing method. In particular implementations, the method may include the following.
S101, acquiring a target image.
S102, identifying a target object in the target image to obtain a corresponding identification result, and acquiring depth information of the target object.
And S103, displaying a target label of the target object in the target image according to the identification result and the depth information of the target object, wherein the target label at least comprises object category information of the target object.
In some implementations, the target image may be specifically understood as an image including at least the target object. The target object may specifically be an object focused by a user. The number of target objects included in the target image may be one or a plurality of.
In some embodiments, the target image may specifically include a first image, and/or a second image. The first image may specifically be an image including a target object collected by a left camera (or a first camera), and the second image may specifically be an image including a target object collected by a right camera (or a second camera). The left camera and the right camera are fixed in relative positions.
The target objects may be different types of data objects corresponding to different application scenarios. In particular, for example, in a surgical scenario, the target object may be an organ of a patient and/or a surgical instrument used by a healthcare worker, or the like. In the traffic monitoring scene, the target object may be a vehicle or the like traveling on a road.
Of course, it should be noted that the application scenario and the target object listed above are only illustrative. In specific implementation, the image data processing method can be applied to other types of application scenes according to specific situations and processing requirements, and correspondingly, the target object can also comprise other types of data objects in other types of application scenes. The present specification is not limited to this.
The image data processing method will be described in detail mainly with respect to a surgical scene. For other application scenarios, reference may be made to the following embodiments of the surgical scenario. In this regard, the description is not repeated.
Referring to fig. 4, the image data processing method provided in the embodiment of the present disclosure may be specifically applied to an image platform (or a surgical robot, a doctor console, etc.). Specifically, at least a binocular endoscope (see fig. 3) facing the patient to be operated is further connected to the image platform, and the image platform faces the medical care worker (or called user) who performs the operation.
Specifically, referring to fig. 3, the binocular endoscope at least includes two cameras, i.e., a left camera (or a first camera) and a right camera (or a second camera), which are fixed relative to each other, and related image sensors. The binocular endoscope can extend into a specific surgical environment to shoot a target object of interest to a healthcare worker. In practice, the binocular endoscope can image the photographed target object through the camera onto an image sensor for outputting an electrical signal according to the imaging of the camera, the electrical signal being used for generating a target image containing the photographed target object.
Referring to fig. 4, the image platform may include at least an image data processing device (e.g., a processor or an image processing host that supports image data processing), a display device (e.g., a display screen), and so on. In specific implementation, the image platform can process the target image acquired by the binocular endoscope by applying the image data processing method provided by the embodiment of the specification through the data processing device, and then display the target image through the display device and display the target label of the target object in the target image so as to convey the identification result of the target object and the three-dimensional depth information of the target object to the medical care worker. The display device specifically may include a two-dimensional display screen, a three-dimensional display screen, or the like.
Specifically, the image platform may also be connected to a doctor console (or called a doctor trolley), as shown in fig. 2. Wherein the doctor console includes at least a monitor (e.g., a stereoscopic monitor, etc.). Accordingly, the target image generated by the image platform and carrying the target tag can be transmitted to the doctor console. The physician console may present the healthcare worker with a target image containing the target label via the monitor.
Further, the above-described doctor consoles may also be configured with VR devices (e.g., VR glasses, etc.), supporting the presentation of target images based on virtual reality technology, in order for healthcare workers to better view organs, surgical instruments, etc. that appear during the procedure.
In addition, the doctor console may be coupled to a surgical platform, as shown in fig. 15. The operation platform at least comprises a mechanical arm, a mechanical arm control device and the like.
Correspondingly, the medical staff can accurately initiate corresponding operation instructions to the operation platform through the doctor control console according to the target image with the target label displayed by the monitor in the doctor control console, and the mechanical arm control device of the operation platform can respond to the operation instructions to control the mechanical arm to execute corresponding actions so as to complete specific operation.
In some embodiments, the target image may be specifically an image obtained by a binocular endoscope at a timing or in real time before or during the operation, including a target object such as an organ and/or a surgical instrument of interest in the operation.
In some embodiments, the target image may include a first image acquired through a binocular endoscope and/or a second image, and the target object may include a tissue organ and/or a surgical instrument, respectively.
In some embodiments, the method for acquiring the target image may include controlling the binocular endoscope to capture the target object to obtain a frame of image as the target image.
The method for acquiring the target image may also include controlling the binocular endoscope to capture the target object at preset time intervals (for example, every 5 seconds) to obtain multiple frames of images. The frames can be arranged in order of acquisition time, from earliest to latest, and the current image to be processed can then be taken frame by frame from the multiple frames in that order and used as the target image.
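As an illustration of this frame-by-frame processing order, the sketch below shows a minimal acquisition loop; the capture_frame and process_target_image callables are hypothetical placeholders for the binocular-endoscope capture and the labelling pipeline described in this specification, and the interval and frame count are illustrative.

```python
import time

def acquisition_loop(capture_frame, process_target_image, interval_s=5.0, max_frames=10):
    """Capture frames at a preset interval and process them in acquisition order.

    `capture_frame` and `process_target_image` are hypothetical callables standing in
    for the binocular-endoscope capture and the labelling pipeline, respectively.
    """
    frames = []
    for _ in range(max_frames):
        frames.append((time.time(), capture_frame()))  # (acquisition time, image)
        time.sleep(interval_s)

    # Arrange frames from earliest to latest acquisition time, then process frame by frame.
    for _, target_image in sorted(frames, key=lambda item: item[0]):
        process_target_image(target_image)
```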
In some embodiments, in the process of identifying the target object in the target image and obtaining the identification result, one image of the first image or the second image may be used separately, and in the process of obtaining the depth information of the target object, two images of the first image and the second image need to be used simultaneously.
In some embodiments, the identifying the target object in the target image to obtain a corresponding identification result may include processing the target image using an image processing model, and determining an object class of the target object in the target image as the identification result.
The image processing model may be specifically understood as a neural network model capable of finding out a target object in a target image and identifying at least information such as an object type of the target object. The above-mentioned image processing model is constructed in a manner that will be described later.
In some embodiments, the image processing model comprises an image processing model trained based on a YOLO model.
The YOLO model can be specifically understood as a neural network model of a target detection algorithm, and has high accuracy while realizing rapid detection based on the target detection algorithm model.
In addition, the image processing model can also comprise an image processing model obtained by training based on Fast YOLO, R-CNN or other types of models.
In some embodiments, the processing the target image by using the image processing model to determine the object class of the target object in the target image, as shown in fig. 5, may include rasterizing the target image by using the image processing model to obtain a rasterized target image, and detecting and identifying the object class of the target object in the target image by processing the rasterized target image by using a convolutional neural network. The convolutional neural network may specifically be a model structure in an image processing model.
In some embodiments, the detecting and identifying the object class of the target object in the target image by processing the rasterized target image through the convolutional neural network may include the following when implemented:
s1, dividing a first sub-image area from the rasterized target image by using a convolutional neural network through a dividing frame, wherein the first sub-image area is a non-background image area;
S2, detecting whether a target object exists in the first sub-image area by using a convolutional neural network, and dividing a second sub-image area from the first sub-image area by using a detection frame according to a detection result, wherein the second sub-image area is an image area containing the target object to be identified;
and S3, determining the object type of the target object by utilizing a convolutional neural network to perform image recognition on the second sub-image region.
Through the above embodiment, the division frame can be used to filter out, from the rasterized target image, the background image areas that do not contain the target object, keeping the non-background image area containing the target object as the first sub-image area for further processing; the detection frame can then be used to select, from the first sub-image area, the image areas that may contain the target object as the second sub-image area. The convolutional neural network therefore only needs to perform image recognition on the second sub-image area to determine the object category of the target object in the target image, which reduces the amount of data processed and improves the overall recognition efficiency.
In some embodiments, after the second sub-image region is segmented from the first sub-image region by using the detection frame, the method may further include filtering the second sub-image region segmented based on the redundant detection frame by using a non-maximum suppression algorithm (NMS) in the second sub-image region to obtain a corrected second sub-image region, and correspondingly, determining an object type of the target object by performing image recognition on the corrected second sub-image region by using a convolutional neural network.
Through the above embodiment, further filtering based on the non-maximum suppression algorithm retains, within the second sub-image area, the sub-areas divided by the detection frames with higher confidence as the corrected second sub-image area. The convolutional neural network then only needs to perform image recognition on the corrected second sub-image area, which further reduces data processing and further improves the overall recognition efficiency.
In some embodiments, filtering out the second sub-image areas divided by redundant detection frames with a non-maximum suppression algorithm (NMS) to obtain the corrected second sub-image area may include the following when implemented:
S1, calculating the areas of all detection frames in the second sub-image area;
S2, calculating the confidence of each detection frame and sorting the detection frames in descending order of confidence, where a detection frame whose content matches the sample images in the sample image set more closely has a relatively higher confidence;
S3, keeping the detection frame with the highest confidence and computing its intersection with each of the remaining detection frames;
S4, calculating the intersection-over-union IoU between the detection frame with the highest confidence and each remaining detection frame, see fig. 6;
S5, keeping the detection frames whose IoU is smaller than a threshold and eliminating the redundant detection frames with large IoU, because in general a larger IoU means a higher overlap with the highest-confidence detection frame;
S6, checking whether the number of detection frames whose IoU is smaller than the threshold is 0. If it is 0, the corrected second sub-image area is determined from the finally retained detection frames; otherwise, steps S3 to S5 are repeated until the number of detection frames whose IoU is smaller than the threshold is 0.
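A minimal sketch of this NMS filtering step is given below, assuming each detection frame is represented as an (x1, y1, x2, y2, confidence) tuple; the threshold value and data layout are illustrative rather than taken from this specification.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-confidence frames, discarding frames that overlap them too much.

    `detections` is a list of (x1, y1, x2, y2, confidence) tuples.
    """
    # S2: sort detection frames in descending order of confidence.
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        # S3: keep the frame with the highest confidence among those left.
        best = remaining.pop(0)
        kept.append(best)
        # S4/S5: keep only frames whose IoU with the kept frame is below the threshold.
        remaining = [d for d in remaining if iou(best[:4], d[:4]) < iou_threshold]
    return kept
```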
In some embodiments, when the object category of the target object is determined while the target image is processed by using the image processing model, the method may further include extracting a target state feature of the target object from the second sub-image area and determining the state of the target object according to the target state feature and a preset state database; correspondingly, the object category of the target object and the state of the target object are combined as the identification result.
The target state features may specifically include a color feature, a texture feature, a brightness feature, a contour line feature, and the like of the target object in the second sub-image area.
The states of the target objects may include a plurality of different types of states corresponding to different target objects. In particular, for example, where the target object is a surgical knife (a surgical instrument), the state of the target object may include one or more of the use states listed below, in use, unused, damaged, undamaged, and the like. Also for example, where the target subject is the stomach (an organ), the state of the target subject may include one or more of the health conditions listed below, health, gastrorrhagia, gastric ulcers, and the like.
The preset state database may be specifically understood as a preset state database constructed by learning image data in various states of the target object in advance. The preset state database comprises a plurality of sub-databases, and each sub-database corresponds to one object type. Further, each sub-database includes a plurality of preset reference state features corresponding to a plurality of states of the target object of the object class.
In some embodiments, the target state features of the target object may be extracted from the second sub-image area by using an adaptive threshold. Specifically, according to the brightness distribution of different areas of the second sub-image area, a reference value of each local image area (such as the mean, median, or Gaussian-weighted average of the pixels of the local image area) is calculated, the threshold of the corresponding local image area is derived from that reference value, and the target state features of the local image area are then extracted according to that threshold. This reduces errors and allows the target state features of the target object to be extracted more accurately.
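As one possible realisation of this adaptive-threshold idea, the sketch below uses OpenCV's per-neighbourhood Gaussian-weighted thresholding on the second sub-image region; the block size and offset constant are illustrative assumptions, not values from this specification.

```python
import cv2

def extract_state_mask(second_sub_image):
    """Binarise the second sub-image region with a locally adaptive threshold.

    Each pixel is compared against a threshold derived from its own neighbourhood
    (Gaussian-weighted mean minus a small offset), so areas with different
    brightness are handled with different thresholds.
    """
    gray = cv2.cvtColor(second_sub_image, cv2.COLOR_BGR2GRAY)
    # Arguments: source, max value, adaptive method, threshold type,
    # neighbourhood block size (odd, illustrative), offset C (illustrative).
    mask = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 5
    )
    return mask
```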
In some embodiments, the determining the state of the target object according to the target state feature and the preset state database may include determining, from the preset state database, a sub-database matching with the target object according to the object type of the target object identified by the image processing model, as a target sub-database, respectively matching the target state feature with a plurality of preset reference state features stored in the target sub-database, finding a preset reference state feature with the highest matching degree with the target state feature, and determining the state corresponding to the preset reference state feature as the state of the target object.
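A minimal sketch of this state-matching step is shown below, assuming the preset state database is a plain dictionary of reference feature vectors per object category and that the matching degree is measured by cosine similarity; both assumptions are illustrative.

```python
import numpy as np

def determine_state(object_class, target_state_feature, preset_state_database):
    """Pick the state whose preset reference feature best matches the extracted target state feature.

    `preset_state_database` is assumed to map object class -> {state name: reference feature vector}.
    """
    target_sub_database = preset_state_database[object_class]  # sub-database matching the object class
    target = np.asarray(target_state_feature, dtype=float)

    def cosine_similarity(a, b):
        b = np.asarray(b, dtype=float)
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Match against every preset reference state feature and keep the one with the highest similarity.
    best_state = max(target_sub_database,
                     key=lambda state: cosine_similarity(target, target_sub_database[state]))
    return best_state
```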
In some embodiments, the object class of the target object and the state of the target object may be combined to obtain a relatively richer, similar recognition result. Accordingly, the recognition result may include information about the target object, such as an object class of the target object, a state of the target object, and the like. In addition, the above identification result may further include other related information such as the number of the target object.
In some embodiments, referring to fig. 5, the depth information of the target object in the target image may be obtained while the target image is processed by using the image processing model to obtain a corresponding recognition result, and then the recognition result and the depth information are combined to generate and display the target tag of the target object in the target image.
In some embodiments, referring to fig. 7, the obtaining depth information of the target object may include the following:
s1, determining a first distance between a first key pixel point and the optical center of a left camera and a second distance between a second key pixel point and the optical center of a right camera according to the first image and the second image, wherein the first key pixel point is a pixel point corresponding to a key point of a target object in the first image, and the second key pixel point is a pixel point corresponding to the key point of the target object in the second image;
and S2, calculating the vertical distance between the key point of the target object and the planes of the left camera and the right camera according to the first distance and the second distance, and taking the vertical distance as the depth information of the target object.
The key point may specifically be a center point of the target object.
In some embodiments, the calculating the vertical distance between the key point of the target object and the planes of the left camera and the right camera according to the first distance and the second distance may include obtaining a baseline distance between the left camera and the right camera and a camera focal length, and calculating the vertical distance between the key point of the target object and the planes of the left camera and the right camera by using the first distance, the second distance, the baseline distance and the camera focal length.
In some embodiments, in implementation, the perpendicular distance between the key point of the target object and the plane in which the left camera and the right camera are located may be calculated according to the following formula:
z = (f × b) / (u_L − u_R)
where z is the perpendicular distance between the key point of the target object and the plane of the left and right cameras, f is the camera focal length, b is the baseline distance between the left camera and the right camera, u_L is the first distance, and u_R is the second distance.
In some embodiments, a larger value of depth information for the target object indicates a greater distance of the target object from the lens, and conversely, a smaller value of depth information for the target object indicates a lesser distance of the target object from the lens.
By the embodiment, the first image and the second image acquired by the binocular endoscope can be used at the same time, and the depth information of the target object can be calculated by calculating the parallax between the two images.
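Based on the disparity relation above, a minimal sketch of the depth computation is shown below; u_left and u_right are taken as the horizontal offsets of the key pixel from the respective optical centres, and the numeric values in the usage example are illustrative only.

```python
def depth_from_disparity(u_left, u_right, focal_length_px, baseline_mm):
    """Perpendicular distance of the key point from the camera plane, z = f * b / (u_L - u_R).

    `u_left` / `u_right`: horizontal offsets (in pixels) of the key pixel from the optical
    centre of the left / right camera; `focal_length_px`: focal length in pixels;
    `baseline_mm`: distance between the two optical centres.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("Disparity must be positive for a point in front of both cameras.")
    return focal_length_px * baseline_mm / disparity

# Illustrative usage: a larger disparity yields a smaller depth (object closer to the lens).
z_near = depth_from_disparity(u_left=130.0, u_right=100.0, focal_length_px=800.0, baseline_mm=4.0)
z_far = depth_from_disparity(u_left=110.0, u_right=100.0, focal_length_px=800.0, baseline_mm=4.0)
```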
In some embodiments, the displaying the target label of the target object in the target image according to the recognition result and the depth information of the target object may include the following when implemented:
s1, generating a label of the target object according to the identification result of the target object;
s2, determining a tag size parameter matched with the target object by utilizing the depth information of the target object according to a preset matching rule;
And S3, adjusting the character size in the label of the target object according to the label size parameter to obtain the target label of the target object, and displaying the target label of the target object in the target image.
Wherein, the label of the target object at least comprises the object category of the target object. Further, other relevant information such as the state of the target object, the number of the target object, and the like may be included in the tag.
The preset matching rule can be understood as a matching rule constructed based on the perspective principle. Under this rule, the label of a target object that is relatively close to the lens is matched with a relatively larger size parameter, consistent with how the object itself appears, while the label of a target object that is relatively far from the lens is matched with a relatively smaller size parameter, so that the distance between the target object and the lens is reflected by the size of the label and the three-dimensional depth information of the target object is conveyed to the user accurately.
In some embodiments, the determining, according to the preset matching rule, the tag size parameter matching with the target object according to the depth information of the target object, referring to fig. 8, may include the following when implemented:
S1, projecting a pixel point of a key point of a target object in a target image into a reference plane to obtain a corresponding reference pixel point, wherein the reference plane is a plane in which the pixel point with the maximum depth information in the target image is located;
s2, acquiring the distance between the reference pixel point and the center of the reference plane and the maximum value of depth information in the target image;
and S3, calculating to obtain a label size parameter matched with the target object according to the preset matching relation parameter by utilizing the depth information of the target object, the distance between the reference pixel point and the center of the reference plane and the maximum value of the depth information in the target image.
The reference plane may be understood as a plane in which the identified distal-most data object in the target image is located. The maximum value of the depth information in the target image can be understood as the depth information of the data object at the furthest end.
According to the preset matching rule, in implementation, reference may be made to fig. 8. First, the pixel point P of the key point of the target object in the target image is projected onto the reference plane for imaging to obtain the corresponding reference pixel point P'. Then, the maximum value d1 of the depth information in the target image (for example, the perpendicular distance between the most distal data object and the plane of the left and right cameras) and the depth information d2 of the key point of the target object can be obtained, and the perpendicular distance d3 between the reference pixel point and the center of the reference plane can be calculated. The field-of-view angle of the lens can be denoted θ, and the angle between the central axis of the lens and the direction toward P can be denoted β.
In the specific calculation, the field-of-view angle θ of the lens can be obtained first, and the viewing range D of the most distal data object can be calculated from the field-of-view angle of the lens and the maximum value of the depth information in the target image, for example as D = 2 × d1 × tan(θ/2); the perpendicular distance d3 between the reference pixel point and the center of the reference plane can then be calculated using D.
Further, the label size parameter matched with the target object may be calculated according to the following formula by using the preset matching relation parameter:
where DF is the label size parameter, B is the preset matching relation parameter, d2 is the depth information of the target object, d3 is the perpendicular distance between the reference pixel point and the center of the reference plane, and d1 is the maximum value of the depth information in the target image.
In some embodiments, the tag size parameter may specifically be a character size (e.g., a font size of a character, etc.) in the target tag. The specific value of the preset matching relation parameter can be determined according to a preset matching rule. Specifically, the preset matching relation parameter may be determined according to the number of target objects contained in the target image and the maximum value of depth information in the target image.
In some embodiments, the size of the character in the label of the target object is adjusted according to the label size parameter, so that the target label of the target object which is relatively closer to the lens in the target image is relatively larger, and the target label of the target object which is relatively farther from the lens is relatively smaller, thereby enabling a user to intuitively feel layering between different target objects in the target image, and effectively conveying three-dimensional depth information of the target object to the user. See fig. 9.
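The preset matching relation itself is not reproduced here; purely as an illustration of the perspective behaviour described above, the sketch below scales the character size so that nearer targets get larger labels. This inverse-style scaling rule and its constants are assumptions standing in for the specification's formula.

```python
def label_font_size(depth_of_target, max_depth_in_image,
                    base_font_size=32, min_font_size=12):
    """Illustrative perspective-style scaling: nearer targets get larger label text.

    This damped inverse rule is an assumed stand-in for the specification's preset
    matching relation, which is not reproduced in this text.
    """
    depth = max(depth_of_target, 1e-6)
    size = base_font_size * (max_depth_in_image / depth) ** 0.5  # damped inverse scaling
    return int(max(min_font_size, min(base_font_size * 2, size)))

# A target at a quarter of the maximum depth gets a noticeably larger label
# than the farthest target, conveying its smaller distance to the lens.
near_size = label_font_size(depth_of_target=50.0, max_depth_in_image=200.0)
far_size = label_font_size(depth_of_target=200.0, max_depth_in_image=200.0)
```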
In some embodiments, after obtaining the target label of the target object, the method may further include adjusting a display effect parameter of the target label of the target object according to the depth information of the target object to obtain the target label with the adjusted display effect, where the display effect parameter includes at least one of a shadow parameter, a deformation parameter, and a brightness parameter, and correspondingly, displaying the target label with the adjusted display effect in the target image.
Specifically, for example, for a target object with larger depth information, the shadow parameter and the deformation parameter of the target label of the target object can be specifically adjusted to be high, and the brightness parameter of the target label of the target object can be adjusted to be low. For the target object with smaller depth information, the shadow parameter and the deformation parameter of the target label of the target object can be reduced in a targeted manner, and meanwhile, the brightness parameter of the target label of the target object can be increased. Therefore, the user can more intuitively and comprehensively feel the depth information of the target objects through the target labels, and know the relative position relationship between the target objects.
It should be noted that the above-listed display effect parameters are only illustrative. In the implementation, other types of display effect parameters can be introduced according to specific application scenes and processing requirements. The present specification is not limited to this.
In some embodiments, the target label of the target object is displayed in the target image, and when implemented, the method may further include the following:
s1, acquiring a position parameter of a target object in a target image;
s2, determining display position coordinates according to the position parameters of the target object in the target image;
and S3, setting and displaying the target label of the target object at the corresponding position in the target image according to the display position coordinates.
In some embodiments, the position parameter of the target object in the target image can be understood as the position coordinates of the key point of the target object in the target image, which may be denoted (x1, y1). In implementation, the position parameter of the target object can be determined from the detection frame of the target object.
In some embodiments, after the second sub-image region is segmented from the first sub-image region by using the detection frame, the method may further include, when implemented, acquiring position coordinates of the detection frame, and determining a position parameter of the target object in the target image according to the position coordinates of the detection frame.
Further, the matched offset parameter (for example, may be denoted as a) may be set according to the distance between the target object and the other adjacent target objects, the size of the target object, and the position parameter of the other adjacent target objects around the target object, and the corresponding display position coordinate may be calculated according to the position parameter and the offset parameter of the target object, and may be denoted as (x1+a, y1+a).
Further, the target tag of the target object may be set and displayed at a corresponding position in the target image according to the display position coordinates.
Therefore, the target label of the target object displayed in the target image can not cause shielding to the target object and other target objects adjacent to the periphery of the target object, and the interaction experience of the user is further improved.
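As a small illustration of this placement step, the sketch below offsets the label from the key-point coordinates by an amount derived from the detection-frame size; taking the offset as a fraction of the frame size is an illustrative choice rather than the specification's rule.

```python
def label_position(key_point_xy, detection_box, offset_fraction=0.25):
    """Place the label diagonally offset from the key point so it does not cover the object.

    `detection_box` is (x1, y1, x2, y2); the offset `a` is taken as a fraction of the
    frame size, which is an assumed rule used only for illustration.
    """
    x1, y1 = key_point_xy
    box_w = detection_box[2] - detection_box[0]
    box_h = detection_box[3] - detection_box[1]
    a = offset_fraction * max(box_w, box_h)   # offset parameter a
    return (x1 + a, y1 + a)                   # display position coordinates (x1 + a, y1 + a)
```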
In some embodiments, during implementation, the position parameters of the target object in the images of two adjacent frames shot at the adjacent acquisition time points can be tracked based on the target tracking algorithm according to the position parameters of the target object, so that a user can more quickly and accurately know the change condition of the same target object between the adjacent acquisition time points.
Specifically, for example, referring to fig. 10, the same surgical instrument in the nth and n+1st frames may be tracked according to the positional parameters of the surgical instrument used in the surgical procedure based on the target tracking algorithm.
In addition, the image platform and/or the doctor console also allow the user to set tracking for one or more target objects identified in the target image of the current frame. Accordingly, the target object that the user indicates should be tracked can be received and determined according to the tracking setting operation performed on the target image of the current frame, and the marked target object can then be tracked according to the identification result when processing the target image of the next frame.
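A minimal sketch of tracking a target object between two adjacent frames is shown below; it associates detections of the same object class by nearest key-point distance, which is one simple target-tracking strategy and not necessarily the algorithm used in this specification. The detection dictionary layout and the jump limit are assumptions.

```python
import math

def track_between_frames(prev_detections, curr_detections, max_jump=80.0):
    """Associate each tracked object in frame n with the nearest same-class detection in frame n+1.

    Each detection is assumed to be a dict like {"id": ..., "class": ..., "xy": (x, y)};
    nearest-neighbour association is an illustrative stand-in for the tracking algorithm.
    """
    matches = {}
    for prev in prev_detections:
        candidates = [d for d in curr_detections if d["class"] == prev["class"]]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda d: math.dist(prev["xy"], d["xy"]))
        if math.dist(prev["xy"], nearest["xy"]) <= max_jump:
            matches[prev["id"]] = nearest   # same physical object found in the next frame
    return matches
```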
In some embodiments, in implementation, the target image may be further processed based on the augmented reality technology, so that the user can more intuitively and vividly perceive the three-dimensional depth information of the target object in the target image and the layering sense formed by the depth information difference between different target objects by displaying the target image processed based on the augmented reality technology to the user, thereby further improving the interactive experience of the user.
In some embodiments, the method can further comprise the following steps of constructing an effect image layer according to difference values among depth information of different target objects in the target image, wherein the effect image layer comprises the target objects and stereoscopic effect data of target labels of the target objects, performing superposition processing on the target image and the effect image layer to obtain a superposed target image, and displaying the superposed target image.
The effect image layer may be specifically constructed based on an augmented reality technology.
The augmented reality (Augmented Reality, AR) technology described above may specifically refer to a technology that fuses virtual information with real image information, enabling "augmentation" of the real-world user perception.
Through overlapping the target image and the effect image layer constructed based on the augmented reality technology, the overlapped target image with more three-dimensional stereoscopic impression can be obtained, and further, a user can know the target object in the image more clearly by displaying the overlapped target image, so that relatively better interaction experience is obtained. For example, by adopting the AR technology, the hierarchy information between organs and surgical instruments in the surgical environment can be better displayed, so that illusions of medical workers when using an image platform and a doctor console are avoided, and the risk of surgical misoperation is reduced.
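As an illustration of superimposing the effect image layer onto the target image, the sketch below uses a simple weighted blend; the blend weight and the way the effect layer is produced are assumptions, since the specification does not fix them here.

```python
import cv2

def superimpose_effect_layer(target_image, effect_layer, effect_weight=0.35):
    """Blend the AR effect layer onto the target image to obtain the superimposed target image.

    `effect_layer` must have the same shape and dtype-compatible content as `target_image`;
    the weights are illustrative.
    """
    effect_layer = effect_layer.astype(target_image.dtype)
    superimposed = cv2.addWeighted(target_image, 1.0 - effect_weight,
                                   effect_layer, effect_weight, 0)
    return superimposed
```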
In some embodiments, the method may further include constructing a target image with a target label based on a virtual reality technology according to the target image and the target label of the target object.
The Virtual Reality (VR) technology may specifically refer to a three-dimensional simulated Reality technology developed by relying on three-dimensional real-time graphic display, three-dimensional positioning tracking, touch and smell sensing technologies, artificial intelligence technologies, high-speed computing and parallel computing technologies, and technologies such as human behaviours, which can enable a user to be placed in a Virtual environment with three-dimensional vision, hearing, touch and smell, and support information interaction of the user in the Virtual environment.
In the implementation process, the target image with the target label displayed based on the virtual reality technology can be constructed by processing the target image and the target label of the target object according to a virtual environment modeling algorithm.
Accordingly, the medical staff can watch the displayed target image based on the virtual reality technology and displayed with the target label through wearing VR equipment configured by the doctor console, so that hierarchical information between organs and surgical instruments in a surgical environment can be known more clearly, and relatively better interaction experience can be obtained.
In some embodiments, after the target label of the target object is displayed in the target image, the method further comprises the step of displaying the target image with the target label of the target object displayed to a user, so that the user can efficiently know related information of the target object such as the object category and the like through the displayed target image, and simultaneously intuitively sense depth information of the distance between the target object and the lens, and the user can perform specific data processing better based on the information.
For example, a healthcare worker can clearly know organs and surgical instruments of a patient in the current binocular endoscope view according to a target image displayed by the image platform and/or a display device of the doctor console and displaying a target label of a target object, so that the doctor console can control the mechanical arm to accurately perform specific surgical operation on the patient.
In some embodiments, before processing the target image using the image processing model, the method may further include, when implemented, the following:
s1, collecting a sample image set, wherein the sample image set comprises a plurality of sample images which are arranged according to the collection time and contain sample objects;
s2, marking an image area of the sample object in the sample image by using a marking frame, marking the object type of the sample object, and obtaining a marked sample image set;
And S3, training an initial model by using the noted sample image set to obtain an image processing model.
In some embodiments, in practice, a sample object (e.g., an organ or a surgical instrument, etc.) in a corresponding application scene (e.g., a surgical scene) may be photographed by a binocular endoscope according to a preset time interval to obtain a plurality of sample images arranged according to the collection time to construct a sample image set. The method can also record sample videos containing sample objects in an application scene, and then intercept a plurality of screenshots from the sample videos to be used as a plurality of sample images so as to construct a sample image set.
In some embodiments, after a plurality of sample images are acquired, the plurality of images may be preprocessed to remove error data in the sample images, and then a sample image set may be constructed according to the processed sample images.
In some embodiments, the preprocessing may include screening the sample image according to the sharpness, the size of the blurred region, the stability of the picture, etc. of the sample image, so as to reject the sample image that is too bright, too dark, and blurred, and retain the sample image with smaller error and higher precision.
In some embodiments, when labeling a sample image, an image area where a sample object is located may be selected in the sample image by using a labeling frame, and an object type of the sample object is labeled, so as to obtain a labeled sample image set.
Furthermore, the state, the number and other related information of the sample object can be marked in the sample image, so that the marked sample image containing more abundant data information can be obtained. Reference may be made to fig. 11.
In some embodiments, during specific training, a YOLO-based initial model can be built, the labeled sample image set can be divided into a training set and a test set, the initial model can be trained continuously with the training set while the trained model is tested with the test set, and training stops once the test results meet the corresponding accuracy requirements; the model at that point is determined to be the image processing model.
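A schematic sketch of this train/test procedure is given below; build_yolo_model, train_one_epoch, and evaluate are hypothetical helpers standing in for whatever detection framework is used, and the 80/20 split and accuracy target are illustrative assumptions.

```python
import random

def train_image_processing_model(labeled_samples, build_yolo_model, train_one_epoch, evaluate,
                                 accuracy_target=0.95, max_epochs=100, train_ratio=0.8):
    """Split the labeled sample set, train the YOLO-based initial model, and stop once
    the test result meets the accuracy requirement. The helper callables are hypothetical."""
    samples = list(labeled_samples)
    random.shuffle(samples)
    split = int(train_ratio * len(samples))
    train_set, test_set = samples[:split], samples[split:]

    model = build_yolo_model()
    for _ in range(max_epochs):
        train_one_epoch(model, train_set)
        accuracy = evaluate(model, test_set)
        if accuracy >= accuracy_target:       # test result meets the accuracy requirement
            break
    return model                              # the model at this point is the image processing model
```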
From the above, according to the image data processing method provided by the embodiment of the present disclosure, after the target image is obtained, the target object in the target image is identified, and when the corresponding identification result is obtained, the depth information of the target object may also be obtained at the same time, and then the identification result and the depth information of the target object are used at the same time to generate and display the target tag at least including the object type information of the target object on the target image. The method and the device can automatically identify and utilize the target label to identify the related information of the target object focused by the user in the target image, and can accurately and intuitively convey the three-dimensional depth information of the target object to the user in the two-dimensional target image through the displayed target label, so that the user can obtain better interaction experience. The method comprises the steps of determining a tag size parameter matched with a target object based on a perspective principle according to depth information of the target object, generating a target tag capable of expressing the distance between the target object and a lens according to the tag size parameter, and displaying the target tag in a target image, so that a user can intuitively and efficiently determine the distance between the target object and the lens according to the tag size of the target tag. Further, according to the depth information of the target object, the three-dimensional depth information of the target object can be clearly transmitted to the user by purposefully adjusting the display effect parameters of the target label such as the shadow parameter, the deformation parameter, the brightness parameter and the like and more effectively displaying the target label in the target image. In addition, an effect image layer can be built based on an Augmented Reality (AR) technology by firstly acquiring and according to difference values among depth information of different target objects in a target image, then the target image and the effect image layer are subjected to superposition processing to obtain a superposed target image, and then the superposed target image is displayed to a user, so that the user can obtain relatively better interaction experience. The target object focused by the user can be automatically tracked, so that the interaction experience of the user is further improved.
Referring to fig. 12, the present embodiment also provides an image platform 1200. Specifically, it may include a binocular endoscope 1201 and an image data processing apparatus 1202. The binocular endoscope 1201 can be used to acquire target images; the image data processing apparatus 1202 can be used to process the target images by identifying the target object in the target image to obtain a corresponding identification result, acquiring depth information of the target object, and displaying a target label of the target object in the target image according to the identification result and the depth information of the target object, the target label including at least the object category information of the target object.
The embodiment of the specification also provides computer equipment, which comprises a processor and a memory for storing executable instructions of the processor, wherein the processor can be used for executing the following steps according to the instructions when in specific implementation, acquiring a target image, identifying a target object in the target image to obtain a corresponding identification result, acquiring depth information of the target object, and displaying a target label of the target object in the target image according to the identification result and the depth information of the target object, wherein the target label at least comprises object category information of the target object.
In order to more accurately complete the above instructions, referring to fig. 13, another specific computer device 1300 is provided in this embodiment of the present disclosure, where the computer device includes a network communication port 1301, a processor 1302, and a memory 1303, and the above structures are connected by an internal cable, so that each structure may perform specific data interaction.
The network communication port 1301 may be specifically configured to acquire a target image.
The processor 1302 may specifically be configured to identify a target object in a target image to obtain a corresponding identification result, obtain depth information of the target object, and display a target tag of the target object in the target image according to the identification result and the depth information of the target object, where the target tag at least includes object class information of the target object.
The memory 1303 may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication port 1301 may be a virtual port bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, a port responsible for mail data communication, or a port responsible for industrial Ethernet fieldbus (EtherCAT) communication. The network communication port may also be a physical communication interface or a communication chip. For example, it may be a wireless mobile network communication chip such as a GSM or CDMA chip, a Wi-Fi chip, or a Bluetooth chip.
In this embodiment, the processor 1302 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, or the form of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, embedded microcontrollers, and the like. The present specification is not limited in this respect.
In this embodiment, the memory 1303 may include multiple levels. In a digital system, any device capable of storing binary data may serve as the memory; in an integrated circuit, a circuit with a storage function but no physical form, such as a RAM or a FIFO, may serve as the memory; and in a system, a storage device with a physical form, such as a memory bank or a TF memory card, may also serve as the memory.
The embodiments of the present specification also provide a computer storage medium based on the above image data processing method. The computer storage medium stores computer program instructions which, when executed, implement the following steps: acquiring a target image; identifying a target object in the target image to obtain a corresponding identification result; acquiring depth information of the target object; and displaying a target tag of the target object in the target image according to the identification result and the depth information of the target object, wherein the target tag includes at least the object category information of the target object.
In the present embodiment, the storage medium includes, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a cache (Cache), a hard disk (Hard Disk Drive, HDD), or a memory card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects of the program instructions stored in the computer storage medium may be explained with reference to the other embodiments and are not described herein again.
Referring to fig. 14, at a software level, the embodiment of the present disclosure further provides an image data processing apparatus 1400, which may specifically include the following structural modules:
An acquisition module 1401, which may be specifically configured to acquire a target image;
The first processing module 1402 may be specifically configured to identify a target object in the target image, obtain a corresponding identification result, and obtain depth information of the target object;
The second processing module 1403 may be specifically configured to display, in the target image, a target tag of the target object according to the recognition result and the depth information of the target object, where the target tag includes at least object class information of the target object.
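To make the cooperation of these modules concrete, the following Python sketch wires an acquisition step, a recognition step, a depth-estimation step, and a tag-rendering step together in the same order. All names here (ImageDataProcessor, Recognition, and the injected callables) are hypothetical illustrations rather than an interface defined by this disclosure.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    object_class: str   # e.g. "tissue organ" or "surgical instrument"
    bbox: tuple         # (x, y, w, h) in image coordinates
    depth_mm: float     # vertical distance to the camera plane

class ImageDataProcessor:
    """Hypothetical composition of the three modules described above."""

    def __init__(self, recognizer, depth_estimator, renderer):
        self.recognizer = recognizer            # role of the first processing module
        self.depth_estimator = depth_estimator  # also part of the first processing module
        self.renderer = renderer                # role of the second processing module

    def process(self, left_image, right_image):
        # Acquisition module: the target image is the left and/or right frame.
        detections = self.recognizer(left_image)   # identification result
        results = []
        for det in detections:
            depth = self.depth_estimator(left_image, right_image, det)
            results.append(Recognition(det["class"], det["bbox"], depth))
        # Second processing module: draw tags whose size and effects reflect depth.
        return self.renderer(left_image, results)
```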
In some embodiments, the target image may specifically include a first image and/or a second image, where the first image is an image acquired by a left camera, the second image is an image acquired by a right camera, and the relative positions of the left camera and the right camera are fixed.
In some embodiments, the target image may include a first image and/or a second image acquired through a binocular endoscope; correspondingly, the target object may include a tissue organ and/or a surgical instrument.
In some embodiments, when the first processing module 1402 is specifically implemented, the depth information of the target object may be obtained in the following manner: determining, according to the first image and the second image, a first distance between a first key pixel point and the optical center of the left camera and a second distance between a second key pixel point and the optical center of the right camera, wherein the first key pixel point is the pixel point corresponding to a key point of the target object in the first image, and the second key pixel point is the pixel point corresponding to the key point of the target object in the second image; and calculating, according to the first distance and the second distance, the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located, as the depth information of the target object.
In some embodiments, when the first processing module 1402 is specifically implemented, the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located may be calculated according to the first distance and the second distance in the following manner: acquiring the baseline distance between the left camera and the right camera and the camera focal length; and calculating the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located by using the first distance, the second distance, the baseline distance, and the camera focal length.
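The quantities named above (the key-point positions in the two images, the baseline distance between the cameras, and the camera focal length) are exactly those of standard pinhole stereo triangulation, although the disclosure does not spell out a closed-form expression. A minimal Python sketch under that assumption:

```python
def depth_from_disparity(x_left, x_right, baseline_mm, focal_px):
    """Classic stereo triangulation: Z = f * B / disparity.

    x_left and x_right are the horizontal pixel coordinates of the same key
    point in the first (left) and second (right) images, baseline_mm is the
    distance between the two optical centres, and focal_px is the focal
    length in pixels.  Returns the perpendicular distance from the key point
    to the plane where the left camera and the right camera are located.
    """
    disparity = float(x_left - x_right)
    if disparity <= 0:
        raise ValueError("key point must have positive disparity")
    return focal_px * baseline_mm / disparity

# Example: a 4 mm baseline with f = 800 px and a 16 px disparity puts the
# key point roughly 200 mm from the camera plane.
z = depth_from_disparity(x_left=512, x_right=496, baseline_mm=4.0, focal_px=800.0)
```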
In some embodiments, when the second processing module 1403 is specifically implemented, the target tag of the target object may be displayed in the target image according to the identification result and the depth information of the target object in the following manner: generating a tag of the target object according to the identification result of the target object; determining, by using the depth information of the target object and according to a preset matching rule, a tag size parameter matched with the target object; adjusting the character size in the tag of the target object according to the tag size parameter to obtain the target tag of the target object; and displaying the target tag of the target object in the target image.
In some embodiments, when the second processing module 1403 is specifically implemented, the tag size parameter matched with the target object may be determined according to the preset matching rule by using the depth information of the target object in the following manner: acquiring a reference pixel point of the target object on a reference plane, wherein the reference plane is the plane where the pixel point with the maximum value of the depth information in the target image is located; acquiring the distance between the reference pixel point and the center of the reference plane, and the maximum value of the depth information in the target image; and calculating, according to a preset matching relation parameter, the tag size parameter matched with the target object by using the depth information of the target object, the distance between the reference pixel point and the center of the reference plane, and the maximum value of the depth information in the target image.
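The preset matching relation parameter itself is not given here, so the sketch below uses an assumed perspective-style ratio purely for illustration; only its inputs (the object depth, the distance from the reference pixel point to the centre of the reference plane, and the maximum depth in the image) come from the description above.

```python
def tag_size_parameter(depth_mm, ref_centre_dist_px, max_depth_mm,
                       base_font_px=24.0, k=1.0):
    """Hypothetical matching rule: nearer objects receive larger tag text.

    The ratio below is an assumption made for the example; the disclosure
    only fixes which quantities enter the calculation, not the formula.
    """
    perspective_ratio = max_depth_mm / max(depth_mm, 1e-6)   # nearer -> larger
    centre_falloff = 1.0 / (1.0 + k * ref_centre_dist_px / 1000.0)
    return base_font_px * perspective_ratio * centre_falloff
```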
In some embodiments, after obtaining the target tag of the target object, the second processing module 1403 may, in a specific implementation, adjust the display effect parameters of the target tag of the target object according to the depth information of the target object to obtain a target tag with an adjusted display effect, wherein the display effect parameters include at least one of a shadow parameter, a deformation parameter, and a brightness parameter; correspondingly, the target tag with the adjusted display effect is displayed in the target image.
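As one possible illustration of adjusting such display effect parameters, the sketch below maps relative depth to the brightness of the tag text and adds a simple drop shadow with OpenCV; the linear mapping and the pixel offsets are assumptions, not the claimed rule.

```python
import cv2
import numpy as np

def draw_depth_aware_tag(image, text, org, depth_mm, max_depth_mm, font_scale):
    """Draw a tag whose brightness falls off with depth (illustrative only)."""
    # Nearer objects (small depth) are drawn brighter; clamp to stay legible.
    brightness = int(np.clip(255 * (1.0 - depth_mm / max_depth_mm), 80, 255))
    colour = (brightness, brightness, brightness)
    # Simple shadow parameter: a dark copy offset by two pixels.
    cv2.putText(image, text, (org[0] + 2, org[1] + 2),
                cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 0, 0), 2, cv2.LINE_AA)
    cv2.putText(image, text, org,
                cv2.FONT_HERSHEY_SIMPLEX, font_scale, colour, 2, cv2.LINE_AA)
    return image
```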
In some embodiments, the second processing module 1403 may display the target tag of the target object in the target image in the following manner: acquiring the position parameter of the target object in the target image; determining display position coordinates according to the position parameter of the target object in the target image; and setting and displaying the target tag of the target object at the corresponding position in the target image according to the display position coordinates.
In some embodiments, when the second processing module 1403 is specifically implemented, it may further construct an effect image layer according to the difference values between the depth information of different target objects in the target image, wherein the effect image layer contains stereoscopic effect data of the target objects and of the target tags of the target objects; superimpose the target image and the effect image layer to obtain a superimposed target image; and display the superimposed target image.
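A minimal sketch of the superposition step, assuming the effect image layer has already been rendered as an RGB image of the same size together with a 0..1 alpha mask marking where it applies; how the stereoscopic effect data are derived from the depth differences is not shown.

```python
import numpy as np

def overlay_effect_layer(target_image, effect_layer, alpha_mask):
    """Blend the effect image layer over the target image (illustrative)."""
    alpha = alpha_mask.astype(np.float32)[..., None]   # (H, W, 1) for broadcasting
    blended = (target_image.astype(np.float32) * (1.0 - alpha)
               + effect_layer.astype(np.float32) * alpha)
    return np.clip(blended, 0, 255).astype(np.uint8)
```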
In some embodiments, when the second processing module 1403 is specifically implemented, it may further construct, based on virtual reality technology, a target image on which the target tag is displayed, according to the target image and the target tag of the target object.
In some embodiments, when the first processing module 1402 is specifically implemented, the target object in the target image may be identified to obtain the corresponding identification result in the following manner: processing the target image by using an image processing model, and determining the object category of the target object in the target image as the identification result.
In some embodiments, the image processing model includes at least one of an image processing model trained based on a YOLO model, an image processing model trained based on a Fast YOLO model, and an image processing model trained based on an R-CNN model.
In some embodiments, when the first processing module 1402 is specifically implemented, the target image may be processed by using the image processing model to determine the object category of the target object in the target image in the following manner: rasterizing the target image by using the image processing model to obtain a rasterized target image; and detecting and identifying the object category of the target object in the target image by processing the rasterized target image with a convolutional neural network, wherein the convolutional neural network is a model structure within the image processing model.
In some embodiments, when the first processing module 1402 is specifically implemented, the object category of the target object in the target image may be detected and identified by processing the rasterized target image with the convolutional neural network in the following manner: dividing a first sub-image area from the rasterized target image with a dividing frame by using the convolutional neural network, wherein the first sub-image area is a non-background image area; detecting whether the target object exists in the first sub-image area by using the convolutional neural network, and dividing a second sub-image area from the first sub-image area with a detection frame according to the detection result, wherein the second sub-image area is an image area containing the target object to be identified; and determining the object category of the target object by performing image recognition on the second sub-image area with the convolutional neural network.
In some embodiments, after the second sub-image area is divided from the first sub-image area with the detection frame, the first processing module 1402 may further, in a specific implementation, filter out the second sub-image areas divided based on redundant detection frames by using a non-maximum suppression algorithm to obtain a corrected second sub-image area; correspondingly, the first processing module 1402 may determine the object category of the target object by performing image recognition on the corrected second sub-image area with the convolutional neural network.
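For reference, a plain non-maximum suppression routine of the kind referred to above; the IoU threshold and the (x1, y1, x2, y2) box convention are choices made for the example, not values taken from the disclosure.

```python
def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring detection frames and drop redundant overlaps."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter == 0:
            return 0.0
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```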
In some embodiments, while processing the target image with the image processing model to determine the object category of the target object in the target image, the first processing module 1402 may further extract a target state feature from the second sub-image area and determine the state of the target object according to the target state feature and a preset state database; correspondingly, the first processing module 1402 may combine the object category of the target object and the state of the target object as the identification result.
In some embodiments, before the target image is processed by using the image processing model, the image data processing apparatus 1400 may further be configured to: collect a sample image set, wherein the sample image set includes a plurality of sample images containing sample objects and arranged according to acquisition time; annotate the image area of the sample object in each sample image with an annotation frame and annotate the object category of the sample object to obtain an annotated sample image set; and train an initial model with the annotated sample image set to obtain the image processing model.
It should be noted that the units, devices, or modules described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when the present specification is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing a given function may also be implemented by a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
From the above, it can be seen that the image data processing apparatus provided by the embodiments of the present disclosure identifies the target object in the target image after the target image is acquired, and can simultaneously obtain the depth information of the target object when the corresponding identification result is obtained; it then uses the identification result and the depth information together to generate, and display on the target image, a target tag that includes at least the object category information of the target object. In this way, the relevant information of the target object that the user is concerned with can be identified automatically and marked with the target tag, and the displayed target tag can convey the three-dimensional depth information of the target object to the user accurately and intuitively within the two-dimensional target image, so that the user obtains a better interaction experience.
Although the present specification provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only execution order. When an actual apparatus or client product executes the method, it may execute the steps in the order shown in the embodiments or figures, or execute them in parallel (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the described element is not excluded. Terms such as first and second are used to denote names and do not denote any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller in purely computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. The means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of embodiments, it will be apparent to those skilled in the art that the present description may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be embodied essentially in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present specification.
The various embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. The specification is operational with numerous general-purpose or special-purpose computer system environments or configurations, such as personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
Although the present specification has been described by way of example, it will be appreciated by those skilled in the art that there are many variations and modifications to the specification without departing from the spirit of the specification, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the specification.

Claims (19)

1. An image data processing method, comprising:
acquiring a target image, wherein the target image comprises a first image and/or a second image, the first image and the second image are images acquired through a camera, and the target image is an image of a surgical scene;
Identifying a target object in the target image according to the first image or the second image to obtain a corresponding identification result;
acquiring depth information of the target object; determining a tag size parameter matched with the target object according to a preset matching rule by using the depth information of the target object; adjusting the character size in the tag of the target object according to the tag size parameter to obtain a target tag of the target object for conveying the distance between the target object and a lens; and displaying the target tag of the target object in the target image, wherein the target tag at least comprises object category information of the target object;
wherein determining the tag size parameter matched with the target object according to the preset matching rule by using the depth information of the target object comprises: acquiring a reference pixel point of the target object on a reference plane, wherein the reference plane is the plane where the pixel point with the maximum value of the depth information in the target image is located; acquiring the distance between the reference pixel point and the center of the reference plane, and the maximum value of the depth information in the target image; and calculating, according to a preset matching relation parameter, the tag size parameter matched with the target object by using the depth information of the target object, the distance between the reference pixel point and the center of the reference plane, and the maximum value of the depth information in the target image.
2. The image data processing method according to claim 1, wherein the first image is an image acquired by a left camera, the second image is an image acquired by a right camera, and the left camera and the right camera are fixed in relative positions.
3. The image data processing method according to claim 2, wherein the target image includes a first image acquired by a binocular endoscope, and/or a second image;
Accordingly, the target object comprises a tissue organ, and/or a surgical instrument.
4. The image data processing method according to claim 2, wherein acquiring depth information of the target object includes:
Determining a first distance between a first key pixel point and the optical center of the left camera and a second distance between a second key pixel point and the optical center of the right camera according to the first image and the second image, wherein the first key pixel point is a pixel point corresponding to a key point of a target object in the first image, and the second key pixel point is a pixel point corresponding to a key point of the target object in the second image;
and calculating, according to the first distance and the second distance, the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located, as the depth information of the target object.
5. The image data processing method according to claim 4, wherein calculating, according to the first distance and the second distance, the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located comprises:
acquiring a baseline distance between the left camera and the right camera and a camera focal length;
and calculating the vertical distance between the key point of the target object and the plane where the left camera and the right camera are located by using the first distance, the second distance, the baseline distance, and the camera focal length.
6. The image data processing method according to claim 1, wherein after obtaining the target tag of the target object, the method further comprises:
According to the depth information of the target object, adjusting display effect parameters of a target label of the target object to obtain the target label with the adjusted display effect, wherein the display effect parameters comprise at least one of shadow parameters, deformation parameters and brightness parameters;
correspondingly, displaying the target label with the adjusted display effect in the target image.
7. The image data processing method according to claim 1, wherein displaying the target tag of the target object in the target image includes:
acquiring a position parameter of a target object in a target image;
Determining display position coordinates according to the position parameters of the target object in the target image;
And setting and displaying the target label of the target object at the corresponding position in the target image according to the display position coordinates.
8. The image data processing method according to claim 7, characterized in that the method further comprises:
constructing an effect image layer according to difference values among depth information of different target objects in a target image, wherein the effect image layer comprises three-dimensional effect data of the target objects and target labels of the target objects;
performing superposition processing on the target image and the effect image layer to obtain a superposed target image;
And displaying the superimposed target image.
9. The image data processing method according to claim 2, wherein identifying the target object in the target image to obtain the corresponding identification result includes:
and processing the target image by using an image processing model, and determining the object type of the target object in the target image as the identification result.
10. The image data processing method according to claim 9, wherein the image processing model comprises at least one of an image processing model trained based on a YOLO model, an image processing model trained based on a Fast YOLO model, and an image processing model trained based on an R-CNN model.
11. The image data processing method according to claim 10, wherein processing the target image using an image processing model, determining an object class of a target object in the target image, comprises:
performing rasterization on the target image by using an image processing model to obtain a rasterized target image;
and detecting and identifying the object type of the target object in the target image by processing the rasterized target image by using a convolutional neural network, wherein the convolutional neural network is a model structure in an image processing model.
12. The image data processing method according to claim 11, wherein detecting and identifying the object class of the target object in the target image by processing the rasterized target image using a convolutional neural network, comprises:
dividing a first sub-image area from the rasterized target image with a dividing frame by using a convolutional neural network, wherein the first sub-image area is a non-background image area;
detecting whether a target object exists in the first sub-image area by utilizing a convolutional neural network, and dividing a second sub-image area from the first sub-image area by utilizing a detection frame according to a detection result, wherein the second sub-image area is an image area containing the target object to be identified;
and determining the object category of the target object by carrying out image recognition on the second sub-image area by using a convolutional neural network.
13. The image data processing method according to claim 12, wherein after the second sub-image area is divided from the first sub-image area using the detection frame, the method further comprises:
filtering the second sub-image region divided based on the redundancy detection frame from the second sub-image region by using a non-maximum suppression algorithm to obtain a corrected second sub-image region;
Correspondingly,
And determining the object type of the target object by carrying out image recognition on the corrected second sub-image area by using a convolutional neural network.
14. The image data processing method according to claim 12, wherein while processing the target image with the image processing model, determining an object class of a target object in the target image, the method further comprises:
Extracting a target state feature from the second sub-image region;
determining the state of the target object according to the target state characteristics and a preset state database;
Correspondingly,
And combining the object category of the target object and the state of the target object as a recognition result.
15. The image data processing method according to claim 9, wherein before processing the target image using an image processing model, the method further comprises:
collecting a sample image set, wherein the sample image set comprises a plurality of sample images which are arranged according to the collection time and contain sample objects;
marking an image area of the sample object in the sample image by using a marking frame, marking the object category of the sample object, and obtaining a marked sample image set;
and training an initial model by using the noted sample image set to obtain an image processing model.
16. The image data processing method according to claim 7, characterized in that the method further comprises:
And constructing a target image with the target label based on the virtual reality technology according to the target image and the target label of the target object.
17. An image platform comprising a binocular endoscope for acquiring a target image and an image data processing apparatus for processing the target image using the image data processing method of any one of claims 1 to 16.
18. A computer device comprising a processor and a memory for storing processor-executable instructions which when executed by the processor implement the associated steps of the image data processing method of any one of claims 1 to 16.
19. A computer readable storage medium having stored thereon computer instructions which when executed perform the relevant steps of the image data processing method of any of claims 1 to 16.
CN202210182065.7A 2022-02-25 2022-02-25 Image data processing method, image platform, computer device, and storage medium Active CN114565719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210182065.7A CN114565719B (en) 2022-02-25 2022-02-25 Image data processing method, image platform, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210182065.7A CN114565719B (en) 2022-02-25 2022-02-25 Image data processing method, image platform, computer device, and storage medium

Publications (2)

Publication Number Publication Date
CN114565719A CN114565719A (en) 2022-05-31
CN114565719B true CN114565719B (en) 2025-07-08

Family

ID=81715864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210182065.7A Active CN114565719B (en) 2022-02-25 2022-02-25 Image data processing method, image platform, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN114565719B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115913A (en) * 2020-09-28 2020-12-22 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment, storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0674729A (en) * 1992-08-25 1994-03-18 Nippon Steel Corp Character image inputting method
KR101227255B1 (en) * 2010-03-17 2013-01-28 에스케이플래닛 주식회사 Marker size based interaction method and augmented reality system for realizing the same
KR101750339B1 (en) * 2010-08-16 2017-06-23 엘지전자 주식회사 Method for displaying augmented reality information and mobile terminal using this method
JP5877725B2 (en) * 2012-01-26 2016-03-08 セコム株式会社 Image monitoring device
CN103901235B (en) * 2012-12-27 2016-08-03 鸿富锦精密工业(深圳)有限公司 Positioner and localization method
JP2017129904A (en) * 2016-01-18 2017-07-27 ソニー株式会社 Information processing apparatus, information processing method, and recording medium
US9922254B1 (en) * 2016-09-13 2018-03-20 Chi Fai Ho Apparatus and method to determine a distance of a visual object using a label
CN109752951B (en) * 2017-11-03 2022-02-08 腾讯科技(深圳)有限公司 Control system processing method and device, storage medium and electronic device
CN109242903B (en) * 2018-09-07 2020-08-07 百度在线网络技术(北京)有限公司 Three-dimensional data generation method, device, equipment and storage medium
EP3706076B1 (en) * 2019-03-07 2021-02-24 Siemens Healthcare GmbH Method and device to determine the dimensions and distance of a number of objects in an environment
CN111837158B (en) * 2019-06-28 2025-02-07 深圳市卓驭科技有限公司 Image processing method, device, shooting device and movable platform
CN111680574B (en) * 2020-05-18 2023-08-04 合肥的卢深视科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111932605B (en) * 2020-09-11 2023-12-01 广东韶钢松山股份有限公司 Dimensional detection methods, devices, electronic equipment and readable storage media
CN112652012A (en) * 2020-12-31 2021-04-13 北京百度网讯科技有限公司 Intelligent control method, device and equipment for excavator, storage medium and excavator
CN112967283B (en) * 2021-04-22 2023-08-18 上海西井科技股份有限公司 Target identification method, system, equipment and storage medium based on binocular camera
CN113344055B (en) * 2021-05-28 2023-08-22 北京百度网讯科技有限公司 Image recognition method, device, electronic device and medium
CN113435407B (en) * 2021-07-20 2023-01-24 广东电网有限责任公司 A small target recognition method and device for a power transmission system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115913A (en) * 2020-09-28 2020-12-22 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment, storage medium

Also Published As

Publication number Publication date
CN114565719A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
EP4383193A1 (en) Line-of-sight direction tracking method and apparatus
CN112618026B (en) Remote operation data fusion interactive display system and method
US20150297313A1 (en) Markerless tracking of robotic surgical tools
WO2022195303A1 (en) Prediction of structures in surgical data using machine learning
JP4692526B2 (en) Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method
CN113435236A (en) Home old man posture detection method, system, storage medium, equipment and application
US10078906B2 (en) Device and method for image registration, and non-transitory recording medium
CN114093024A (en) Human body action recognition method, device, equipment and storage medium
CN114170248B (en) Image processing method, data processing method, medical system, equipment and medium
Mehrubeoglu et al. Real-time eye tracking using a smart camera
CN118552701B (en) AR medical navigation-based dynamic object high-precision tracking and positioning system
CN110598652A (en) Fundus data prediction method and device
JP2015219892A (en) Visual line analysis system and visual line analysis device
CN113421231A (en) Bleeding point detection method, device and system
CN109841272A (en) Realtime graphic identification display equipment
CN117437392B (en) Cruciate ligament dead center marker and model training method and arthroscope system thereof
CN118092661A (en) Gaze estimation device and method for fusing eye image features with multiple observation angles
CN115700759A (en) Medical image display method, medical image processing method, and image display system
CN114565719B (en) Image data processing method, image platform, computer device, and storage medium
CN111462067B (en) Image segmentation method and device
CN118710712A (en) Surgical instrument tracking methods, systems, devices, media and products
CN115954096A (en) Cavity mirror VR imaging system based on image data processing
Wu et al. Hybrid Swin transformer-based classification of gaze target regions
CN115376676A (en) Surgical instrument adjustment method, surgical system, and computer-readable storage medium
CN113805704A (en) A vision therapy method and system based on VR technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant