
CN113763569B - Image labeling method and device used in three-dimensional simulation and electronic equipment - Google Patents


Info

Publication number
CN113763569B
Authority
CN
China
Prior art keywords
target object
image
virtual camera
dimensional
boundary frame
Prior art date
Legal status
Active
Application number
CN202111003690.2A
Other languages
Chinese (zh)
Other versions
CN113763569A (en)
Inventor
陈培俊
朱永东
赵志峰
赵旋
时强
刘云涛
朱凯男
杨斌
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202111003690.2A
Publication of CN113763569A
Application granted
Publication of CN113763569B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T2219/00: Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/012: Dimensioning, tolerancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an image labeling method and device used in three-dimensional simulation and electronic equipment, wherein the method comprises the following steps: constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object; obtaining an image in a visual field range through a virtual camera; judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object; and if any detection point is not blocked by the blocking object, marking the target object in the image.

Description

Image labeling method and device used in three-dimensional simulation and electronic equipment
Technical Field
The present application relates to the field of data labeling, and in particular, to an image labeling method and apparatus used in three-dimensional simulation, and an electronic device.
Background
Image annotation, in short, marks the type, position, speed and other attributes of target objects on an image so that the image can be used as training data for a deep model or other classifier. To train an excellent algorithm, it is important to have more data and to obtain more accurate annotation information.
Existing training data mainly come from field collection by physical sensors; data collection is costly and difficult, and later labeling is mainly performed manually. The advantage of this approach is that the environment in which the training data are collected is highly consistent with the working environment of the final model, which greatly helps the recognition accuracy of the model. Recently, three-dimensional simulation technology has also been used to generate training data. Three-dimensional simulation has the advantages of low data-generation cost and high speed; its drawback is that the simulation data differ from the real environment to some extent, which can affect the recognition accuracy of the model to a certain degree. Using simulation data as a supplement is therefore a good choice.
In the process of implementing the present invention, the inventors found at least the following problems in the prior art: the existing labeling methods mainly rely on manual labeling, assisted by semi-automatic labeling. In semi-automatic labeling, a less accurate preliminary model is trained with a small amount of manually labeled data; the preliminary model then automatically identifies the targets in the images and outputs the type, position and other attribute data of the target objects, and the inaccurate output labels are corrected manually. With more data a more accurate model can be trained, and this can be repeated multiple times, spiraling upward.
Disclosure of Invention
The embodiments of the application aim to provide a method, an apparatus and an electronic device for solving the technical problem in the related art that images cannot be labeled fully automatically.
According to a first aspect of an embodiment of the present application, there is provided an image labeling method used in three-dimensional simulation, including:
constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
obtaining an image in a visual field range through a virtual camera;
judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object;
and if any detection point is not blocked by the blocking object, marking the target object in the image.
Further, determining whether the target object is in the field of view of the virtual camera, and if the target object is in the field of view of the virtual camera, generating a detection point of the target object includes:
acquiring a three-dimensional boundary frame of a target object, and calculating physical coordinates of each vertex on the three-dimensional boundary frame in an image according to the three-dimensional boundary frame;
calculating to obtain a physical coordinate boundary of the virtual camera according to the image and the view angle of the virtual camera;
if the physical coordinates of the image of any vertex are within the physical coordinate boundary of the virtual camera, the target object is within the visual field of the virtual camera;
and generating a detection point of the target object according to the physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the image.
Specifically, a three-dimensional boundary box of a target object is obtained, and according to the three-dimensional boundary box, the physical coordinates of each vertex on the three-dimensional boundary box in an image are calculated, wherein the method comprises the following steps:
acquiring a first world coordinate, a first direction and a length, width and height of a three-dimensional boundary frame of a target object;
Calculating a second world coordinate of the vertex on the three-dimensional boundary frame of the target object according to the first world coordinate, the first direction and the length, width and height of the three-dimensional boundary frame;
Converting the second world coordinates into camera coordinates;
And calculating the physical coordinates of the image of the vertex according to the coordinates of the camera.
Specifically, generating a detection point of the target object according to the physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the image, including:
calculating the image pixel coordinates of the vertex according to the image physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the width and height of the output image pixel;
Calculating the length and width of a two-dimensional boundary frame of the target object according to the image pixel coordinates;
according to the length, width and height of the three-dimensional boundary frame, calculating and obtaining the body diagonal length of the three-dimensional boundary frame;
Calculating to obtain a contraction step length according to the length and width of the two-dimensional boundary frame and the body diagonal length of the three-dimensional boundary frame;
Shrinking the three-dimensional boundary frame according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
and generating a detection point of the target object according to the contracted three-dimensional boundary frame and the center point of the target object.
Further, shrinking the three-dimensional bounding box according to the shrinkage step size and the body diagonal length of the three-dimensional bounding box, including:
calculating to obtain shrinkage parameters according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
setting a shrinkage ratio;
According to the shrinkage proportion, shrinking the three-dimensional boundary frame along the body diagonal direction of the target object; updating the shrinkage ratio, and taking the difference of the shrinkage ratio minus the shrinkage parameter as a new shrinkage ratio; this step is repeated until the shrinkage ratio is less than or equal to zero.
Specifically, if any one of the detection points is not blocked by the blocking object, labeling the target object in the image includes:
acquiring a third world coordinate of the virtual camera;
judging whether a shielding object exists between the virtual camera and the detection point according to the third world coordinate and the world coordinate of the detection point, and if any detection point is not shielded, marking a two-dimensional boundary box of the target object in the image.
According to a second aspect of an embodiment of the present application, there is provided an image labeling apparatus for use in three-dimensional simulation, including:
the construction module is used for constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
The acquisition module acquires an image in a visual field range through the virtual camera;
The generation module is used for judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object;
and the labeling module is used for labeling the target object in the image if any detection point is not blocked by the blocking object.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including:
One or more processors;
A memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fourth aspect of embodiments of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
As can be seen from the above embodiments, the present invention is an automatic image labeling method used in three-dimensional simulation applications: a target object, a virtual camera and a shielding object are added to a three-dimensional scene; an image within the field of view is obtained through the virtual camera; whether the target object is within the field of view of the virtual camera is calculated, and if so, detection points of the target object are generated; whether shielding exists between the virtual camera and each detection point of the target object is calculated, and if any detection point of the target object is not shielded by the shielding object, the target object appearing in the image is labeled. The invention realizes continuous output of labeling information and image data through low-cost three-dimensional simulation, which is an important source of training data for intelligent models. Compared with the traditional method, this labeling method has the advantages of low cost, high precision and high speed, and can be applied to fields such as autonomous driving, vehicle-road cooperation, digital twins and intelligent traffic intersections.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of image annotation for use in three-dimensional simulation, according to an exemplary embodiment.
FIG. 2 is a flowchart illustrating step S103, according to an exemplary embodiment.
FIG. 3 is a flowchart illustrating step S201, according to an exemplary embodiment.
FIG. 4 is a schematic diagram illustrating the perspective of a virtual camera according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating step S204, according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating step S405, according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating step S104, according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating an image annotation device for use in three-dimensional simulation, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The word "if" as used herein may be interpreted as "when," "upon," or "in response to determining," depending on the context.
FIG. 1 is a flow chart illustrating a method of image annotation used in three-dimensional simulation, according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps:
Step S101: constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
Step S102: obtaining an image in a visual field range through a virtual camera;
step S103: judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object;
step S104: and if any detection point is not blocked by the blocking object, marking the target object in the image.
As can be seen from the above embodiments, the present application is an automatic image labeling method used in three-dimensional simulation applications: a target object, a virtual camera and a shielding object are added to a three-dimensional scene; an image within the field of view is obtained through the virtual camera; whether the target object is within the field of view of the virtual camera is calculated, and if so, detection points of the target object are generated; whether shielding exists between the virtual camera and each detection point of the target object is calculated, and if any detection point of the target object is not shielded by the shielding object, the target object appearing in the image is labeled. The application realizes continuous output of labeling information and image data through low-cost three-dimensional simulation. Compared with the traditional method, this labeling method has the advantages of low cost, high precision and high speed, and can be applied to fields such as autonomous driving, vehicle-road cooperation, digital twins and intelligent traffic intersections.
In the implementation of step S101, a three-dimensional scene is constructed, where the three-dimensional scene includes a target object, a virtual camera, and a shielding object;
Specifically, a three-dimensional scene, either purely virtual or reconstructed from a real environment, is created and imported into a game engine; this embodiment uses Unreal Engine 4 (UE4). A plurality of movable target objects (mainly people, non-motor vehicles and motor vehicles) are added to the scene; the target objects have autonomous movement capability, for example moving along a set route, to ensure the diversity of the output images. A virtual camera is installed, its camera parameters (including image resolution, field angle, etc.) are set, and its installation position (including coordinates and orientation) is set.
The purpose of this step is to provide rich scenes and accurately annotated image data. Creating virtual scenes enriches the imagery in the dataset, while modeling a specific real environment can improve the algorithm's performance in that scenario. The autonomous movement of the target objects ensures the diversity of the images, so that the training data are rich enough.
In the implementation of step S102, an image in a field of view is acquired by a virtual camera;
Specifically, a virtual camera is placed at a specific position in the three-dimensional scene through UE4, and the corresponding image is output after rendering on the graphics card.
In the implementation of step S103, determining whether the target object is within the field of view of the virtual camera, and if the target object is within the field of view of the virtual camera, generating a detection point of the target object; specifically, as shown in fig. 2, this step includes the sub-steps of:
step S201: acquiring a three-dimensional boundary frame of a target object, and calculating physical coordinates of each vertex on the three-dimensional boundary frame in an image according to the three-dimensional boundary frame; specifically, as shown in fig. 3, this step may include the following process:
Step S301: acquiring a first world coordinate, a first direction and a length, width and height of a three-dimensional boundary frame of a target object;
Specifically, the first world coordinate (X_i, Y_i, Z_i), the orientation (Pitch_i, Yaw_i, Roll_i) and the length, width and height (W_i, H_i, L_i) of the three-dimensional bounding box of the target object are acquired. The three-dimensional bounding box is specified when the target object is added in step S101 (for example, the length, width and height of a vehicle), and the coordinates and orientation of the target object are updated continuously during the simulation.
Step S302: calculating a second world coordinate of the vertex on the three-dimensional boundary frame of the target object according to the first world coordinate, the first direction and the length, width and height of the three-dimensional boundary frame;
Specifically, the second world coordinates are calculated from the first world coordinate (X_i, Y_i, Z_i), the orientation (Pitch_i, Yaw_i, Roll_i) and the length, width and height (W_i, H_i, L_i) of the three-dimensional bounding box of the target object. The target object is first converted to its local coordinate system:
in the local coordinate system, the center point (X_i, Y_i, Z_i) of the target object is translated by half the length, width and height along the positive and negative directions of the X, Y and Z axes respectively, and the resulting points are transformed back to the world coordinate system.
This yields the world coordinates of the 8 vertices of the three-dimensional bounding box. The bounding box is the smallest cuboid surrounding the target, and using the three-dimensional bounding box of the target object makes the subsequent calculation of whether the target object is within the camera's field of view more accurate.
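As an illustration only (the patent's own equations are not reproduced above), the following Python sketch shows one common way to obtain the 8 vertex world coordinates from the centre (X_i, Y_i, Z_i), the orientation (Pitch_i, Yaw_i, Roll_i) and the dimensions (W_i, H_i, L_i); the Euler-angle convention and the helper names are assumptions, not definitions taken from the patent.

import numpy as np

def rotation_matrix(pitch, yaw, roll):
    # Rotation matrix R from Euler angles in degrees.
    # The Z(yaw) * Y(pitch) * X(roll) composition is an assumption; the
    # engine's exact convention may differ.
    p, y, r = np.radians([pitch, yaw, roll])
    rz = np.array([[np.cos(y), -np.sin(y), 0.0],
                   [np.sin(y),  np.cos(y), 0.0],
                   [0.0, 0.0, 1.0]])
    ry = np.array([[ np.cos(p), 0.0, np.sin(p)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(p), 0.0, np.cos(p)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(r), -np.sin(r)],
                   [0.0, np.sin(r),  np.cos(r)]])
    return rz @ ry @ rx

def bbox_vertices_world(center, orientation, size):
    # World coordinates of the 8 vertices of an oriented 3D bounding box:
    # offsets of +/- half the size in the local frame, rotated by R and
    # translated by the box centre.
    cx, cy, cz = center              # (X_i, Y_i, Z_i)
    w, h, l = size                   # (W_i, H_i, L_i)
    rot = rotation_matrix(*orientation)
    verts = []
    for sx in (0.5, -0.5):
        for sy in (0.5, -0.5):
            for sz in (0.5, -0.5):
                local = np.array([sx * w, sy * h, sz * l])
                verts.append(np.array([cx, cy, cz]) + rot @ local)
    return np.stack(verts)           # shape (8, 3)

In the simulation this computation is repeated every frame, because the target object's coordinates and orientation are updated continuously.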
Step S303: converting the second world coordinates into camera coordinates;
Specifically, the position of the camera is the third world coordinate (X_s, Y_s, Z_s); taking vertex (X_0, Y_0, Z_0) as an example, it is converted to camera coordinates,
where R is the same rotation matrix as in step S302.
The other vertices are transformed in the same way, so that all vertices are expressed in camera coordinates.
Step S304: and calculating the physical coordinates of the image of the vertex according to the coordinates of the camera.
Specifically, the virtual camera by default orients its viewing direction along the x-axis; as shown in fig. 4, the imaging plane of the camera corresponds to the yz plane, and according to the camera imaging principle the image physical coordinates (x_j, y_j) of the vertex are calculated.
The image physical coordinates are obtained so that the image pixel coordinates can be calculated subsequently.
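A minimal sketch of steps S303 and S304 under assumed conventions: the camera looks along its local x-axis, the image physical coordinates lie on the normalized plane at unit depth, and the camera rotation matrix is built from the camera's own orientation in the same way as R in step S302 (one reading of "R is the same as S302"); the function names are illustrative only.

import numpy as np

def world_to_camera(p_world, cam_pos, r_cam):
    # Express a world-space point in the camera frame. r_cam is assumed to map
    # camera-frame axes to world-frame axes, so its transpose brings world
    # vectors into the camera frame.
    return r_cam.T @ (np.asarray(p_world, dtype=float) - np.asarray(cam_pos, dtype=float))

def camera_to_image_physical(p_cam):
    # Project a camera-frame point onto the yz imaging plane; the camera looks
    # along +x, so the x component plays the role of depth.
    xc, yc, zc = p_cam
    if xc <= 0:                  # behind the camera: no valid projection
        return None
    return yc / xc, zc / xc      # image physical coordinates (x_j, y_j)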
Step S202: calculating to obtain a physical coordinate boundary of the virtual camera according to the image and the view angle of the virtual camera;
Specifically, according to the camera imaging principle, the physical coordinate boundary (x_limit, y_limit) of the virtual camera is calculated from the field angle and the size of the output image.
Here (W_s, H_s) is the pixel width and height of the output image, an adjustable parameter that must be set manually when the virtual camera is initialized; (1920, 1080), (1280, 720) and the like are typical choices. Fov is the field angle, also a setting parameter; it may be any value greater than 0 and less than 180 degrees, with 60 degrees or 90 degrees being common choices. These parameters must be set when the camera is placed at the start of the simulation. The physical coordinate boundary is calculated so that the coordinates of the target object can be compared against it to decide whether they fall within the camera's range.
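The physical coordinate boundary can then be sketched as follows, assuming Fov is the horizontal field angle, square pixels, and the same unit-depth plane as above; this is an interpretation consistent with the text rather than a formula quoted from the patent.

import math

def physical_boundary(fov_deg, width_px, height_px):
    # Half-extents (x_limit, y_limit) of the imaging plane for a camera with
    # horizontal field angle fov_deg and output pixel size (W_s, H_s).
    x_limit = math.tan(math.radians(fov_deg) / 2.0)
    y_limit = x_limit * height_px / width_px    # square pixels assumed
    return x_limit, y_limit

# Example: a 1920x1080 image with a 90-degree field angle.
# x_lim, y_lim = physical_boundary(90.0, 1920, 1080)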
Step S203: if the physical coordinates of the image of any vertex are within the physical coordinate boundary of the virtual camera, the target object is within the visual field of the virtual camera;
Specifically, if x_j is in the range [-x_limit, x_limit] and y_j is in the range [-y_limit, y_limit], then the current vertex is within the camera's field of view;
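For completeness, the in-view test of step S203 can be written as a small helper, continuing the conventions of the sketches above.

def vertex_in_view(phys_coords, x_limit, y_limit):
    # True if a vertex's image physical coordinates fall inside the camera's
    # physical coordinate boundary (step S203).
    if phys_coords is None:          # the vertex was behind the camera
        return False
    xj, yj = phys_coords
    return -x_limit <= xj <= x_limit and -y_limit <= yj <= y_limit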
Step S204: and generating a detection point of the target object according to the physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the image. Specifically, as shown in fig. 5, this step may include the sub-steps of:
step S401: calculating the image pixel coordinates of the vertex according to the image physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the width and height of the output image pixel;
Specifically, according to the similar-triangle principle of camera imaging, the image pixel coordinates (u_j, v_j) of the vertex are deduced from the image physical coordinates (x_j, y_j) of the vertex, the physical coordinate boundary x_limit, y_limit of the virtual camera, and the pixel width and height (W_s, H_s) of the output image.
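The conversion from image physical coordinates to pixel coordinates can be sketched as a linear rescaling of [-x_limit, x_limit] × [-y_limit, y_limit] onto the pixel grid; placing the pixel origin at the top-left corner and flipping the vertical axis is an assumption, not a convention stated in the patent.

def physical_to_pixel(xj, yj, x_limit, y_limit, width_px, height_px):
    # Map image physical coordinates (x_j, y_j) to pixel coordinates (u_j, v_j).
    u = (xj + x_limit) / (2.0 * x_limit) * width_px
    v = (y_limit - yj) / (2.0 * y_limit) * height_px   # assumed top-left origin
    return u, v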
Step S402: calculating the length and width of a two-dimensional boundary frame of the target object according to the image pixel coordinates;
Specifically, the two-dimensional bounding box is represented as [u_min, v_min, u_max, v_max],
where C_F is the number of vertices of the target object within the virtual camera's field of view and (u_j, v_j) are the image pixel coordinates of those vertices; that is, in the two-dimensional image, the minimum rectangle that can surround the target object is taken, and any portion exceeding the image boundary is truncated at the boundary.
The length and width of the two-dimensional bounding box of the target object are then computed from the maximum and minimum coordinates of the rectangle in the lateral and longitudinal directions, i.e. L_u = u_max - u_min and L_v = v_max - v_min.
The length and width of the two-dimensional bounding box are calculated for use in generating detection points; the larger these values, the closer the target object is to the camera, and the fewer detection points need to be generated.
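A sketch of step S402 under the same assumptions: the two-dimensional bounding box is the smallest axis-aligned rectangle around the projected in-view vertices, clipped to the image, and L_u and L_v follow directly from it.

def bbox_2d(pixel_points, width_px, height_px):
    # Smallest axis-aligned rectangle [u_min, v_min, u_max, v_max] around the
    # projected vertices, clipped to the image; also returns (L_u, L_v).
    us = [u for u, _ in pixel_points]
    vs = [v for _, v in pixel_points]
    u_min, u_max = max(min(us), 0.0), min(max(us), float(width_px))
    v_min, v_max = max(min(vs), 0.0), min(max(vs), float(height_px))
    return [u_min, v_min, u_max, v_max], (u_max - u_min, v_max - v_min)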
Step S403: according to the length, width and height of the three-dimensional boundary frame, calculating and obtaining the body diagonal length of the three-dimensional boundary frame;
Specifically, the body diagonal length of the three-dimensional bounding box of the target object may be calculated as D = sqrt(W_i^2 + H_i^2 + L_i^2),
where W_i is the length of the three-dimensional bounding box, H_i is the width of the three-dimensional bounding box, and L_i is the height of the three-dimensional bounding box.
Step S404: calculating to obtain a contraction step length according to the length and width of the two-dimensional boundary frame and the body diagonal length of the three-dimensional boundary frame;
Specifically, the contraction step d is calculated from the length and width L_u, L_v of the two-dimensional bounding box, the body diagonal length of the three-dimensional bounding box, and a parameter α.
Here α may take any positive value; in this embodiment α takes a fixed value in [10, 100].
The smaller the value of α, the smaller d is, the more detection points are generated, the higher the calculation precision, and the larger the calculation amount; conversely, the larger the value of α, the larger d is, the fewer detection points are generated, the lower the calculation accuracy, and the smaller the calculation amount. Meanwhile, the smaller L_u and L_v are, the larger d is and the fewer detection points are generated; otherwise, more detection points are generated. It can be seen that α is a parameter for manually adjusting the number of detection points, while the imaging size of the target object (i.e., its distance from the camera) adjusts the number of detection points automatically.
Step S405: according to the contraction step length, the three-dimensional boundary frame is contracted; further, as shown in fig. 6, this step may further include:
Step S501: calculating to obtain shrinkage parameters according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
Specifically, the shrinkage parameter k is calculated from the contraction step and the body diagonal length,
where d is the contraction step size and D is the body diagonal length of the three-dimensional bounding box.
Step S502: setting a shrinkage ratio;
Specifically, the shrinkage ratio is initialized to r = 1 at the start of the loop, i.e., no shrinkage.
Step S503: according to the shrinkage proportion, shrinking the three-dimensional boundary frame along the body diagonal direction of the target object; updating the shrinkage ratio, and taking the difference of the shrinkage ratio minus the shrinkage parameter as a new shrinkage ratio; repeating this step until the shrinkage ratio is less than or equal to zero;
Specifically, the three-dimensional bounding box is contracted from (W_i, H_i, L_i) to (r×W_i, r×H_i, r×L_i) along the body diagonal direction of the target object; after the shrinkage is completed, the shrinkage ratio is updated as r = r - k; this process is repeated until r is less than or equal to 0. The bounding box is shrunk to generate more evenly distributed detection points, so that partially occluded objects can still be detected.
Step S406: generating a detection point of the target object according to the contracted three-dimensional boundary frame or the center point of the target object;
Specifically, the 8 vertices of each contracted three-dimensional bounding box, or the center point of the target object (once r is less than or equal to 0), are used as detection points of the target object; the generated detection points are the prerequisite for detecting whether shielding exists between the camera and each detection point.
After each contraction in step S503, the contracted three-dimensional bounding box generates detection points of the target object as described in step S406; that is, if the target object is contracted n times in step S503, 8×n+1 detection points are generated in step S406.
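Steps S403 to S406 can be sketched together as below, reusing bbox_vertices_world from the earlier sketch. The body diagonal D = sqrt(W_i^2 + H_i^2 + L_i^2) follows from the stated dimensions; the contraction step d is taken as an input because the patent's own formula (a function of α, L_u, L_v and the body diagonal) is not reproduced here, and k = d/D is an assumed reading of step S501.

import math
import numpy as np

def detection_points(center, orientation, size, shrink_step_d):
    # Detection points of the target object: 8 vertices of each shrunken
    # bounding box plus the object's centre point, reached once the shrink
    # ratio r drops to zero or below. shrink_step_d is assumed positive.
    w, h, l = size
    diag = math.sqrt(w * w + h * h + l * l)     # body diagonal D of the 3D box
    k = shrink_step_d / diag                    # assumed shrink parameter (step S501)
    points = []
    r = 1.0                                     # initial ratio: no shrinkage
    while r > 0.0:
        shrunk = (r * w, r * h, r * l)          # shrink along the body diagonal
        points.extend(bbox_vertices_world(center, orientation, shrunk))
        r -= k                                  # update the ratio (step S503)
    points.append(np.asarray(center, dtype=float))   # centre point once r <= 0
    return points                               # 8 vertices per pass, plus the centre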
In the implementation of step S104, if any one of the detection points is not blocked by the blocking object, the target object in the image is marked. Specifically, as shown in fig. 7, this step includes the following sub-steps:
Step S601: acquiring a third world coordinate of the virtual camera;
Specifically, the third world coordinate (X_s, Y_s, Z_s) of the virtual camera is acquired; the camera is placed when the three-dimensional simulation environment is created, so its coordinates are already determined.
Step S602: judging whether a shielding object exists between the virtual camera and the detection point according to the third world coordinate and the world coordinate of the detection point, and if any detection point is not shielded, marking a two-dimensional boundary box of the target object in the image.
Specifically, a ray collision detection algorithm in the game engine is used to detect whether a shielding object exists between the virtual camera and a detection point; for transparent shielding objects (such as glass) and mesh shielding objects (such as wire mesh), the detection can be configured to treat them as non-shielding, so that objects behind them can still be detected. If any one of all the detection points of the target object is not shielded, the target object is not completely shielded, and the two-dimensional bounding box of the target object is marked in the image obtained in step S102.
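A sketch of the occlusion test and labeling decision of step S104, with a placeholder ray_blocked callback standing in for the game engine's ray collision query (in UE4 this would be a line trace configured so that transparent or mesh occluders do not block); the names and signatures here are illustrative, not the engine's API.

def is_target_visible(camera_pos, det_points, ray_blocked):
    # True if at least one detection point is not occluded. ray_blocked(a, b)
    # is a placeholder callback expected to return True when a shielding object
    # lies between world points a and b; glass or wire mesh can be configured
    # as non-blocking inside the callback.
    return any(not ray_blocked(camera_pos, p) for p in det_points)

def label_if_visible(camera_pos, det_points, bbox2d, annotations, ray_blocked):
    # Record the 2D bounding box as a label when the target object is not
    # completely occluded in the image.
    if is_target_visible(camera_pos, det_points, ray_blocked):
        annotations.append(bbox2d)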
The application also provides an embodiment of an image labeling device used in the three-dimensional simulation, corresponding to the embodiment of the image labeling method used in the three-dimensional simulation.
FIG. 8 is a block diagram illustrating an image annotation device for use in three-dimensional simulation, according to an exemplary embodiment. Referring to fig. 8, the apparatus includes:
The construction module 21 constructs a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
The acquisition module 22 acquires an image in a visual field range through the virtual camera;
A generating module 23, configured to determine whether the target object is within the field of view of the virtual camera, and if the target object is within the field of view of the virtual camera, generate a detection point of the target object;
and the labeling module 24 is used for labeling the target object in the image if any detection point is not blocked by the blocking object.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be elaborated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the application without undue burden.
Correspondingly, the application also provides electronic equipment, which comprises: one or more processors; a memory for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement an image annotation method for use in three-dimensional simulation as described above.
Correspondingly, the application also provides a computer readable storage medium, on which computer instructions are stored, characterized in that the instructions, when executed by a processor, implement an image labeling method for use in three-dimensional simulation as described above.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (5)

1. An image labeling method for use in three-dimensional simulation, comprising:
constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
obtaining an image in a visual field range through a virtual camera;
Judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object; comprising the following steps:
Acquiring a three-dimensional boundary frame of a target object, and calculating physical coordinates of each vertex on the three-dimensional boundary frame in an image according to the three-dimensional boundary frame; comprising the following steps:
acquiring a first world coordinate, a first direction and a length, width and height of a three-dimensional boundary frame of a target object;
Calculating a second world coordinate of the vertex on the three-dimensional boundary frame of the target object according to the first world coordinate, the first direction and the length, width and height of the three-dimensional boundary frame;
Converting the second world coordinates into camera coordinates;
calculating the physical coordinates of the image of the vertex according to the coordinates of the camera;
calculating to obtain a physical coordinate boundary of the virtual camera according to the image and the view angle of the virtual camera;
if the physical coordinates of the image of any vertex are within the physical coordinate boundary of the virtual camera, the target object is within the visual field of the virtual camera;
generating a detection point of the target object according to the physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the image; comprising the following steps:
calculating the image pixel coordinates of the vertex according to the image physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the width and height of the output image pixel;
Calculating the length and width of a two-dimensional boundary frame of the target object according to the image pixel coordinates;
according to the length, width and height of the three-dimensional boundary frame, calculating and obtaining the body diagonal length of the three-dimensional boundary frame;
Calculating to obtain a contraction step length according to the length and width of the two-dimensional boundary frame and the body diagonal length of the three-dimensional boundary frame;
Shrinking the three-dimensional boundary frame according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
generating a detection point of the target object according to the contracted three-dimensional boundary frame and the center point of the target object;
If any detection point is not blocked by the blocking object, marking the target object in the image, including:
acquiring a third world coordinate of the virtual camera;
judging whether a shielding object exists between the virtual camera and the detection point according to the third world coordinate and the world coordinate of the detection point, and if any detection point is not shielded, marking a two-dimensional boundary box of the target object in the image.
2. The method of claim 1, wherein shrinking the three-dimensional bounding box according to the shrink step size and the body diagonal length of the three-dimensional bounding box comprises:
calculating to obtain shrinkage parameters according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
setting a shrinkage ratio;
According to the shrinkage proportion, shrinking the three-dimensional boundary frame along the body diagonal direction of the target object; updating the shrinkage ratio, and taking the difference of the shrinkage ratio minus the shrinkage parameter as a new shrinkage ratio; this step is repeated until the shrinkage ratio is less than or equal to zero.
3. An image annotation device for use in three-dimensional simulation, comprising:
the construction module is used for constructing a three-dimensional scene, wherein the three-dimensional scene comprises a target object, a virtual camera and a shielding object;
The acquisition module acquires an image in a visual field range through the virtual camera;
the generation module is used for judging whether the target object is in the visual field of the virtual camera or not, and if the target object is in the visual field of the virtual camera, generating a detection point of the target object; comprising the following steps:
Acquiring a three-dimensional boundary frame of a target object, and calculating physical coordinates of each vertex on the three-dimensional boundary frame in an image according to the three-dimensional boundary frame; comprising the following steps:
acquiring a first world coordinate, a first direction and a length, width and height of a three-dimensional boundary frame of a target object;
Calculating a second world coordinate of the vertex on the three-dimensional boundary frame of the target object according to the first world coordinate, the first direction and the length, width and height of the three-dimensional boundary frame;
Converting the second world coordinates into camera coordinates;
calculating the physical coordinates of the image of the vertex according to the coordinates of the camera;
calculating to obtain a physical coordinate boundary of the virtual camera according to the image and the view angle of the virtual camera;
if the physical coordinates of the image of any vertex are within the physical coordinate boundary of the virtual camera, the target object is within the visual field of the virtual camera;
generating a detection point of the target object according to the physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the image; comprising the following steps:
calculating the image pixel coordinates of the vertex according to the image physical coordinates of the vertex, the physical coordinate boundary of the virtual camera and the width and height of the output image pixel;
Calculating the length and width of a two-dimensional boundary frame of the target object according to the image pixel coordinates;
according to the length, width and height of the three-dimensional boundary frame, calculating and obtaining the body diagonal length of the three-dimensional boundary frame;
Calculating to obtain a contraction step length according to the length and width of the two-dimensional boundary frame and the body diagonal length of the three-dimensional boundary frame;
Shrinking the three-dimensional boundary frame according to the shrinkage step length and the body diagonal length of the three-dimensional boundary frame;
generating a detection point of the target object according to the contracted three-dimensional boundary frame and the center point of the target object;
The labeling module is used for labeling the target object in the image if any detection point is not blocked by the blocking object, and comprises the following steps:
acquiring a third world coordinate of the virtual camera;
judging whether a shielding object exists between the virtual camera and the detection point according to the third world coordinate and the world coordinate of the detection point, and if any detection point is not shielded, marking a two-dimensional boundary box of the target object in the image.
4. An electronic device, comprising:
One or more processors;
A memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-2.
5. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any of claims 1-2.
CN202111003690.2A 2021-08-30 2021-08-30 Image labeling method and device used in three-dimensional simulation and electronic equipment Active CN113763569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111003690.2A CN113763569B (en) 2021-08-30 2021-08-30 Image labeling method and device used in three-dimensional simulation and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111003690.2A CN113763569B (en) 2021-08-30 2021-08-30 Image labeling method and device used in three-dimensional simulation and electronic equipment

Publications (2)

Publication Number Publication Date
CN113763569A CN113763569A (en) 2021-12-07
CN113763569B true CN113763569B (en) 2024-10-01

Family

ID=78791902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111003690.2A Active CN113763569B (en) 2021-08-30 2021-08-30 Image labeling method and device used in three-dimensional simulation and electronic equipment

Country Status (1)

Country Link
CN (1) CN113763569B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114333489A (en) * 2021-12-30 2022-04-12 广州小鹏汽车科技有限公司 Remote driving simulation method, device and simulation system
CN114898076B (en) * 2022-03-29 2023-04-21 北京城市网邻信息技术有限公司 Model label adding method and device, electronic equipment and storage medium
CN116363085B (en) * 2023-03-21 2024-01-12 江苏共知自动化科技有限公司 Industrial part target detection method based on small sample learning and virtual synthesized data
CN116012843B (en) * 2023-03-24 2023-06-30 北京科技大学 Virtual scene data annotation generation method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489214A (en) * 2013-09-10 2014-01-01 北京邮电大学 Virtual reality occlusion handling method, based on virtual model pretreatment, in augmented reality system
CN109840947A (en) * 2017-11-28 2019-06-04 广州腾讯科技有限公司 Implementation method, device, equipment and the storage medium of augmented reality scene

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126579B (en) * 2016-06-17 2020-04-28 北京市商汤科技开发有限公司 Object identification method and device, data processing device and terminal equipment
US10699165B2 (en) * 2017-10-30 2020-06-30 Palo Alto Research Center Incorporated System and method using augmented reality for efficient collection of training data for machine learning
JP2020008917A (en) * 2018-07-03 2020-01-16 株式会社Eidea Augmented reality display system, augmented reality display method, and computer program for augmented reality display
US10733742B2 (en) * 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
CN111160261A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Sample image labeling method and device for automatic sales counter and storage medium
CN112258610B (en) * 2020-10-10 2023-12-01 万物镜像(北京)计算机系统有限公司 Image labeling method and device, storage medium and electronic equipment
CN112150575B (en) * 2020-10-30 2023-09-01 深圳市优必选科技股份有限公司 Scene data acquisition method, model training method and device and computer equipment
CN112819804B (en) * 2021-02-23 2024-07-12 西北工业大学 Insulator defect detection method based on improved YOLOv convolutional neural network


Also Published As

Publication number Publication date
CN113763569A (en) 2021-12-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant