WO2024146365A1 - Object detection method and apparatus and storage medium - Google Patents
Object detection method and apparatus and storage medium Download PDFInfo
- Publication number
- WO2024146365A1 (PCT/CN2023/139598)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- space
- ray
- coordinate system
- detection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/0014—Image feed-back for automatic industrial control, e.g. robot with camera
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- the embodiments of the present disclosure relate to the field of visual detection technology, and in particular to a target detection method, a target detection device, and a storage medium.
- 3D object detection algorithms based on pure vision are an important research direction in the field of object detection.
- pure vision detection schemes generally have large errors in depth estimation, which leads to inaccurate 3D position estimation.
- Recently, academia and industry have tried to use multi-camera setups to improve the accuracy of depth estimation, but the improvement in depth measurement from multiple cameras is limited, so 3D position estimation remains inaccurate.
- an embodiment of the present disclosure provides a target detection method, the method comprising:
- a first position of the target in the 3D space is determined according to the target ray corresponding to the target.
- an embodiment of the present disclosure provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it causes the processor to implement the target detection method described above.
- FIG1 is a flow chart of an embodiment of the target detection method disclosed herein;
- FIG2 is a flow chart of another embodiment of the target detection method disclosed herein;
- FIG9 is a schematic diagram showing the principle of solving the position of a target using multiple target rays in the target detection method disclosed herein.
- suffixes such as "module", "component" or "unit" used to represent elements are only intended to facilitate the description of the present disclosure and have no special meaning of their own; therefore, "module", "component" and "unit" can be used interchangeably.
- the embodiments of the present disclosure provide a target detection method, a target detection device, and a storage medium, which obtain a detection image of a camera device, wherein the detection image includes an image corresponding to a target; determine the position of an imaging point of the target on an image plane of the camera device according to the detection image; determine a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the camera device; and determine a first position of the target in 3D space according to the target ray corresponding to the target.
- the embodiments of the present disclosure avoid the biased technical approach of estimating depth. Instead, the position of the imaging point of the target on the image plane of the camera device is determined from the detection image, the target ray passing through the position of the imaging point of the target and the position of the optical center of the camera device is determined, and the first position of the target in 3D space is determined from that target ray.
- the camera device of the embodiments of the present disclosure may be a camera device with a fixed viewing angle, such as a roadside camera or a camera in a factory building.
- the determination of the first position of the target in the 3D space based on the multiple target rays corresponding to the target may also be carried out in other ways.
- the point in space with the shortest total distance to all the target rays may be solved directly, and this point is the first position of the target in 3D space; or, in some embodiments, multiple candidate positions are obtained from each target ray and the plane where the target is located, and the candidate with the shortest distance to all the target rays is selected as the first position of the target in 3D space; and so on.
- the first position of the target in the 3D space can be determined based on a target ray corresponding to the target and the plane where the target is located.
- Sub-step S104B2 taking the position of the intersection of the target ray and the plane where the target is located as the first position of the target in the 3D space.
- the target ray and the plane where the target is located (such as the ground) will theoretically intersect, so the intersection point of the target ray and the plane where the target is located is solved, and the position of the intersection point is the first position of the target in the 3D space.
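The ray–plane intersection described here can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation; the function name and the plane representation n·p + d = 0 are assumptions:

```python
import numpy as np

def ray_plane_intersection(origin, direction, normal, d):
    """Intersect the target ray (optical center `origin`, direction
    through the imaging point) with the plane n . p + d = 0."""
    denom = normal @ direction
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = -(d + normal @ origin) / denom
    if t < 0:
        return None  # the plane lies behind the optical center
    return origin + t * direction

# Camera 3 m above the ground plane z = 0, ray sloping down at 45 degrees:
p = ray_plane_intersection(np.array([0.0, 0.0, 3.0]),
                           np.array([1.0, 0.0, -1.0]),
                           np.array([0.0, 0.0, 1.0]), 0.0)
# p == [3., 0., 0.] -- the target's first position on the ground
```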
- besides the plane where the target is located, other geometric constraints can also be combined with a single target ray to determine the first position of the target in 3D space. For example, in some embodiments, if the target is on a specific straight line in the plane (such as a pedestrian on a zebra crossing on the ground), the first position of the target in 3D space can be determined from the target ray corresponding to the target and that straight line; and so on.
- the first position of the target in the 3D space exists objectively, but the representation method of the first position of the target in the 3D space is related to the selected coordinate system. In different coordinate systems, the representation method of the first position of the target in the 3D space is different.
- step S104 determining the first position of the target in the 3D space according to the target ray corresponding to the target, may include: determining the first position of the target in the world coordinate system according to the target ray corresponding to the target.
- the camera device can be placed at any position in the environment.
- a reference coordinate system is selected in the environment to describe the position of the camera device and any target (object) in the environment.
- This coordinate system is called the world coordinate system.
- Coordinate systems related to the world coordinate system also include the camera coordinate system, the pixel coordinate system, and the retinal coordinate system.
- the detection image (i.e., digital image) captured by the camera device can be stored as an array in the computer.
- the value of each element (pixel) in the array is the brightness (grayscale) of the image point; a rectangular coordinate system u-v is defined on the image, and the coordinates (u, v) of each pixel are the column number and row number of the pixel in the array respectively; therefore, (u, v) is the coordinate of the image coordinate system (also called pixel coordinate system) with pixels as the unit.
- the image coordinate system only indicates the number of columns and rows of the pixel in the digital image, and does not use physical units to indicate the physical position of the pixel in the image, it is necessary to establish an image plane coordinate system xy expressed in physical units (e.g. centimeters); (x, y) is used to represent the coordinates of the image plane coordinate system measured in physical units.
- the origin is defined at the intersection of the optical axis of the camera device and the image plane, which is called the principal point of the image. This point is generally located at the center of the image, and its coordinates in the image coordinate system are (u0, v0).
- the physical dimensions of each pixel in the x-axis and y-axis directions are dx and dy, and the relationship between the two coordinate systems is: u = x/dx + s'·y + u0, v = y/dy + v0.
- s' represents a skew factor caused by the non-orthogonality of the image plane coordinate axes of the camera device.
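Written as code, the pixel/image-plane relationship with the skew factor s' looks like the following sketch; the function name and argument layout are illustrative, and the skew convention u = x/dx + s'·y + u0 is an assumption consistent with the surrounding text:

```python
def image_plane_to_pixel(x, y, dx, dy, u0, v0, s=0.0):
    """Map physical image-plane coordinates (x, y) to pixel
    coordinates (u, v); dx, dy are the pixel sizes, (u0, v0) the
    principal point, and s the skew factor s'."""
    u = x / dx + s * y + u0
    v = y / dy + v0
    return u, v

# 10 um pixels, principal point at (320, 240), no skew:
u, v = image_plane_to_pixel(0.002, -0.001, dx=1e-5, dy=1e-5, u0=320.0, v0=240.0)
# u is approximately 520, v approximately 140
```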
- the origin of the camera coordinate system is the optical center of the camera device. Its x-axis and y-axis are parallel to the X and Y axes of the image.
- the z-axis is the optical axis of the camera device, which is perpendicular to the image plane.
- the spatial rectangular coordinate system formed by this is called the camera coordinate system.
- the camera coordinate system is a three-dimensional coordinate system. The intersection of the optical axis and the image plane is the origin of the image coordinate system.
- the rectangular coordinate system formed by the origin of the image coordinate system and the X and Y axes of the image is the image coordinate system.
- the image coordinate system is a two-dimensional coordinate system. The relationship between the camera coordinate system and the world coordinate system can be described by the rotation matrix R and the translation vector t.
- step S103 determining the target ray corresponding to the target based on the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, may also include: sub-step S1031, sub-step S1032 and sub-step S1033, as shown in Figure 4.
- Sub-step S1031 determining the first coordinate of the imaging point of the target in the camera coordinate system according to the position of the imaging point of the target on the image plane of the imaging device and the internal parameter formula of the imaging device.
- the position of the imaging point of the target on the image plane of the camera device can be understood as the coordinates of the imaging point of the target in the image plane coordinate system.
- the internal parameter matrix of the camera device is required. If the coordinates of the target O_i detected by camera device A_k in the image plane coordinate system are (x_i, y_i, 1)_k, then according to the internal parameter matrix K of the camera device, the following relationship can be obtained: Z_i · (x_i, y_i, 1)_k^T = K · (X_i, Y_i, Z_i)_k^T.
- since the depth Z_i is not required in the subsequent target-ray calculation, Z_i can be eliminated from both sides of the equation, or an arbitrary constant can be substituted for Z_i. If the elimination method is used, the following relationship is obtained: (X_i', Y_i', 1)_k^T = K^(-1) · (x_i, y_i, 1)_k^T.
- the second coordinate of the optical center of the camera device in the camera coordinate system is the origin of the camera coordinate system (0,0,0) k .
- the external parameter matrix of the camera device can then be used to transform the first coordinate of the imaging point of the target in the camera coordinate system, (X_i', Y_i', 1)_k, and the origin of the camera coordinate system, (0, 0, 0)_k, into the world coordinate system.
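Combining the internal parameter matrix K with the external parameters (R, t) gives a world-space target ray. The following is an assumed sketch; the convention X_cam = R · X_world + t is one common choice and may differ from the patent's:

```python
import numpy as np

def target_ray_in_world(point_img, K, R, t):
    """Return (origin, direction): the optical center in world
    coordinates and the unit direction of the target ray through
    the imaging point `point_img` = (x_i, y_i)."""
    hom = np.array([point_img[0], point_img[1], 1.0])
    d_cam = np.linalg.inv(K) @ hom   # (X_i', Y_i', 1) in the camera frame
    origin = -R.T @ t                # optical center in the world frame
    d_world = R.T @ d_cam            # rotate the direction into the world frame
    return origin, d_world / np.linalg.norm(d_world)

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
origin, d = target_ray_in_world((320.0, 240.0), K, np.eye(3), np.zeros(3))
# the principal point maps onto the optical axis: d == [0, 0, 1]
```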
- the coordinates of the final first position of the target in the world coordinate system are:
- the disclosed embodiments can also determine a second position of the target in 3D space (i.e., a position not derived from the target ray) from the detection image using current related technologies, including pure-vision 3D target detection algorithms.
- Step S108 Determine a first weight corresponding to the first position and a second weight corresponding to the second position according to the first confidence or first variance and the second confidence or second variance, wherein the sum of the first weight and the second weight is equal to 1.
- step S106 determining the final position of the target in the 3D space according to the first position and the second position, may also include: determining the final position of the target in the 3D space according to the first position, the second position, the first weight and the second weight.
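The weighted combination of steps S106/S108 reduces to a convex sum of the two positions. A minimal sketch follows; the weight here is supplied directly, whereas in the patent the weights come from confidences/variances (e.g. via a trained network), and the function name is illustrative:

```python
import numpy as np

def fuse_positions(first_pos, second_pos, w1):
    """Weighted sum of the ray-based first position and the coarse
    second position; the second weight is 1 - w1 so the two sum to 1."""
    w1 = float(np.clip(w1, 0.0, 1.0))
    return w1 * np.asarray(first_pos, float) + (1.0 - w1) * np.asarray(second_pos, float)

final = fuse_positions([10.0, 2.0, 0.0], [12.0, 2.0, 0.0], w1=0.75)
# final == [10.5, 2.0, 0.0]
```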
- Confidence is also called reliability, confidence level, or confidence coefficient. It is a probability value: because of the randomness of sampling, a conclusion drawn from a sample about a population parameter is always uncertain, so a probability statement, namely the interval estimation method of mathematical statistics, is used. The probability that the estimated value and the population parameter lie within a given allowable error range (that is, the probability that the population parameter falls within a certain range of the sample statistic) is called the confidence. Confidence is an important indicator for describing uncertainty and indicates the degree of certainty of an interval estimate.
- the first confidence level can be the reliability corresponding to the first position
- the second confidence level can be the reliability corresponding to the second position.
- variance is used to measure the degree of deviation between a random variable and its mathematical expectation (i.e., mean).
- the first variance can be the degree of deviation corresponding to the first position
- the second variance can be the degree of deviation corresponding to the second position. If there are multiple cameras, confidence can be used, and if there is only one camera, variance can be used.
- Module B is the target matching module, which matches the target data output by modules A1-An, that is, finds the data corresponding to the same target across the visual detection modules.
- the first step: modules A1-An, the visual detection modules, take the detection image as input and output the 2D detection result of the target in the detection image with its corresponding confidence, as well as the rough 3D detection result (i.e., the second position) with its corresponding confidence.
- the third step: module C, the target-ray position estimation module, uses the positions of the imaging points of the target from the viewpoints of multiple camera devices and the positions of their optical centers to construct multiple target rays, and calculates the first position of the target in 3D space.
- the principle is shown in Figure 9. Specifically, a target ray is constructed from the position of the imaging point of the target and the position of the optical center of the camera device, and the equation of the target ray in the world coordinate system is obtained; after obtaining at least two target rays from the results of different camera devices, the intersection of these target rays, or the point with the minimum sum of distances to these target rays, is calculated and taken as the first position of the target in 3D space.
- the coordinates P_i are solved numerically using analytic geometry.
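One standard way to obtain the minimum-distance point is a small linear least-squares problem. This sketch is one possible realization of the numerical analytic-geometry solution mentioned above, not necessarily the patent's method:

```python
import numpy as np

def nearest_point_to_rays(origins, directions):
    """Solve for the point minimizing the summed squared distance to
    the target rays (origin o_i = optical center, unit direction d_i).
    Fails (singular system) only if all rays are parallel."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ np.asarray(o, float)
    return np.linalg.solve(A, b)

# Two rays that actually intersect at (1, 1, 0):
p = nearest_point_to_rays(
    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]])
# p == [1., 1., 0.]
```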
- module D inputs the confidence of the 2D detection result obtained by module A1-An and the confidence of the 3D rough detection result into the trained neural network to obtain the first weight and the second weight.
- the first position of the target in the 3D space obtained by module C and the 3D rough detection result (i.e., the second position) obtained by module A1-An are weighted and summed to obtain the final result, i.e., the final position of the target in the 3D space.
- Embodiment 2: a combined solution of a 3D target detection system based on coarse depth estimation and multi-camera target rays, the structure of which is shown in FIG. 10.
- the first step: modules A1-An, the visual detection modules, take the detection image as input and output the 2D detection result of the target in the detection image with its corresponding confidence, as well as the rough depth estimate.
- module D inputs the confidence of the 2D detection results obtained by modules A1-An and the confidence of the 3D rough detection results obtained by module B using rough depth estimation into the trained neural network to obtain the first weight and the second weight.
- the first position of the target in the 3D space obtained by module C and the 3D rough detection result obtained by module B using rough depth estimation are weighted and summed to obtain the final result, that is, the final position of the target in the 3D space.
- Embodiment 3: a combined solution of a 3D target detection system based on ground equations and a monocular-camera target ray, the structure of which is shown in Figure 11.
- the first step is to use module A, the visual detection module, which takes the detection image as input and outputs the 2D detection result and the corresponding variance of the target in the detection image, as well as the 3D rough detection result and the corresponding variance.
- the second step, module C, the ray position estimation module uses the position of the imaging point of the target under the viewing angle of a camera device and the position of the optical center of the camera device to construct a target ray, and combines it with the ground equation to calculate the first position of the target in the 3D space.
- the principle is shown in Figure 12: Specifically, the target ray is constructed from the position of the imaging point of the target and the position of the optical center of the camera device, and the equation of the target ray in the world coordinate system is obtained; then the equation of the target ray in the world coordinate system is combined with the ground equation to solve the first position of the target in the 3D space.
Abstract
Provided in the embodiments of the present disclosure are an object detection method and apparatus and a storage medium. The method comprises: acquiring a detection image from a camera apparatus, the detection image comprising an image corresponding to an object; according to the detection image, determining the position of the imaging point of the object on the image plane of the camera apparatus; according to the position of the imaging point of the object on the image plane of the camera apparatus and the position of the optical center of the camera apparatus, determining an object ray corresponding to the object, the object ray passing through the position of the imaging point of the object and the position of the optical center of the camera apparatus; and, according to the object ray corresponding to the object, determining a first position of the object in a 3D space.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure is based on Chinese patent application 202310002393.9, filed on January 3, 2023 and entitled "Target Detection Method, Device and Storage Medium", claims priority to that application, and incorporates its entire disclosure herein by reference.
The embodiments of the present disclosure relate to the field of visual detection technology, and in particular to a target detection method, a target detection device, and a storage medium.
3D target detection algorithms based on pure vision are an important research direction in the field of target detection. However, pure-vision detection schemes generally have large errors in depth estimation, which leads to inaccurate 3D position estimation. To remedy this shortcoming, academia and industry have tried to use multi-camera setups to improve the accuracy of depth estimation, but the improvement in depth measurement from multiple cameras is limited, so 3D position estimation remains inaccurate.
SUMMARY OF THE INVENTION
Based on this, the embodiments of the present disclosure provide a target detection method, a target detection device and a storage medium, which can improve the accuracy of estimating the 3D position of a target.
In a first aspect, an embodiment of the present disclosure provides a target detection method, the method comprising:
acquiring a detection image of a camera device, wherein the detection image includes an image corresponding to a target;
determining the position of an imaging point of the target on an image plane of the camera device according to the detection image;
determining a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the camera device; and
determining a first position of the target in 3D space according to the target ray corresponding to the target.
In a second aspect, an embodiment of the present disclosure provides a target detection device, which includes a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program and, when executing the computer program, implement the target detection method described above.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it causes the processor to implement the target detection method described above.
The embodiments of the present disclosure provide a target detection method, a target detection device, and a storage medium, which acquire a detection image of a camera device, wherein the detection image includes an image corresponding to a target; determine the position of an imaging point of the target on an image plane of the camera device according to the detection image; determine a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the camera device; and determine a first position of the target in 3D space according to the target ray corresponding to the target.
FIG. 1 is a schematic flow chart of an embodiment of the target detection method of the present disclosure;
FIG. 2 is a schematic flow chart of another embodiment of the target detection method of the present disclosure;
FIG. 3 is a schematic flow chart of another embodiment of the target detection method of the present disclosure;
FIG. 4 is a schematic flow chart of another embodiment of the target detection method of the present disclosure;
FIG. 5 is a schematic flow chart of another embodiment of the target detection method of the present disclosure;
FIG. 6 is a schematic flow chart of another embodiment of the target detection method of the present disclosure;
FIG. 7 is a schematic structural diagram of an embodiment of an application of the target detection method of the present disclosure;
FIG. 8 is a schematic structural diagram of an embodiment of an application in which a rough 3D detection result is combined with the system of an embodiment of the present disclosure;
FIG. 9 is a schematic diagram showing the principle of solving the position of a target using multiple target rays in the target detection method of the present disclosure;
FIG. 10 is a schematic structural diagram of an embodiment of an application in which coarse depth estimation is combined with the system of an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an embodiment of an application in which a ground equation is combined with the system of an embodiment of the present disclosure;
FIG. 12 is a schematic diagram showing the principle of solving the position of a target by combining a target ray with a ground equation in the target detection method of the present disclosure;
FIG. 13 is a schematic structural diagram of an embodiment of the target detection device of the present disclosure.
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
The flowcharts shown in the accompanying drawings are only illustrative; they need not include all contents and operations/steps, nor be executed in the order described. For example, some operations/steps may be decomposed, combined or partially merged, so the actual execution order may change according to actual conditions.
In the following description, suffixes such as "module", "component" or "unit" used to represent elements are only intended to facilitate the description of the present disclosure and have no special meaning of their own; therefore, "module", "component" and "unit" can be used interchangeably.
在详细介绍本公开实施例之前,先介绍一下相关技术。Before introducing the embodiments of the present disclosure in detail, the related technology is first introduced.
基于纯视觉的3D目标检测算法是目标检测领域的重要研究方向。在工程上，基于纯视觉的目标检测方案因其成本低、对色彩纹理敏感等特点而受到重视。但是由于相机等视觉探测器对于距离没有绝对的测量能力，所以基于纯视觉的目标检测方案对深度的估计普遍存在较大误差，进而导致估计得到的目标的3D位置在精度上难以与激光雷达的目标检测方案竞争。目前基于纯视觉的3D目标检测方案的主流方法是利用目标的2D检测结果，外加预测的目标深度值，通过相机参数得到目标在3D空间中的位置信息，或是直接预测目标的3D位置。不论哪种方法，由于相机对距离的探测存在先天缺陷，因此通常深度估计误差较大，进而导致3D位置估计不准确。为了改善这一缺点，学术界和业界尝试采用多目相机来提高深度估计的精度。而目前采取的方法多是将多目相机作为深度估计的工具，尽可能地还原图像每个像素点的深度，这导致相机视角差距不能太大，但正是由于相机视角的相关性较大，对深度测量的改进效果不明显。3D target detection algorithms based on pure vision are an important research direction in the field of target detection. In engineering, target detection schemes based on pure vision are valued for their low cost and sensitivity to color and texture. However, since visual detectors such as cameras have no absolute distance measurement capability, pure-vision target detection schemes generally have large errors in depth estimation, which makes it difficult for the estimated 3D position of the target to compete in accuracy with lidar-based target detection schemes. The current mainstream approach of pure-vision 3D target detection is to use the 2D detection result of the target, plus a predicted target depth value, to obtain the target's position in 3D space through the camera parameters, or to directly predict the target's 3D position. Regardless of the method, due to the inherent defects of cameras in sensing distance, the depth estimation error is usually large, which leads to inaccurate 3D position estimation. To mitigate this shortcoming, academia and industry have tried multi-camera setups to improve the accuracy of depth estimation. The current methods mostly use the multiple cameras as a depth-estimation tool, recovering the depth of each pixel in the image as far as possible; this requires that the difference between camera viewpoints not be too large, but precisely because the viewpoints are then highly correlated, the improvement in depth measurement is limited.
本公开实施例提供了一种目标检测方法、目标检测装置及存储介质,获取摄像装置的检测图像,所述检测图像包括目标对应的图像;根据所述检测图像确定所述目标的成像点在所述摄像装置的像平面上的位置;根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置,确定所述目标对应的目标射线,所述目标射线通过所述目标的成像点的位置和所述摄像装置的光心的位置;根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置。相较于相关技术中基于纯视觉的检测方案对深度估计进而估计目标的位置,本公开实施例避开对深度进行估计这一带有技术偏见的技术思路,而是根据检测图像确定目标的成像点在摄像装置的像平面上的位置,确定通过目标的成像点的位置和摄像装置的光心的位置的目标射线,根据目标射线确定目标在3D空间的第一位置,由于对目标的成像点在摄像装置的像平面上的位置的估计比对深度的估计更加准确,根据成像原理可知理论上目标必定在成像点、光心所确定的目标射线上,因此根据目标射线能够更加准确地确定目标在3D空间的第一位置。The embodiments of the present disclosure provide a target detection method, a target detection device, and a storage medium, which obtain a detection image of a camera device, wherein the detection image includes an image corresponding to a target; determine the position of an imaging point of the target on an image plane of the camera device according to the detection image; determine a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the camera device; and determine a first position of the target in 3D space according to the target ray corresponding to the target. Compared with the related art that estimates the depth and then estimates the position of the target based on a pure visual detection scheme, the embodiments of the present disclosure avoid the technical idea of estimating the depth, which has a technical bias. Instead, the position of the imaging point of the target on the image plane of the camera device is determined based on the detected image, and the target ray passing through the position of the imaging point of the target and the position of the optical center of the camera device is determined. The first position of the target in the 3D space is determined based on the target ray. 
Since the estimation of the position of the imaging point of the target on the image plane of the camera device is more accurate than the estimation of the depth, according to the imaging principle, it can be known that theoretically the target must be on the target ray determined by the imaging point and the optical center. Therefore, the first position of the target in the 3D space can be determined more accurately based on the target ray.
下面结合附图对本公开实施例进行详细说明。The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
参见图1,图1是本公开目标检测方法一实施例的流程示意图,所述方法包括:步骤S101、步骤S102、步骤S103以及步骤S104。Referring to FIG. 1 , FIG. 1 is a flow chart of an embodiment of a target detection method disclosed herein, wherein the method comprises: step S101 , step S102 , step S103 , and step S104 .
步骤S101:获取摄像装置的检测图像,所述检测图像包括目标对应的图像。Step S101: Acquire a detection image of a camera device, wherein the detection image includes an image corresponding to a target.
本公开实施例中，检测图像是摄像装置对目标进行拍摄而得到的，因此检测图像包括目标的图像。摄像装置的数量可以是一个，也可以是多个；摄像装置是一个时，检测图像为一个摄像装置的检测图像，摄像装置为多个时，检测图像为多个摄像装置的检测图像。In the disclosed embodiments, the detection image is obtained by photographing the target with a camera device, so the detection image includes an image of the target. The number of camera devices may be one or more; when there is one camera device, the detection image is the detection image of that camera device, and when there are multiple camera devices, the detection images are the detection images of the multiple camera devices.
本公开实施例的摄像装置可以是路侧、厂房等固定视角下的摄像装置。The camera device of the embodiments of the present disclosure may be a camera device with a fixed viewing angle, such as one mounted at a roadside or in a factory building.
步骤S102:根据所述检测图像确定所述目标的成像点在所述摄像装置的像平面上的位置。Step S102: determining the position of the imaging point of the target on the image plane of the camera device according to the detection image.
对于一个目标来说,摄像装置拍摄目标得到检测图像,目标的成像点的位置在理论上已经确定。检测图像是在像平面上产生的,因此根据所述检测图像能够很容易、且很准确地确定所述目标的成像点在所述摄像装置的像平面上的位置。For a target, the camera device captures the target to obtain a detection image, and the position of the imaging point of the target is theoretically determined. The detection image is generated on the image plane, so the position of the imaging point of the target on the image plane of the camera device can be easily and accurately determined based on the detection image.
步骤S103:根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置,确定所述目标对应的目标射线,所述目标射线通过所述目标的成像点的位置和所述摄像装置的光心的位置。Step S103: Determine a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the imaging device and the position of the optical center of the imaging device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the imaging device.
根据摄像装置的小孔成像原理可知,目标的成像点在所述摄像装置的像平面上的位置、摄像装置的光心的位置以及目标在3D空间的位置在理论上呈一条直线,如果能够知道这条直线,那么可以确定目标在3D空间的位置。两点确定一条直线,因此根据目标的成像点在所述摄像装置的像平面上的位置、摄像装置的光心的位置可以确定目标对应的目标射线。According to the pinhole imaging principle of the camera device, the position of the imaging point of the target on the image plane of the camera device, the position of the optical center of the camera device, and the position of the target in the 3D space are theoretically a straight line. If this straight line is known, the position of the target in the 3D space can be determined. Two points determine a straight line, so the target ray corresponding to the target can be determined according to the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device.
步骤S104:根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置。Step S104: determining a first position of the target in the 3D space according to the target ray corresponding to the target.
理论上目标在3D空间的位置一定在目标射线上，因此根据所述目标对应的目标射线，可以较为准确地确定所述目标在3D空间的第一位置。Theoretically, the position of the target in 3D space must lie on the target ray, so the first position of the target in the 3D space can be determined relatively accurately according to the target ray corresponding to the target.
相较于相关技术中基于纯视觉的检测方案对深度估计进而估计目标的位置,本公开实施例避开对深度进行估计这一带有技术偏见的技术思路,而是根据检测图像确定目标的成像点在摄像装置的像平面上的位置,确定通过目标的成像点的位置和摄像装置的光心的位置的目标射线,根据目标射线确定目标在3D空间的第一位置,由于对目标的成像点在摄像装置的像平面上的位置的估计比对深度的估计更加准确,根据成像原理可知理论上目标必定在成像点、光心所确定的目标射线上,因此根据目标射线能够更加准确地确定目标在3D空间的第一位置,能够提高目标的3D检测精度。Compared with the related art that estimates the depth and then estimates the position of the target based on a pure visual detection scheme, the embodiment of the present disclosure avoids the technical idea of estimating the depth which has a technical bias, and instead determines the position of the imaging point of the target on the image plane of the camera device based on the detected image, determines the target ray passing through the position of the imaging point of the target and the position of the optical center of the camera device, and determines the first position of the target in the 3D space based on the target ray. Since the estimation of the position of the imaging point of the target on the image plane of the camera device is more accurate than the estimation of the depth, according to the imaging principle, it can be known that theoretically the target must be on the target ray determined by the imaging point and the optical center. Therefore, the first position of the target in the 3D space can be more accurately determined based on the target ray, which can improve the 3D detection accuracy of the target.
在一些实施例中,所述摄像装置的数量为多个;步骤S104,所述根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置,还可以包括:根据所述目标对应的多个目标射线,确定所述目标在3D空间的第一位置。In some embodiments, there are multiple cameras; step S104, determining the first position of the target in the 3D space according to the target ray corresponding to the target, may also include: determining the first position of the target in the 3D space according to multiple target rays corresponding to the target.
本公开实施例中,摄像装置的数量是多个时,对应每一个摄像装置,均存在一个目标的成像点的位置和一个摄像装置的光心的位置,因此每个摄像装置可以对应一个目标射线。由于这多个摄像装置均是对目标进行拍摄,很显然理论上目标位于多个目标射线的交点上。因此,根据所述目标对应的多个目标射线可以确定所述目标在3D空间的第一位置。In the embodiment of the present disclosure, when there are multiple cameras, there is a position of an imaging point of a target and a position of an optical center of the camera corresponding to each camera, so each camera can correspond to a target ray. Since these multiple cameras are all shooting the target, it is obvious that the target is theoretically located at the intersection of multiple target rays. Therefore, the first position of the target in 3D space can be determined based on the multiple target rays corresponding to the target.
在一些实施例中,步骤S104,所述根据所述目标对应的多个目标射线,确定所述目标在3D空间的第一位置,还可以包括:子步骤S104A1和子步骤S104A2,如图2所示。In some embodiments, step S104, determining the first position of the target in the 3D space according to the multiple target rays corresponding to the target, may also include: sub-step S104A1 and sub-step S104A2, as shown in FIG. 2 .
子步骤S104A1:求解所述3D空间中到各个所述目标射线的距离之和最小的最小距离点。Sub-step S104A1: finding the minimum distance point in the 3D space where the sum of distances to each of the target rays is the smallest.
子步骤S104A2:将所述最小距离点的位置作为所述目标在3D空间的第一位置。Sub-step S104A2: taking the position of the minimum distance point as the first position of the target in the 3D space.
虽然理论上可以通过求解多个目标射线的交点得到目标在3D空间的第一位置,但是由于根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置实际确定的目标射线与理论上的目标射线可能存在偏差,即实际确定的目标射线与理论上的目标射线可能并不完全一致,因此多个目标射线在3D空间可能并不一定会相交。此时求解目标在3D空间的第一位置的方法可以采用最小距离和法,即不管多个目标射线在3D空间是否会相交,但是空间中的目标到每个目标射线的距离之和是最小的,因此求解所述3D空间中到各个所述目标射线的距离之和最小的最小距离点,该最小距离点的位置即为所述目标在3D空间的第一位置。该确定所述目标在3D空间的第一位置的方式简单方便,且求解结果更加准确。Although theoretically the first position of the target in 3D space can be obtained by solving the intersection of multiple target rays, the target rays actually determined based on the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device may deviate from the theoretical target rays, that is, the actually determined target rays may not be completely consistent with the theoretical target rays, so multiple target rays may not necessarily intersect in 3D space. At this time, the method for solving the first position of the target in 3D space can adopt the minimum distance sum method, that is, regardless of whether multiple target rays will intersect in 3D space, the sum of the distances from the target in space to each target ray is the smallest, so the minimum distance point in the 3D space where the sum of the distances to each target ray is the smallest is solved, and the position of the minimum distance point is the first position of the target in 3D space. This method of determining the first position of the target in 3D space is simple and convenient, and the solution result is more accurate.
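上述最小距离和思想可以用如下Python代码示意（非本公开实施例的一部分，仅作说明；为便于求得闭式解，此处最小化的是到各射线的距离平方和，这是对"距离之和最小"的一种常见近似）：The minimum-distance idea above can be sketched in Python as follows (not part of the disclosed embodiments, for illustration only; to admit a closed-form solution, this sketch minimizes the sum of squared distances to the rays, a common approximation of the "minimum sum of distances"):

```python
import numpy as np

def triangulate_rays(origins, directions):
    """Find the point minimizing the sum of squared distances to a set
    of rays, each given by an origin o_k and a direction d_k.

    For unit d_k, the squared distance from point p to ray k is
    ||(I - d_k d_k^T)(p - o_k)||^2, so setting the gradient to zero
    gives the normal equations  sum_k M_k p = sum_k M_k o_k,
    where M_k = I - d_k d_k^T."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)  # projector onto the plane normal to d
        A += M
        b += M @ np.asarray(o, dtype=float)
    return np.linalg.solve(A, b)

# Two rays that intersect at (1, 1, 0): the solver recovers the intersection.
p = triangulate_rays(
    origins=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    directions=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```

若多条射线恰好相交，该解即为交点（此时距离之和为0）；若射线因估计偏差而不相交，该解给出与各射线整体最接近的点。If the rays happen to intersect, the solution is the intersection (the distance sum is 0); if they do not intersect due to estimation deviations, it gives the point closest to all rays overall.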
需要说明的是,所述根据所述目标对应的多个目标射线确定所述目标在3D空间的第一位置还可以采用其他方式,例如,在一些实施例中,也可以直接求解空间中到每个目标射线的距离最小的点,该点即为目标在3D空间的第一位置;或者,在一些实施例中,根据每个目标射线与目标所在的平面求解得到多个位置,再从这些位置中选择到每个目标射线的距离最小的位置作为目标在3D空间的第一位置;等等。It should be noted that the determination of the first position of the target in the 3D space based on the multiple target rays corresponding to the target may also be carried out in other ways. For example, in some embodiments, the point in the space with the shortest distance to each target ray may be directly solved, and this point is the first position of the target in the 3D space; or, in some embodiments, multiple positions are obtained based on each target ray and the plane where the target is located, and then the position with the shortest distance to each target ray is selected from these positions as the first position of the target in the 3D space; and so on.
在一些实施例中,子步骤S104A2,所述将所述最小距离点的位置作为所述目标在3D空间的第一位置,还可以包括:将多个所述目标射线的交点的位置作为所述目标在3D空间的第一位置。如果实际确定的目标射线与理论上的目标射线没有偏差,即实际确定的目标射线与理论上的目标射线完全一致,则多个目标射线在空间中会相交,相交的交点到各个目标射线的距离为0,距离之和也为0,此时多个所述目标射线的交点的位置即为所述目标在3D空间的第一位置。In some embodiments, sub-step S104A2, taking the position of the minimum distance point as the first position of the target in the 3D space, may also include: taking the position of the intersection of the plurality of target rays as the first position of the target in the 3D space. If the target ray actually determined has no deviation from the theoretical target ray, that is, the target ray actually determined is completely consistent with the theoretical target ray, then the plurality of target rays will intersect in space, and the distance from the intersection point to each target ray is 0, and the sum of the distances is also 0. At this time, the position of the intersection of the plurality of target rays is the first position of the target in the 3D space.
在一些实施例中，所述摄像装置的数量为一个；步骤S104，所述根据所述目标对应的目标射线，确定所述目标在3D空间的第一位置，可以包括：根据所述目标对应的一个目标射线和所述目标所在的平面，确定所述目标在3D空间的第一位置。In some embodiments, the number of camera devices is one; step S104, determining the first position of the target in the 3D space according to the target ray corresponding to the target, may include: determining the first position of the target in the 3D space according to one target ray corresponding to the target and the plane where the target is located.
本公开实施例中,摄像装置的数量是一个时,对应的目标射线也是一个,目标在空间中所在的平面(例如,目标所在的平面为地面)已知,那么根据所述目标对应的一个目标射线和所述目标所在的平面即可确定所述目标在3D空间的第一位置。In the embodiment of the present disclosure, when the number of camera devices is one, the corresponding target ray is also one, and the plane where the target is located in the space (for example, the plane where the target is located is the ground) is known, then the first position of the target in the 3D space can be determined based on a target ray corresponding to the target and the plane where the target is located.
在一些实施例中,步骤S104,所述根据所述目标对应的一个目标射线和所述目标所在的平面,确定所述目标在3D空间的第一位置,还可以包括:子步骤S104B1和子步骤S104B2,如图3所示。In some embodiments, step S104, determining the first position of the target in the 3D space according to a target ray corresponding to the target and the plane where the target is located, may also include: sub-step S104B1 and sub-step S104B2, as shown in FIG. 3 .
子步骤S104B1:求解所述目标射线与所述目标所在的平面的交点。Sub-step S104B1: solving the intersection point between the target ray and the plane where the target is located.
子步骤S104B2:将所述目标射线与所述目标所在的平面的交点的位置作为所述目标在3D空间的第一位置。Sub-step S104B2: taking the position of the intersection of the target ray and the plane where the target is located as the first position of the target in the 3D space.
目标射线与目标所在的平面(例如地面)理论上一定会相交,因此求解所述目标射线与所述目标所在的平面的交点,该交点的位置即为所述目标在3D空间的第一位置。The target ray and the plane where the target is located (such as the ground) will theoretically intersect, so the intersection point of the target ray and the plane where the target is located is solved, and the position of the intersection point is the first position of the target in the 3D space.
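求解目标射线与目标所在平面（例如用地面方程 ax+by+cz+d=0 表示的地面）的交点可示意如下（假设性示例，非本公开实施例的一部分）：Solving the intersection of the target ray with the plane where the target is located (e.g., the ground expressed by a ground equation ax+by+cz+d=0) can be sketched as follows (a hypothetical example, not part of the disclosed embodiments):

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane):
    """Intersect the ray origin + t*direction (t >= 0) with the plane
    a*x + b*y + c*z + d = 0, given as plane = (a, b, c, d).
    Returns the intersection point, or None when the ray is parallel
    to the plane or points away from it."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    normal = np.asarray(plane[:3], dtype=float)
    denom = normal @ direction
    if abs(denom) < 1e-12:       # ray (almost) parallel to the plane
        return None
    t = -(normal @ origin + plane[3]) / denom
    if t < 0:                    # plane is behind the ray origin
        return None
    return origin + t * direction

# Optical center 3 m above the ground plane z = 0, ray pointing forward
# and downward: the target is located where the ray meets the ground.
pt = ray_plane_intersection([0.0, 0.0, 3.0], [1.0, 0.0, -1.0],
                            (0.0, 0.0, 1.0, 0.0))
```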
需要说明的是,根据所述目标对应的一个目标射线和所述目标所在的平面确定所述目标在3D空间的第一位置还可以采用其他方式,例如:在一些实施例中,如果目标在所在的平面的某一个具体的直线上(例如行人在地面的斑马线上),那么根据所述目标对应的一个目标射线和该直线也可以确定目标在3D空间的第一位置;等等。It should be noted that other methods can also be used to determine the first position of the target in the 3D space based on a target ray corresponding to the target and the plane where the target is located. For example, in some embodiments, if the target is on a specific straight line in the plane (such as a pedestrian on a zebra crossing on the ground), then the first position of the target in the 3D space can also be determined based on a target ray corresponding to the target and the straight line; and so on.
目标在3D空间的第一位置是客观存在的,但是目标在3D空间的第一位置的表示方式与选定的坐标系有关,在不同的坐标系下,目标在3D空间的第一位置的表示方式不同。The first position of the target in the 3D space exists objectively, but the representation method of the first position of the target in the 3D space is related to the selected coordinate system. In different coordinate systems, the representation method of the first position of the target in the 3D space is different.
在一些实施例中,步骤S104,所述根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置,可以包括:根据所述目标对应的目标射线,确定所述目标在世界坐标系下的第一位置。In some embodiments, step S104, determining the first position of the target in the 3D space according to the target ray corresponding to the target, may include: determining the first position of the target in the world coordinate system according to the target ray corresponding to the target.
摄像装置可安放在环境中的任意位置,在环境中选择一个参考坐标系来描述摄像装置和环境中任何目标(物体)的位置,该坐标系称为世界坐标系(World coordinate system)。与世界坐标系相关的坐标系还包括相机坐标系(Camera coordinate system)、图像坐标系(Pixel coordinate system)以及像平面坐标系(Retinal coordinate system)。The camera device can be placed at any position in the environment. A reference coordinate system is selected in the environment to describe the position of the camera device and any target (object) in the environment. This coordinate system is called the world coordinate system. Coordinate systems related to the world coordinate system also include the camera coordinate system, the pixel coordinate system, and the retinal coordinate system.
摄像装置采集的检测图像(即数字图像)在计算机内可以存储为数组,数组中的每一个元素(象素,pixel)的值即是图像点的亮度(灰度);在图像上定义直角坐标系u-v,每一象素的坐标(u,v)分别是该象素在数组中的列数和行数;故(u,v)是以象素为单位的图像坐标系(也称为像素坐标系)的坐标。The detection image (i.e., digital image) captured by the camera device can be stored as an array in the computer. The value of each element (pixel) in the array is the brightness (grayscale) of the image point; a rectangular coordinate system u-v is defined on the image, and the coordinates (u, v) of each pixel are the column number and row number of the pixel in the array respectively; therefore, (u, v) is the coordinate of the image coordinate system (also called pixel coordinate system) with pixels as the unit.
由于图像坐标系只表示象素位于数字图像的列数和行数，并没有用物理单位表示出该象素在图像中的物理位置，因而需要再建立以物理单位（例如厘米）表示的像平面坐标系x-y；用(x,y)表示以物理单位度量的像平面坐标系的坐标。在x-y坐标系中，原点定义在摄像装置光轴和图像平面的交点处，称为图像的主点（principal point），该点一般位于图像中心处，在图像坐标系下的坐标为(u0,v0)，每个象素在x轴和y轴方向上的物理尺寸为dx、dy，两个坐标系的关系如下：Since the image coordinate system only indicates the column and row of a pixel in the digital image, and does not express the physical position of that pixel in the image in physical units, an image plane coordinate system x-y expressed in physical units (e.g. centimeters) must also be established; (x, y) denotes the coordinates of the image plane coordinate system measured in physical units. In the x-y coordinate system, the origin is defined at the intersection of the optical axis of the camera device and the image plane, called the principal point of the image; this point is generally located at the center of the image, and its coordinates in the image coordinate system are (u0, v0). The physical dimensions of each pixel along the x and y axes are dx and dy, and the relationship between the two coordinate systems is as follows:

$$\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}\dfrac{1}{dx}&s'&u_0\\ 0&\dfrac{1}{dy}&v_0\\ 0&0&1\end{bmatrix}\begin{bmatrix}x\\ y\\ 1\end{bmatrix}$$

其中s'表示因摄像装置像平面坐标轴相互不正交引出的倾斜因子（skew factor）。Wherein s' represents the skew factor caused by the non-orthogonality of the image plane coordinate axes of the camera device.
相机坐标系的原点为摄像装置的光心,其x轴与y轴与图像的X,Y轴平行,z轴为摄像装置的光轴,它与图像平面垂直,以此构成的空间直角坐标系称为相机坐标系,相机坐标系是三维坐标系。光轴与图像平面的交点,即为图像坐标系的原点,图像坐标系的原点与图像的X、Y轴构成的直角坐标系即为图像坐标系,图像坐标系是二维坐标系。相机坐标系与世界坐标系之间的关系可以用旋转矩阵R与平移向量t来描述。The origin of the camera coordinate system is the optical center of the camera device. Its x-axis and y-axis are parallel to the X and Y axes of the image. The z-axis is the optical axis of the camera device, which is perpendicular to the image plane. The spatial rectangular coordinate system formed by this is called the camera coordinate system. The camera coordinate system is a three-dimensional coordinate system. The intersection of the optical axis and the image plane is the origin of the image coordinate system. The rectangular coordinate system formed by the origin of the image coordinate system and the X and Y axes of the image is the image coordinate system. The image coordinate system is a two-dimensional coordinate system. The relationship between the camera coordinate system and the world coordinate system can be described by the rotation matrix R and the translation vector t.
在一些实施例中,步骤S103,所述根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置,确定所述目标对应的目标射线,还可以包括:子步骤S1031、子步骤S1032以及子步骤S1033,如图4所示。In some embodiments, step S103, determining the target ray corresponding to the target based on the position of the imaging point of the target on the image plane of the camera device and the position of the optical center of the camera device, may also include: sub-step S1031, sub-step S1032 and sub-step S1033, as shown in Figure 4.
子步骤S1031:根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的内参公式,确定所述目标的成像点在相机坐标系下的第一坐标。Sub-step S1031: determining the first coordinate of the imaging point of the target in the camera coordinate system according to the position of the imaging point of the target on the image plane of the imaging device and the internal parameter formula of the imaging device.
目标的成像点在所述摄像装置的像平面上的位置可以理解为目标的成像点在像平面坐标系下的坐标，将目标的成像点在像平面坐标系下的坐标转换到目标的成像点在相机坐标系下的第一坐标，需要摄像装置的内参公式，如果摄像装置A_k中检测到的目标O_i的像平面坐标系下的坐标为(x_i,y_i,1)_k，根据摄像装置的内参公式K，即可得到如下关系式：The position of the imaging point of the target on the image plane of the camera device can be understood as the coordinates of the imaging point of the target in the image plane coordinate system. To convert the coordinates of the imaging point of the target in the image plane coordinate system to the first coordinate of the imaging point of the target in the camera coordinate system, the intrinsic parameter formula of the camera device is required. If the coordinates of the target O_i detected by camera device A_k in the image plane coordinate system are (x_i, y_i, 1)_k, then according to the intrinsic parameter formula K of the camera device, the following relationship is obtained:

$$Z_i\begin{bmatrix}x_i\\ y_i\\ 1\end{bmatrix}_k=K\begin{bmatrix}X_i\\ Y_i\\ Z_i\end{bmatrix}_k$$
由于在后续的目标射线计算中不需要深度，可将等式两边的Z_i约掉，也可将Z_i带入为任意常数，若采用约掉方法可得到如下关系式：Since the depth is not required in the subsequent target ray calculation, Z_i can be eliminated from both sides of the equation, or Z_i can be substituted by an arbitrary constant. If the elimination method is used, the following relationship is obtained:

$$\begin{bmatrix}X_i'\\ Y_i'\\ 1\end{bmatrix}_k=K^{-1}\begin{bmatrix}x_i\\ y_i\\ 1\end{bmatrix}_k$$
其中$(X_i',Y_i',1)_k$可以看做是目标的成像点在相机坐标系下的第一坐标。Here $(X_i',Y_i',1)_k$ can be regarded as the first coordinate of the imaging point of the target in the camera coordinate system.
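由像平面坐标经内参求成像点在相机坐标系下的第一坐标，可用如下Python代码示意（内参矩阵K的数值为假设值，仅作说明，非本公开实施例的一部分）：Obtaining the first coordinate of the imaging point in the camera coordinate system from the image plane coordinates via the intrinsics can be sketched in Python as follows (the values of the intrinsic matrix K are hypothetical, for illustration only, not part of the disclosed embodiments):

```python
import numpy as np

# Hypothetical intrinsic matrix K: fx = fy = 800 (focal length in pixels),
# principal point (u0, v0) = (320, 240), zero skew. Values are examples only.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def unproject(u, v, K):
    """Map a pixel (u, v) to the point (X', Y', 1) in the camera
    coordinate system via K^{-1}. The unknown depth Z_i cancels out,
    so the result only determines the direction of the target ray."""
    p = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return p / p[2]              # normalize the last component to 1

# The principal point unprojects onto the optical axis, i.e. (0, 0, 1).
ray_point = unproject(320.0, 240.0, K)
```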
子步骤S1032:根据所述目标的成像点在所述相机坐标系下的第一坐标、所述摄像装置的光心在所述相机坐标系下的第二坐标以及所述摄像装置的外参矩阵,确定所述目标的成像点在所述世界坐标系下的第三坐标以及所述摄像装置的光心在所述世界坐标系下的第四坐标。Sub-step S1032: Determine the third coordinate of the imaging point of the target in the world coordinate system and the fourth coordinate of the optical center of the camera device in the world coordinate system based on the first coordinate of the imaging point of the target in the camera coordinate system, the second coordinate of the optical center of the camera device in the camera coordinate system, and the extrinsic parameter matrix of the camera device.
以目标的成像点在相机坐标系下的第一坐标的表示方式$(X_i',Y_i',1)_k$为例，摄像装置的光心在所述相机坐标系下的第二坐标即为相机坐标系的原点$(0,0,0)_k$，利用摄像装置的外参矩阵$[R_k|t_k]$可以求得目标的成像点在相机坐标系下的第一坐标$(X_i',Y_i',1)_k$和相机坐标系的原点$(0,0,0)_k$分别在所述世界坐标系下的第三坐标$P_{i,k}^w$、第四坐标$C_k^w$：Taking the representation $(X_i',Y_i',1)_k$ of the first coordinate of the imaging point of the target in the camera coordinate system as an example, the second coordinate of the optical center of the camera device in the camera coordinate system is the origin $(0,0,0)_k$ of the camera coordinate system. Using the extrinsic matrix $[R_k|t_k]$ of the camera device, the third coordinate $P_{i,k}^w$ of the imaging point and the fourth coordinate $C_k^w$ of the origin of the camera coordinate system in the world coordinate system can be obtained:

$$P_{i,k}^w=R_k^{-1}\left(\begin{bmatrix}X_i'\\ Y_i'\\ 1\end{bmatrix}_k-t_k\right),\qquad C_k^w=R_k^{-1}\bigl(0-t_k\bigr)=-R_k^{-1}t_k$$
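由相机坐标系变换到世界坐标系可用如下Python代码示意（假设外参满足 p_cam = R·p_world + t 这一常见约定，旋转与平移的数值仅为示例，非本公开实施例的一部分）：The transformation from the camera coordinate system to the world coordinate system can be sketched in Python as follows (assuming the extrinsics follow the common convention p_cam = R·p_world + t; the rotation and translation values are examples only, not part of the disclosed embodiments):

```python
import numpy as np

def camera_to_world(p_cam, R, t):
    """Map a point from camera coordinates to world coordinates,
    assuming the extrinsics follow the common convention
    p_cam = R @ p_world + t, whose inverse is
    p_world = R^T @ (p_cam - t) because R is a rotation matrix."""
    return R.T @ (np.asarray(p_cam, dtype=float) - t)

# Toy extrinsics: a 90-degree rotation about the z axis plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

center_w = camera_to_world([0.0, 0.0, 0.0], R, t)  # optical center in world frame
point_w = camera_to_world([0.5, 0.5, 1.0], R, t)   # imaging point (X', Y', 1)
```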
子步骤S1033:根据所述目标的成像点在所述世界坐标系下的第三坐标以及所述摄像装置的光心在所述世界坐标系下的第四坐标,确定所述目标在所述世界坐标系下对应的目标射线。Sub-step S1033: determining a target ray corresponding to the target in the world coordinate system according to the third coordinate of the imaging point of the target in the world coordinate system and the fourth coordinate of the optical center of the camera device in the world coordinate system.
以第三坐标$P_{i,k}^w$、第四坐标$C_k^w$的表示方式为例，利用第三坐标与第四坐标可得到摄像装置$A_k$对于目标$O_i$的目标射线$L_{i,k}$：Taking the representations $P_{i,k}^w$ for the third coordinate and $C_k^w$ for the fourth coordinate as an example, the target ray $L_{i,k}$ of camera device $A_k$ for target $O_i$ can be obtained from the third and fourth coordinates:

$$L_{i,k}(\lambda)=C_k^w+\lambda\left(P_{i,k}^w-C_k^w\right),\quad \lambda\ge 0$$
最终目标在世界坐标系下的第一位置的坐标为：The coordinates of the first position of the target in the world coordinate system are finally:

$$P_i^{*}=\arg\min_{P}\sum_{k}d\!\left(P,L_{i,k}\right)$$
其中$d(P,L_{i,k})$表示空间中的点$P$到目标射线$L_{i,k}$的距离，求解目标在世界坐标系下的第一位置的坐标可采用但不限于梯度下降法、牛顿法、共轭梯度法等最优化方法、近似数值解法等。Here $d(P,L_{i,k})$ denotes the distance from a point $P$ in space to the target ray $L_{i,k}$; the coordinates of the first position of the target in the world coordinate system can be solved using, but not limited to, optimization methods such as gradient descent, Newton's method and the conjugate gradient method, or approximate numerical methods.
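以文中提到的梯度下降法为例，最小化到各目标射线的距离之和可用如下Python代码粗略示意（仅为示意性草图，非生产级求解器，亦非本公开实施例的一部分）：Taking the gradient descent method mentioned in the text as an example, minimizing the sum of distances to the target rays can be roughly sketched in Python as follows (an illustrative sketch only, not a production-grade solver, and not part of the disclosed embodiments):

```python
import numpy as np

def solve_min_distance_point(origins, directions, steps=2000, lr=0.01):
    """Minimize the sum of point-to-line distances sum_k d(P, L_k) by
    plain gradient descent, one of the optimization methods named above."""
    origins = [np.asarray(o, dtype=float) for o in origins]
    dirs = [np.asarray(d, dtype=float) for d in directions]
    dirs = [d / np.linalg.norm(d) for d in dirs]
    p = np.mean(origins, axis=0)          # crude initial guess
    for _ in range(steps):
        grad = np.zeros(3)
        for o, d in zip(origins, dirs):
            v = p - o
            r = v - (v @ d) * d           # shortest vector from the line to p
            n = np.linalg.norm(r)
            if n > 1e-12:
                grad += r / n             # gradient of ||r|| with respect to p
        p = p - lr * grad
    return p

# Two rays intersecting at (1, 1, 0): gradient descent approaches that point.
p = solve_min_distance_point(
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```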
在一些实施例中,所述方法还包括:步骤S105和步骤S106,如图5所示。In some embodiments, the method further includes: step S105 and step S106, as shown in FIG5 .
步骤S105:根据所述检测图像确定所述目标在3D空间的第二位置。Step S105: determining a second position of the target in the 3D space according to the detection image.
本公开实施例根据所述检测图像按照目前的相关技术(包括基于纯视觉的3D目标检测算法)可以确定目标在3D空间的第二位置(即不是按照目标射线确定目标在3D空间的位置)。The disclosed embodiment can determine the second position of the target in the 3D space (ie, not the position of the target in the 3D space according to the target ray) based on the detection image according to current related technologies (including a 3D target detection algorithm based on pure vision).
步骤S106:根据所述第一位置和所述第二位置,确定所述目标在3D空间的最终位置。Step S106: Determine the final position of the target in the 3D space according to the first position and the second position.
将按照目标射线确定的目标在3D空间的第一位置和按照相关技术确定的目标在3D空间的第二位置结合起来,确定所述目标在3D空间的最终位置。如此能够进一步获得准确率更高的目标的位置。The first position of the target in the 3D space determined according to the target ray and the second position of the target in the 3D space determined according to the relevant technology are combined to determine the final position of the target in the 3D space, so that the position of the target with higher accuracy can be further obtained.
在一些实施例中,所述方法还包括:步骤S107和步骤S108,如图6所示。In some embodiments, the method further includes: step S107 and step S108, as shown in FIG6 .
步骤S107:获取所述第一位置对应的第一置信度或者第一方差和所述第二位置对应的第二置信度或者第二方差。Step S107: Obtain a first confidence level or a first variance corresponding to the first position and a second confidence level or a second variance corresponding to the second position.
步骤S108:根据所述第一置信度或者第一方差、所述第二置信度或者第二方差,确定所述第一位置对应的第一权重和所述第二位置对应的第二权重,所述第一权重和所述第二权重之和等于1。Step S108: Determine a first weight corresponding to the first position and a second weight corresponding to the second position according to the first confidence or first variance and the second confidence or second variance, wherein the sum of the first weight and the second weight is equal to 1.
此时,步骤S106,所述根据所述第一位置和所述第二位置,确定所述目标在3D空间的最终位置,还可以包括:根据所述第一位置、所述第二位置、所述第一权重以及所述第二权重,确定所述目标在3D空间的最终位置。At this time, step S106, determining the final position of the target in the 3D space according to the first position and the second position, may also include: determining the final position of the target in the 3D space according to the first position, the second position, the first weight and the second weight.
置信度也称为可靠度、置信水平、或置信系数，是一个概率值，即在抽样对总体参数作出估计时，由于样本的随机性，其结论总是不确定的。因此，采用一种概率的陈述方法，也就是数理统计中的区间估计法，即估计值与总体参数在一定允许的误差范围以内，其相应的概率有多大，这个相应的概率称作置信度（或者总体参数值落在样本统计值某一区内的概率）。置信度是描述不确定性的重要指标之一，置信度表示区间估计的把握程度。第一置信度可以是第一位置对应的可靠度，第二置信度可以是第二位置对应的可靠度。Confidence, also called reliability, confidence level, or confidence coefficient, is a probability value: when a population parameter is estimated from a sample, the conclusion is always uncertain because of sampling randomness. A probabilistic statement is therefore used, namely interval estimation in mathematical statistics: the probability that the estimate lies within an allowed error range of the population parameter is called the confidence (equivalently, the probability that the population parameter falls within a given interval around the sample statistic). Confidence is an important measure of uncertainty and expresses the degree of certainty of an interval estimate. The first confidence may be the reliability corresponding to the first position, and the second confidence may be the reliability corresponding to the second position.
概率论中方差用来度量随机变量和其数学期望(即均值)之间的偏离程度。第一方差可以是第一位置对应的偏离程度,第二方差可以是第二位置对应的偏离程度。如果摄像装置的数量是多个,可以采用置信度,如果摄像装置的数量是一个,可以采用方差。In probability theory, variance is used to measure the degree of deviation between a random variable and its mathematical expectation (i.e., mean). The first variance can be the degree of deviation corresponding to the first position, and the second variance can be the degree of deviation corresponding to the second position. If there are multiple cameras, confidence can be used, and if there is only one camera, variance can be used.
第一权重可以是指目标在3D空间的第一位置占确定目标在3D空间的最终位置的比例，第二权重可以是指目标在3D空间的第二位置占确定目标在3D空间的最终位置的比例。第一权重和所述第二权重之和等于1。根据第一位置、所述第二位置、所述第一权重以及所述第二权重即可确定所述目标在3D空间的最终位置。The first weight may refer to the proportion that the first position of the target in 3D space contributes to determining the final position of the target in 3D space, and the second weight may refer to the proportion contributed by the second position. The sum of the first weight and the second weight equals 1. The final position of the target in 3D space can then be determined from the first position, the second position, the first weight, and the second weight.
例如:假设第一位置在预定空间范围内的第一置信度为90%,所述第二位置在预定空间范围内的第二置信度为80%,据此确定第一权重为0.55,第二权重为0.45,目标在3D空间的最终位置为第一位置*0.55+第二位置*0.45。For example: assuming that the first confidence level of the first position within the predetermined spatial range is 90%, and the second confidence level of the second position within the predetermined spatial range is 80%, the first weight is determined to be 0.55, the second weight is 0.45, and the final position of the target in the 3D space is first position * 0.55 + second position * 0.45.
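The weighted combination in steps S107-S108 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, and plain confidence normalization is assumed (the patent obtains its weights from a trained network or Kalman filtering, which is why the text's 90%/80% confidences map to 0.55/0.45 rather than to the 0.9/1.7 ≈ 0.53 that normalization gives).

```python
import numpy as np

def fuse_positions(p1, p2, conf1, conf2):
    """Fuse the first and second 3D positions with confidence-normalized
    weights; by construction the two weights sum to 1 (step S108)."""
    w1 = conf1 / (conf1 + conf2)
    w2 = conf2 / (conf1 + conf2)
    return w1 * np.asarray(p1, dtype=float) + w2 * np.asarray(p2, dtype=float)

# Fused position lies between the two estimates, pulled toward the
# higher-confidence one.
final = fuse_positions([1.0, 2.0, 3.0], [1.2, 2.2, 3.2], 0.9, 0.8)
```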
下面详细说明本公开实施例的方法的应用。The application of the method of the embodiment of the present disclosure is described in detail below.
本公开实施例可以包括3个模块,如图7所示:The embodiment of the present disclosure may include three modules, as shown in FIG7 :
模块A1-An:为独立的视觉检测模块,每个视觉检测模块单独输出其对目标的2D检测结果(包括目标的尺寸、类别、朝向、中心点位置等信息)和对应的置信度,以及其对目标的3D粗检测结果(利用的方法为目前的相关技术,除了包括目标的尺寸等信息外,还包括目标在3D空间的位置,或者深度粗估计等信息)与对应的置信度等。Modules A1-An are independent visual detection modules. Each visual detection module independently outputs its 2D detection results of the target (including information such as the target's size, category, orientation, center point position, etc.) and the corresponding confidence level, as well as its 3D rough detection results of the target (the method used is the current relevant technology, which includes not only information such as the target's size, but also the target's position in 3D space, or rough depth estimation, etc.) and the corresponding confidence level.
模块B:为目标匹配模块，将模块A1-An输出的目标数据进行匹配，即找出每个视觉检测模块对应同一目标的数据。Module B: the target matching module, which matches the target data output by modules A1-An, i.e., finds the data from each visual detection module that corresponds to the same target.
模块C:为目标射线估计模块。Module C: target ray estimation module.
情况一：该模块将同一目标的不同摄像装置视角下的目标的成像点的位置和摄像装置的光心的位置拟合为目标射线，并计算空间中到这些目标射线的距离之和最小的点或者这些目标射线的交点，从而得出精确的3D位置估计。Case 1: For the same target, this module fits a target ray through the position of the target's imaging point in each camera view and the position of that camera's optical center, then computes either the point in space whose sum of distances to these target rays is minimal or the intersection of the rays, thereby obtaining an accurate 3D position estimate.
情况二：该模块将一个摄像装置视角下的目标的成像点的位置和摄像装置的光心的位置拟合为目标射线，并求解该目标射线与地面方程的交点，从而得出精确的3D位置估计。Case 2: This module fits a target ray through the position of the target's imaging point in a single camera view and the position of the camera's optical center, then solves for the intersection of the target ray with the ground equation, thereby obtaining an accurate 3D position estimate.
模块D:为加权修正模块，由于目标的2D检测结果与本公开实施例通过目标射线得到的目标的3D检测结果的准确性基本相同，因此将目标的2D检测结果的置信度作为本公开实施例通过目标射线得到的目标的3D检测结果的置信度；该模块以2D检测结果与3D粗检测结果的置信度为依据，首先利用多个已知的2D检测结果与3D粗检测结果的置信度作为训练数据训练出神经网络，然后以2D检测结果与3D粗检测结果的置信度为依据通过已训练的神经网络得到第一权重和第二权重，将模块C的输出和模块A直接得到的3D粗检测结果进行加权求和，得到估计精度更高的目标的最终位置。除了采用神经网络得到权值外，还可通过卡尔曼滤波等方法得到第一权重和第二权重。Module D: the weighted correction module. Since the accuracy of the target's 2D detection result is essentially the same as that of the ray-based 3D detection result of this embodiment, the confidence of the 2D detection result is used as the confidence of the ray-based 3D detection result. Based on the confidences of the 2D detection results and the rough 3D detection results, a neural network is first trained using many known confidence pairs as training data; the trained network then maps these confidences to the first weight and the second weight. The output of module C and the rough 3D detection result obtained directly from module A are combined as a weighted sum, giving a final target position with higher estimation accuracy. Besides a neural network, the first and second weights can also be obtained by methods such as Kalman filtering.
实施例一、一种基于3D粗检测结果与多目相机目标射线的3D目标检测系统的结合方案,其结构如图8所示。Embodiment 1: A solution for combining a 3D target detection system based on 3D rough detection results and multi-camera target rays, the structure of which is shown in FIG8 .
第一步、模块A1-An,视觉检测模块,该模块以检测图像为输入,输出检测图像中目标的2D检测结果和对应的置信度,以及其对目标的3D粗检测结果(即第二位置)与对应的置信度等。The first step, module A1-An, visual detection module, takes the detection image as input, and outputs the 2D detection result and corresponding confidence of the target in the detection image, as well as its 3D rough detection result (i.e., the second position) and corresponding confidence of the target.
第二步、模块B，目标匹配模块，利用目标3D粗检测结果，将模块A1-An中检测到的同一目标的数据匹配起来。具体方法可以是：当不同摄像装置在世界坐标系下检测到的目标存在重合，或两个目标的距离小于一定阈值Y时，则将这些目标对应的数据匹配起来。The second step, module B, the target matching module, uses the rough 3D detection results to match the data of the same target detected by modules A1-An. A specific method can be: when the targets detected by different cameras overlap in the world coordinate system, or when the distance between two targets is less than a certain threshold Y, the data corresponding to these targets are matched.
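The matching rule of module B can be sketched as a greedy nearest-neighbor association under the threshold Y. The greedy strategy, the helper name, and the example threshold value are assumptions for illustration; the patent only specifies the overlap/distance criterion.

```python
import numpy as np

def match_targets(dets_a, dets_b, threshold_y=1.0):
    """Greedily match rough 3D detections from two cameras.

    dets_a, dets_b: lists of (x, y, z) rough positions in the world frame.
    threshold_y is the distance threshold Y from the text (value assumed).
    Returns index pairs (i, j) whose positions are closer than the threshold.
    """
    pairs = []
    used = set()
    for i, pa in enumerate(dets_a):
        best_j, best_d = None, threshold_y
        for j, pb in enumerate(dets_b):
            if j in used:
                continue
            d = np.linalg.norm(np.asarray(pa, float) - np.asarray(pb, float))
            if d < best_d:          # keep the nearest detection under Y
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)        # each detection matches at most once
    return pairs
```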
第三步、模块C,目标射线位置估计模块,利用多个摄像装置视角下的目标的成像点的位置和摄像装置的光心的位置构建多个目标射线,计算得到目标在3D空间的第一位置。原理如图9所示:具体为构建从目标的成像点的位置和摄像装置的光心的位置的目标射线,并得到该目标射线的在世界坐标系下的方程;根据不同相机的结果,得到至少两条目标射线后,计算这些目标射线的交点或与这些目标射线的距离之和最小的最小距离点,将交点或者最小距离点作为目标在3D空间的第一位置。The third step, module C, the target ray position estimation module, uses the position of the imaging point of the target under the perspective of multiple cameras and the position of the optical center of the camera to construct multiple target rays, and calculate the first position of the target in the 3D space. The principle is shown in Figure 9: Specifically, the target ray from the position of the imaging point of the target and the position of the optical center of the camera is constructed, and the equation of the target ray in the world coordinate system is obtained; after obtaining at least two target rays according to the results of different cameras, the intersection of these target rays or the minimum distance point with the minimum sum of distances to these target rays is calculated, and the intersection or the minimum distance point is used as the first position of the target in the 3D space.
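The ray construction and intersection step of module C can be sketched as below. The camera-to-world extrinsic convention, the helper names, and the use of a least-squares point (minimizing the sum of *squared* distances, a common tractable stand-in for the sum of distances mentioned in the text) are assumptions, not the patent's exact method.

```python
import numpy as np

def pixel_to_ray(uv, K, R, t):
    """Back-project a pixel to a world-frame ray (origin, unit direction).

    K is the 3x3 intrinsic matrix; R, t are assumed to be the
    camera-to-world rotation and the optical-center position in the
    world frame. The ray passes through the optical center and the
    target's imaging point, as in module C.
    """
    d_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # camera-frame direction
    d_world = R @ d_cam                                       # rotate into world frame
    origin = np.asarray(t, dtype=float)                       # optical center
    return origin, d_world / np.linalg.norm(d_world)

def closest_point_to_rays(origins, dirs):
    """Least-squares point minimizing the sum of squared distances to rays.

    Solves sum_i (I - d_i d_i^T) p = sum_i (I - d_i d_i^T) o_i for unit
    directions d_i; when the rays intersect exactly, the solution is
    their intersection point.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, dirs):
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray direction
        A += P
        b += P @ np.asarray(o, dtype=float)
    return np.linalg.solve(A, b)
```

With two cameras (as in Embodiment 1), two rays suffice; with more cameras the same linear system simply accumulates all rays.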
最终目标Oi的坐标为:The coordinates of the final target Oi are:
本公开实施例由于只有两个相机，因此采用解析几何的数值解法求解坐标Pi。Since there are only two cameras in this embodiment of the present disclosure, the coordinates Pi are solved using the numerical method of analytic geometry.
第四步、模块D将模块A1-An得到的2D检测结果的置信度与3D粗检测结果的置信度输入已训练好的神经网络,得到第一权重和第二权重。将模块C得到的目标在3D空间的第一位置与模块A1-An得到的3D粗检测结果(即第二位置)加权求和得到最终结果,即目标在3D空间的最终位置。In the fourth step, module D inputs the confidence of the 2D detection result obtained by module A1-An and the confidence of the 3D rough detection result into the trained neural network to obtain the first weight and the second weight. The first position of the target in the 3D space obtained by module C and the 3D rough detection result (i.e., the second position) obtained by module A1-An are weighted and summed to obtain the final result, i.e., the final position of the target in the 3D space.
实施例二、一种基于深度粗估计与多目相机目标射线的3D目标检测系统的结合方案,其结构如图10所示。Embodiment 2: A combination solution of a 3D target detection system based on coarse depth estimation and multi-camera target rays, the structure of which is shown in FIG10 .
第一步、模块A1-An,视觉检测模块,该模块以检测图像为输入,输出检测图像中目标的2D检测结果和对应的置信度,以及深度粗估计。The first step, module A1-An, the visual detection module, takes the detection image as input and outputs the 2D detection result and corresponding confidence of the target in the detection image, as well as the rough depth estimation.
第二步、模块B,目标匹配模块,利用深度粗估计,反推出3D粗检测结果(即第二位置),将模块A1-An中检测到的同一目标的数据匹配起来。具体方法可以是:当不同摄像装置在世界坐标系下检测到的目标存在重合,或两个目标的距离小于一定阈值Y时,则将这些目标对应的数据匹配起来。In the second step, module B, the target matching module, uses the coarse depth estimation to infer the 3D coarse detection result (i.e., the second position) and matches the data of the same target detected in modules A1-An. The specific method can be: when the targets detected by different camera devices in the world coordinate system overlap, or the distance between two targets is less than a certain threshold Y, the data corresponding to these targets are matched.
第三步、模块C,目标射线位置估计模块,利用多个摄像装置视角下的目标的成像点的位置和摄像装置的光心的位置构建多个目标射线,计算得到目标在3D空间的第一位置。原理如图9所示:具体为构建从目标的成像点的位置和摄像装置的光心的位置的目标射线,并得到该目标射线的在世界坐标系下的方程;根据不同相机的结果,得到至少两条目标射线后,计算这些目标射线的交点或与这些目标射线的距离之和最小的最小距离点,将交点或者最小距离点作为目标在3D空间的第一位置。The third step, module C, the target ray position estimation module, uses the position of the imaging point of the target under the perspective of multiple cameras and the position of the optical center of the camera to construct multiple target rays, and calculate the first position of the target in the 3D space. The principle is shown in Figure 9: Specifically, the target ray from the position of the imaging point of the target and the position of the optical center of the camera is constructed, and the equation of the target ray in the world coordinate system is obtained; after obtaining at least two target rays according to the results of different cameras, the intersection of these target rays or the minimum distance point with the minimum sum of distances to these target rays is calculated, and the intersection or the minimum distance point is used as the first position of the target in the 3D space.
最终目标Oi的坐标为:The coordinates of the final target Oi are:
本公开实施例由于摄像装置数量多，求解困难，因此可以先将摄像装置两两分组，每组利用解析几何的数值解法求解坐标Pi,j，最后再将Pi,j加权求和得到Pi。In this embodiment of the present disclosure, since there are many cameras and a joint solution is difficult, the cameras can first be grouped in pairs; each pair solves the coordinates Pi,j using the numerical method of analytic geometry, and finally Pi is obtained as a weighted sum of the Pi,j.
第四步、模块D将模块A1-An得到的2D检测结果的置信度与模块B利用深度粗估计反推得到的3D粗检测结果的置信度输入已训练好的神经网络,得到第一权重和第二权重。将模块C得到的目标在3D空间的第一位置与模块B利用深度粗估计反推得到的3D粗检测结果加权求和得到最终结果,即目标在3D空间的最终位置。In the fourth step, module D inputs the confidence of the 2D detection results obtained by modules A1-An and the confidence of the 3D rough detection results obtained by module B using rough depth estimation into the trained neural network to obtain the first weight and the second weight. The first position of the target in the 3D space obtained by module C and the 3D rough detection result obtained by module B using rough depth estimation are weighted and summed to obtain the final result, that is, the final position of the target in the 3D space.
实施例三、一种基于地面方程与单目相机目标射线的3D目标检测系统的结合方案，其结构如图11所示。Embodiment 3: A combination solution of a 3D target detection system based on the ground equation and monocular camera target rays, the structure of which is shown in FIG11.
第一步、模块A，视觉检测模块，该模块以检测图像为输入，输出检测图像中目标的2D检测结果和对应的方差，以及3D粗检测结果和对应的方差。The first step, module A, the visual detection module, takes the detection image as input and outputs the 2D detection result of the target in the detection image with its corresponding variance, as well as the rough 3D detection result with its corresponding variance.
第二步、模块C,射线位置估计模块,利用一个摄像装置视角下的目标的成像点的位置和摄像装置的光心的位置构建一个目标射线,与地面方程联和,计算得到目标在3D空间的第一位置。原理如图12所示:具体为构建从目标的成像点的位置和摄像装置的光心的位置的目标射线,并得到该目标射线在世界坐标系下的方程;再将该目标射线在世界坐标系下的方程与地面方程联立,求解得到目标在3D空间的第一位置。The second step, module C, the ray position estimation module, uses the position of the imaging point of the target under the viewing angle of a camera device and the position of the optical center of the camera device to construct a target ray, and combines it with the ground equation to calculate the first position of the target in the 3D space. The principle is shown in Figure 12: Specifically, the target ray is constructed from the position of the imaging point of the target and the position of the optical center of the camera device, and the equation of the target ray in the world coordinate system is obtained; then the equation of the target ray in the world coordinate system is combined with the ground equation to solve the first position of the target in the 3D space.
目标射线在世界坐标系下的方程:The equation of the target ray in the world coordinate system is:
地面方程:Ground equation:
Ux+Vy+Wz+D=0
将目标射线在世界坐标系下的方程与地面方程联立可求解出目标Oi在3D空间的第一位置。The first position of the target O i in the 3D space can be solved by combining the equation of the target ray in the world coordinate system with the ground equation.
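Combining the target-ray equation with the ground equation Ux+Vy+Wz+D=0 is a standard ray-plane intersection. A sketch (the function name and the rejection of behind-camera solutions are assumptions):

```python
import numpy as np

def ray_ground_intersection(origin, direction, plane):
    """Intersect the target ray O + s*d with the ground plane
    U*x + V*y + W*z + D = 0, given as plane = (U, V, W, D).

    Returns the 3D intersection point, or None when the ray is parallel
    to the ground or the intersection lies behind the optical center.
    """
    n = np.asarray(plane[:3], dtype=float)   # plane normal (U, V, W)
    d = np.asarray(direction, dtype=float)
    o = np.asarray(origin, dtype=float)
    denom = n @ d
    if abs(denom) < 1e-12:
        return None                          # ray parallel to the ground
    s = -(n @ o + plane[3]) / denom
    if s < 0:
        return None                          # intersection behind the camera
    return o + s * d
```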
第四步、模块D,加权修正模块,利用A模块得到的2D检测结果的方差与3D粗检测结果的方差,通过卡尔曼滤波方法得到第一权重和第二权重,将模块C得到的目标在3D空间的第一位置与模块A直接输出的3D粗检测结果进行加权求和得到最终结果,即目标在3D空间的最终位置。The fourth step, module D, the weighted correction module, uses the variance of the 2D detection result obtained by module A and the variance of the 3D rough detection result to obtain the first weight and the second weight through the Kalman filtering method, and performs weighted summation on the first position of the target in the 3D space obtained by module C and the 3D rough detection result directly output by module A to obtain the final result, that is, the final position of the target in the 3D space.
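For the variance-based weighting of module D, a minimal sketch is inverse-variance weighting, the steady-state form of the Kalman update: each weight is proportional to the other estimate's variance and the two weights sum to 1. The patent may use a full Kalman filter, so this closed form is an illustrative assumption.

```python
import numpy as np

def inverse_variance_fuse(p1, var1, p2, var2):
    """Inverse-variance fusion of two 3D position estimates: the
    lower-variance estimate dominates the fused position, and the
    weights sum to 1 as required."""
    w1 = var2 / (var1 + var2)
    w2 = var1 / (var1 + var2)
    fused = w1 * np.asarray(p1, dtype=float) + w2 * np.asarray(p2, dtype=float)
    return fused, (w1, w2)
```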
参见图13，图13是本公开目标检测装置一实施例的结构示意图，需要说明的是，本公开实施例的目标检测装置能够实现上述目标检测方法，相关内容的详细说明，请参见上述方法部分，在此不再赘述。Refer to FIG13, which is a structural diagram of an embodiment of the target detection device of the present disclosure. It should be noted that the target detection device of this embodiment can implement the above target detection method; for a detailed description of the related content, please refer to the method section above, which will not be repeated here.
所述装置100包括存储器1以及处理器2,所述存储器1设置为存储计算机程序;所述处理器2设置为执行所述计算机程序并在执行所述计算机程序时实现如上任一所述的目标检测方法。存储器1通过总线与处理器2连接。The device 100 includes a memory 1 and a processor 2, wherein the memory 1 is configured to store a computer program; the processor 2 is configured to execute the computer program and implement any of the above target detection methods when executing the computer program. The memory 1 is connected to the processor 2 via a bus.
其中,处理器2可以是微控制单元、中央处理单元或数字信号处理器,等等。存储器1可以是Flash芯片、只读存储器、磁盘、光盘、U盘或者移动硬盘等等。The processor 2 may be a microcontroller unit, a central processing unit or a digital signal processor, etc. The memory 1 may be a Flash chip, a read-only memory, a disk, an optical disk, a USB flash drive or a mobile hard disk, etc.
本公开实施例还提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器实现如上任一所述的目标检测方法。An embodiment of the present disclosure further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements any of the target detection methods described above.
其中,该计算机可读存储介质可以是上述目标检测装置的内部存储单元,例如硬盘或内存。该计算机可读存储介质也可以是上述目标检测装置的外部存储设备,例如配备的插接式硬盘、智能存储卡、安全数字卡、闪存卡,等等。The computer-readable storage medium may be an internal storage unit of the target detection device, such as a hard disk or a memory. The computer-readable storage medium may also be an external storage device of the target detection device, such as a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, etc.
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统、设备中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。Those skilled in the art will appreciate that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices may be implemented as software, firmware, hardware, or a suitable combination thereof.
在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些物理组件或所有物理组件可以被实施为由处理器，如中央处理器、数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在设置为存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以设置为存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。In hardware implementations, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, a digital signal processor, or a microprocessor, or implemented as hardware, or implemented as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
In addition, it is known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.
以上参照附图说明了本公开的优选实施例,并非因此局限本公开的权利范围。本领域技术人员不脱离本公开的范围和实质内所作的任何修改、等同替换和改进,均应在本公开的权利范围之内。
The preferred embodiments of the present disclosure are described above with reference to the accompanying drawings, but the scope of the present disclosure is not limited thereby. Any modification, equivalent substitution and improvement made by those skilled in the art without departing from the scope and essence of the present disclosure shall be within the scope of the present disclosure.
Claims (12)
- 一种目标检测方法,所述方法包括:A target detection method, the method comprising:获取摄像装置的检测图像,所述检测图像包括目标对应的图像;Acquire a detection image of the camera device, wherein the detection image includes an image corresponding to the target;根据所述检测图像确定所述目标的成像点在所述摄像装置的像平面上的位置;Determining the position of the imaging point of the target on the image plane of the camera device according to the detection image;根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置,确定所述目标对应的目标射线,所述目标射线通过所述目标的成像点的位置和所述摄像装置的光心的位置;Determine a target ray corresponding to the target according to the position of the imaging point of the target on the image plane of the imaging device and the position of the optical center of the imaging device, wherein the target ray passes through the position of the imaging point of the target and the position of the optical center of the imaging device;根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置。A first position of the target in the 3D space is determined according to the target ray corresponding to the target.
- 根据权利要求1所述的方法,其中,所述摄像装置的数量为多个;The method according to claim 1, wherein the number of the camera devices is multiple;所述根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置,包括:The determining, according to the target ray corresponding to the target, a first position of the target in the 3D space includes:根据所述目标对应的多个目标射线,确定所述目标在3D空间的第一位置。A first position of the target in the 3D space is determined according to a plurality of target rays corresponding to the target.
- 根据权利要求2所述的方法,其中,所述根据所述目标对应的多个目标射线,确定所述目标在3D空间的第一位置,包括:The method according to claim 2, wherein determining the first position of the target in the 3D space according to the multiple target rays corresponding to the target comprises:求解所述3D空间中到各个所述目标射线的距离之和最小的最小距离点;Finding the minimum distance point in the 3D space where the sum of distances to each of the target rays is the smallest;将所述最小距离点的位置作为所述目标在3D空间的第一位置。The position of the minimum distance point is used as the first position of the target in the 3D space.
- 根据权利要求3所述的方法,其中,所述将所述最小距离点的位置作为所述目标在3D空间的第一位置,包括:The method according to claim 3, wherein taking the position of the minimum distance point as the first position of the target in the 3D space comprises:将多个所述目标射线的交点的位置作为所述目标在3D空间的第一位置。The position of the intersection of the plurality of target rays is used as the first position of the target in the 3D space.
- 根据权利要求1所述的方法,其中,所述摄像装置的数量为一个;The method according to claim 1, wherein the number of the camera device is one;所述根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置,包括:The determining, according to the target ray corresponding to the target, a first position of the target in the 3D space includes:根据所述目标对应的一个目标射线和所述目标所在的平面,确定所述目标在3D空间的第一位置。A first position of the target in the 3D space is determined according to a target ray corresponding to the target and a plane where the target is located.
- 根据权利要求5所述的方法,其中,所述根据所述目标对应的一个目标射线和所述目标所在的平面,确定所述目标在3D空间的第一位置,包括:The method according to claim 5, wherein determining the first position of the target in the 3D space according to a target ray corresponding to the target and the plane where the target is located comprises:求解所述目标射线与所述目标所在的平面的交点;Solving the intersection point between the target ray and the plane where the target is located;将所述目标射线与所述目标所在的平面的交点的位置作为所述目标在3D空间的第一位置。The position of the intersection of the target ray and the plane where the target is located is used as the first position of the target in the 3D space.
- 根据权利要求1-6任一项所述的方法,其中,所述根据所述目标对应的目标射线,确定所述目标在3D空间的第一位置,包括:The method according to any one of claims 1 to 6, wherein determining the first position of the target in the 3D space according to the target ray corresponding to the target comprises:根据所述目标对应的目标射线,确定所述目标在世界坐标系下的第一位置。A first position of the target in a world coordinate system is determined according to a target ray corresponding to the target.
- 根据权利要求7所述的方法,其中,所述根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的光心的位置,确定所述目标对应的目标射线,包括:The method according to claim 7, wherein the step of determining the target ray corresponding to the target based on the position of the imaging point of the target on the image plane of the imaging device and the position of the optical center of the imaging device comprises:根据所述目标的成像点在所述摄像装置的像平面上的位置和所述摄像装置的内参公式,确定所述目标的成像点在相机坐标系下的第一坐标;Determine a first coordinate of the imaging point of the target in a camera coordinate system according to a position of the imaging point of the target on an image plane of the imaging device and an internal parameter formula of the imaging device;根据所述目标的成像点在所述相机坐标系下的第一坐标、所述摄像装置的光心在所述相机坐标系下的第二坐标以及所述摄像装置的外参矩阵,确定所述目标的成像点在所述世界坐标系下的第三坐标以及所述摄像装置的光心在所述世界坐标系下的第四坐标;Determine, according to the first coordinate of the imaging point of the target in the camera coordinate system, the second coordinate of the optical center of the imaging device in the camera coordinate system, and the extrinsic parameter matrix of the imaging device, the third coordinate of the imaging point of the target in the world coordinate system and the fourth coordinate of the optical center of the imaging device in the world coordinate system;根据所述目标的成像点在所述世界坐标系下的第三坐标以及所述摄像装置的光心在所述 世界坐标系下的第四坐标,确定所述目标在所述世界坐标系下对应的目标射线。According to the third coordinate of the imaging point of the target in the world coordinate system and the optical center of the camera device in the The fourth coordinate in the world coordinate system determines the target ray corresponding to the target in the world coordinate system.
- 根据权利要求1-6任一项所述的方法,其中,所述方法还包括:The method according to any one of claims 1 to 6, wherein the method further comprises:根据所述检测图像确定所述目标在3D空间的第二位置;Determine a second position of the target in the 3D space according to the detection image;根据所述第一位置和所述第二位置,确定所述目标在3D空间的最终位置。A final position of the target in the 3D space is determined according to the first position and the second position.
- 根据权利要求9所述的方法,其中,所述方法还包括:The method according to claim 9, wherein the method further comprises:获取所述第一位置对应的第一置信度或者第一方差和所述第二位置对应的第二置信度或者第二方差;Obtaining a first confidence level or a first variance corresponding to the first position and a second confidence level or a second variance corresponding to the second position;根据所述第一置信度或者第一方差、所述第二置信度或者第二方差,确定所述第一位置对应的第一权重和所述第二位置对应的第二权重,所述第一权重和所述第二权重之和等于1;Determine, according to the first confidence or the first variance and the second confidence or the second variance, a first weight corresponding to the first position and a second weight corresponding to the second position, wherein the sum of the first weight and the second weight is equal to 1;所述根据所述第一位置和所述第二位置,确定所述目标在3D空间的最终位置,包括:Determining a final position of the target in the 3D space according to the first position and the second position includes:根据所述第一位置、所述第二位置、所述第一权重以及所述第二权重,确定所述目标在3D空间的最终位置。A final position of the target in the 3D space is determined according to the first position, the second position, the first weight, and the second weight.
- 一种目标检测装置,所述装置包括存储器以及处理器,所述存储器设置为存储计算机程序;所述处理器设置为执行所述计算机程序并在执行所述计算机程序时实现如权利要求1-10任一项所述的目标检测方法。A target detection device, comprising a memory and a processor, wherein the memory is configured to store a computer program; the processor is configured to execute the computer program and implement the target detection method according to any one of claims 1 to 10 when executing the computer program.
- 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器实现如权利要求1-10任一项所述的目标检测方法。 A computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, the processor implements the target detection method according to any one of claims 1 to 10.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310002393.9A CN118334113A (en) | 2023-01-03 | 2023-01-03 | Target detection method, device and storage medium |
CN202310002393.9 | 2023-01-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024146365A1 true WO2024146365A1 (en) | 2024-07-11 |
Family
ID=91776438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/139598 WO2024146365A1 (en) | 2023-01-03 | 2023-12-18 | Object detection method and apparatus and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118334113A (en) |
WO (1) | WO2024146365A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622747A (en) * | 2012-02-16 | 2012-08-01 | 北京航空航天大学 | Camera parameter optimization method for vision measurement |
US20200357134A1 (en) * | 2019-05-09 | 2020-11-12 | Trimble Inc. | Target positioning with bundle adjustment |
CN112017238A (en) * | 2019-05-30 | 2020-12-01 | 北京初速度科技有限公司 | Method and device for determining spatial position information of linear object |
CN112771854A (en) * | 2020-04-14 | 2021-05-07 | 深圳市大疆创新科技有限公司 | Projection display method, system, terminal and storage medium based on multiple camera devices |
CN112967342A (en) * | 2021-03-18 | 2021-06-15 | 深圳大学 | High-precision three-dimensional reconstruction method and system, computer equipment and storage medium |
CN115439531A (en) * | 2022-06-21 | 2022-12-06 | 亮风台(上海)信息科技有限公司 | Method and equipment for acquiring target space position information of target object |
- 2023-01-03 CN CN202310002393.9A patent/CN118334113A/en active Pending
- 2023-12-18 WO PCT/CN2023/139598 patent/WO2024146365A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN118334113A (en) | 2024-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111354042B (en) | Feature extraction method and device of robot visual image, robot and medium | |
US11668571B2 (en) | Simultaneous localization and mapping (SLAM) using dual event cameras | |
US11830216B2 (en) | Information processing apparatus, information processing method, and storage medium | |
US8290305B2 (en) | Registration of 3D point cloud data to 2D electro-optical image data | |
JP2021184307A (en) | System and method for detecting lines with vision system | |
JP5759161B2 (en) | Object recognition device, object recognition method, learning device, learning method, program, and information processing system | |
CN105627932A (en) | Distance measurement method and device based on binocular vision | |
WO2017077925A1 (en) | Method and system for estimating three-dimensional pose of sensor | |
US11212511B1 (en) | Residual error mitigation in multiview calibration | |
CN112288813B (en) | Pose estimation method based on multi-eye vision measurement and laser point cloud map matching | |
US20130028482A1 (en) | Method and System for Thinning a Point Cloud | |
EP3166074A1 (en) | Method of camera calibration for a multi-camera system and apparatus performing the same | |
JP2017090450A (en) | System and method for detecting lines in a vision system | |
WO2022217988A1 (en) | Sensor configuration scheme determination method and apparatus, computer device, storage medium, and program | |
CN109961092B (en) | Binocular vision stereo matching method and system based on parallax anchor point | |
Angladon et al. | The toulouse vanishing points dataset | |
CN114494393B (en) | Space measurement method, device, equipment and storage medium based on monocular camera | |
WO2024146365A1 (en) | Object detection method and apparatus and storage medium | |
Kotov et al. | DEM generation based on RPC model using relative conforming estimate criterion | |
CN112950709A (en) | Pose prediction method, pose prediction device and robot | |
CN113139454B (en) | Road width extraction method and device based on single image | |
US9824455B1 (en) | Detecting foreground regions in video frames | |
US9842402B1 (en) | Detecting foreground regions in panoramic video frames | |
CN112907650A (en) | Cloud height measuring method and equipment based on binocular vision | |
CN113379816A (en) | Structure change detection method, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23914499 Country of ref document: EP Kind code of ref document: A1 |