
CN116343165A - 3D target detection system, method, terminal equipment and storage medium


Info

Publication number
CN116343165A
CN116343165A
Authority
CN
China
Prior art keywords
target
distance
camera
detection
pedestrians
Prior art date
Legal status
Pending
Application number
CN202310132301.9A
Other languages
Chinese (zh)
Inventor
陈祥勇
柯英杰
陈卫强
苏亮
刘强生
邹雪莹
Current Assignee
Xiamen King Long United Automotive Industry Co Ltd
Original Assignee
Xiamen King Long United Automotive Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen King Long United Automotive Industry Co Ltd
Priority to CN202310132301.9A
Publication of CN116343165A

Classifications

    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/64 Three-dimensional objects
    • G06V2201/07 Target detection
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a 3D target detection system, method, terminal device and storage medium, belonging to the technical field of visual detection. Two neural network models are trained, one to detect small targets at long range and one to detect large targets at close range; distant small targets are ranged with a monocular ranging algorithm and near large targets with a binocular ranging algorithm. Finally, the two models' detection frames, distances, labels and sizes are fused, so that state information such as the three-dimensional coordinates, size and category of an obstacle is obtained accurately, effectively improving 3D target detection precision.

Description

3D target detection system, method, terminal equipment and storage medium
Technical Field
The invention relates to the technical field of visual detection, in particular to a 3D target detection system, a 3D target detection method, terminal equipment and a storage medium.
Background
Visual target detection faces two main difficulties: small-target detection and close-range partial-target detection, such as the pedestrian "ghost probe" scenario, where only part of a target enters the field of view and, for example, a pedestrian's feet, hands, partial body or head must be detected and identified. Because the semantic information of near and far targets differs greatly, small targets at long range and partial targets at close range cannot both be detected accurately at the same time.
Visual ranging mainly includes monocular ranging and binocular ranging. Monocular ranging assumes the target lies on the ground plane and computes its position through camera-to-world coordinate conversion; it can measure distant targets, but its ranging accuracy is inferior to binocular ranging because it is sensitive to the camera's intrinsic and extrinsic parameters. Binocular ranging obtains depth information from disparity matching between the left and right cameras; limited by the camera baseline distance and sensitive to lighting, it cannot effectively measure distant targets.
To this end, we provide a 3D object detection system, method, terminal device and storage medium.
Disclosure of Invention
The invention provides a 3D target detection system, a 3D target detection method, terminal equipment and a storage medium, to overcome the difficulty that existing visual detection methods have in accurately acquiring obstacle state information.
The invention adopts the following technical scheme:
a 3D object detection system, comprising:
and the image reading module is used for reading the video images around the vehicle and acquiring the left and right parallax information of the target.
The remote target detection module is used for detecting vehicles, pedestrians, riders and other obstacles at a distance, and training the vehicles, pedestrians, riders and other obstacles with small targets by adopting a yolov5 neural network model.
The short-distance target detection module is used for detecting vehicles, pedestrians, riders and other obstacles, and training the vehicles, pedestrians, riders and other obstacles with large targets by adopting a mobilet-ssd neural network model.
And the monocular distance measuring module is used for calculating the target distances of vehicles, pedestrians, riders and other obstacles at a distance.
The binocular range module is used for calculating the target distances of nearby vehicles, pedestrians, riders and other obstacles.
And the fusion module is used for fusing the far and near target detection frames, the distance, the labels and the sizes and accurately outputting the state information of the positions, the sizes and the categories of the targets.
The image reading module comprises a middle RGB camera and left and right binocular cameras; the RGB camera acquires video images around the vehicle, and the left and right binocular cameras acquire the left-right disparity.
The input image resolution of the YOLOv5 neural network model is 640 x 640, and that of the MobileNet-SSD neural network model is 384 x 384.
The training labels of the MobileNet-SSD neural network model also include, but are not limited to, partial targets such as a pedestrian's feet, hands, partial body and head.
The monocular ranging module measures distance using the pinhole camera imaging principle and coordinate transformation, calculating the distance to distant vehicles, pedestrians, riders and other obstacles.
The binocular ranging module calculates depth from the target's left-right disparity combined with the camera baseline distance and focal length, obtaining the distance to nearby vehicles, pedestrians, riders and other obstacles.
The invention also provides a 3D target detection method employing the above 3D target detection system, comprising the following steps:
(1) Three cameras are arranged at the front windshield of the vehicle: an RGB camera in the middle and two binocular cameras on the left and right; after the system starts, the three cameras collect video images around the vehicle;
(2) The middle RGB camera reads video frame images and transmits them to the long-range and short-range target detection modules, while the left and right binocular cameras read the left and right views and calculate the target's left-right disparity;
(3) Long-range target detection: a YOLOv5 neural network model, taking the input image at its set resolution and optimized and accelerated for inference, detects the target category and bounding-box information of vehicles, pedestrians and riders;
(4) Ranging of distant targets: a monocular ranging algorithm computes the distance to each YOLOv5 detection, and the distant target's position, category and bounding-box information are output;
(5) Short-range target detection: a MobileNet-SSD neural network model, taking the input image at its set resolution and optimized and accelerated for inference, detects the target category and bounding-box information of vehicles, pedestrians and riders;
(6) Ranging of near targets: a binocular ranging algorithm computes the distance to each MobileNet-SSD detection, and the near target's position, category and bounding-box information are output;
(7) Resolution and bounding-box conversion: the MobileNet-SSD output image is converted to the YOLOv5 output image resolution, and the corresponding bounding boxes are scaled proportionally;
(8) Label conversion: the labels of the two models' detection results differ, so they are converted into unified labels;
(9) An image boundary line and an overlap region are set; above the boundary and overlap region the long-range detection results are used, and below them the short-range detection results are used;
(10) De-duplication of overlap-region targets: the overlap region contains both long-range and short-range targets; these are traversed, and the bounding-box IOU and the Euclidean distance are calculated for fusion and de-duplication; if the IOU is greater than its threshold or the Euclidean distance is less than its threshold, the detections are treated as the same target, the MobileNet-SSD result is kept and the YOLOv5 result is deleted; if the IOU is less than its threshold and the Euclidean distance is greater than its threshold, the detections are treated as different targets and both results are kept;
(11) Fusion of non-overlap-region targets: the long-range detection results are fused with the short-range detection results;
(12) Fusion of overlap-region and non-overlap-region targets: the detection results of the overlap region and the non-overlap region are fused;
(13) According to the fused detection result labels and position information, the three-dimensional size of each target model is assigned and output.
The monocular ranging algorithm in step (4) is calculated using the following formulas:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K P \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

$$K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = \begin{bmatrix} R & T \end{bmatrix}$$
where K is the intrinsic parameter matrix, P is the extrinsic parameter matrix, R and T are respectively the rotation and translation matrices from the world coordinate system to the camera coordinate system, fx and fy are the focal lengths in pixels in the camera's horizontal and vertical directions, (u0, v0) are the camera's optical center coordinates, (u, v) are the pixel coordinates of the target point on the image, Zc is the distance from the target point to the camera, and Xw, Yw and Zw are the distances of the target point along the X, Y and Z directions of the world coordinate system;
knowing the pixel coordinates (u, v) of the target point on the image and obtaining the intrinsic matrix K and extrinsic matrix P through camera calibration, the distances Xw, Yw and Zw between the target and the vehicle are solved, with Zw = 0 under the ground-plane assumption.
The binocular ranging algorithm in step (6) is calculated with the formula d = f × b / (xl − xr), where d is the depth distance of the detected target, f is the camera focal length, b is the baseline distance between the left and right cameras, and xl − xr is the disparity of the corresponding pixels.
The IOU calculation formula is as follows:

$$IOU = \frac{|A \cap B|}{|A \cup B|}$$

where A is the bounding box of object 1 and B is the bounding box of object 2. The Euclidean distance calculation formula is as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ are the position coordinates of object 1 and $(x_2, y_2)$ are the position coordinates of object 2.
The invention also provides 3D target detection terminal equipment, which comprises a processor, a memory and a computer program stored in the memory and running on the processor, wherein the steps of the 3D target detection method are realized when the processor executes the computer program.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described 3D object detection method.
From the above description of the invention, it is clear that the invention has the following advantages over the prior art:
according to the invention, two neural network models are trained and are respectively used for detecting a far small target and a near large target, wherein the far small target is detected by a monocular ranging algorithm, the near large target is detected by a binocular ranging algorithm, and finally, the two model target detection frames, the distances, the labels and the sizes are fused, so that the state information of the three-dimensional coordinates, the sizes, the categories and the like of the obstacle can be accurately obtained, and the detection precision of the 3D target can be effectively improved.
Drawings
Fig. 1 is a block diagram of a first embodiment of the present invention.
Fig. 2 is a flowchart of a first embodiment of the present invention.
Detailed Description
A specific embodiment of the present invention will be described below with reference to fig. 1. Numerous details are set forth in the following description in order to provide a thorough understanding of the present invention, but it will be apparent to one skilled in the art that the present invention may be practiced without these details. Well-known components, methods and procedures are not described in detail.
Example 1
A 3D object detection system, referring to fig. 1, includes an image reading module 10, a long-range object detection module 20, a short-range object detection module 30, a monocular ranging module 40, a binocular ranging module 50, and a fusion module 60. Wherein:
the image reading module is used for reading the video images around the vehicle and acquiring the left and right parallax information of the target. The device comprises a middle RGB camera, a left binocular camera and a right binocular camera, wherein the RGB camera is used for acquiring video images around a vehicle, and the left binocular camera and the right binocular camera are used for acquiring left parallax and right parallax.
The long-range target detection module is used for detecting distant vehicles, pedestrians, riders and other obstacles. It adopts a YOLOv5 neural network model with an input image resolution of 640 x 640; the training labels are small targets such as vehicles, pedestrians, riders and other obstacles.
The short-range target detection module is used for detecting nearby vehicles, pedestrians, riders and other obstacles. It adopts a MobileNet-SSD neural network model with an input image resolution of 384 x 384; the training labels are large targets such as vehicles, pedestrians, riders and other obstacles, and also include, but are not limited to, partial targets such as a pedestrian's feet, hands, partial body and head.
The monocular ranging module measures distance using the pinhole camera imaging principle and coordinate transformation, calculating the distance to distant vehicles, pedestrians, riders and other obstacles.
The binocular ranging module calculates depth from the target's left-right disparity combined with the camera baseline distance and focal length, obtaining the distance to nearby vehicles, pedestrians, riders and other obstacles.
The fusion module is used for fusing the far and near target detection frames, distances, labels and sizes, and accurately outputting target position, size and category state information. It performs detection-frame fusion, label fusion, distance fusion and size fusion: detection-frame fusion merges the distant small-target frames with the near large-target frames; label fusion converts and unifies the two detection models' labels; distance fusion associates the distant small-target distances with the near large-target distances; size fusion merges the acquired target sizes.
Referring to Fig. 2, the present invention further provides a 3D target detection method employing the above detection system, which specifically includes the following steps:
(1) Three cameras are arranged at the front windshield of the vehicle: an RGB camera in the middle and two binocular cameras on the left and right. After the system starts, the three cameras collect video images around the vehicle.
(2) The middle RGB camera reads video frame images and transmits them to the long-range and short-range target detection modules, while the left and right binocular cameras read the left and right views and calculate the target's left-right disparity.
(3) Long-range target detection adopts a YOLOv5 neural network model with an input image resolution of 640 x 640 and, after optimization and inference acceleration, detects the target category and bounding-box information of vehicles, pedestrians and riders.
(4) Ranging of distant targets: a monocular ranging algorithm computes the distance to each YOLOv5 detection and outputs the distant target's position, category and bounding-box information, specifically using the following formulas:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K P \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

$$K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = \begin{bmatrix} R & T \end{bmatrix}$$
where K is the intrinsic parameter matrix, P is the extrinsic parameter matrix, R and T are respectively the rotation and translation matrices from the world coordinate system to the camera coordinate system, fx and fy are the focal lengths in pixels in the camera's horizontal and vertical directions, (u0, v0) are the camera's optical center coordinates, (u, v) are the pixel coordinates of the target point on the image, Zc is the distance from the target point to the camera, and Xw, Yw and Zw are the distances of the target point along the X, Y and Z directions of the world coordinate system;
knowing the pixel coordinates (u, v) of the target point on the image and obtaining the intrinsic matrix K and extrinsic matrix P through camera calibration, the distances Xw, Yw and Zw between the target and the vehicle are solved, with Zw = 0 under the ground-plane assumption.
(5) Short-range target detection adopts a MobileNet-SSD neural network model with an input image resolution of 384 x 384 and, after optimization and inference acceleration, detects the target category and bounding-box information of vehicles, pedestrians and riders.
(6) Ranging of near targets: a binocular ranging algorithm computes the distance to each MobileNet-SSD detection and outputs the near target's position, category and bounding-box information, specifically using the following formula:
d = f × b / (xl − xr), where d is the depth distance of the detected target, f is the camera focal length, b is the baseline distance between the left and right cameras, and xl − xr is the disparity of the corresponding pixels.
(7) Resolution and bounding-box conversion: the MobileNet-SSD output image is converted to the YOLOv5 output image resolution, and the corresponding bounding boxes are scaled proportionally, as sketched below.
(8) Label conversion: the labels of the two models' detection results differ, so they are converted into unified labels.
(9) An image boundary line and an overlap region are set; above the boundary and overlap region the long-range detection results are used, and below them the short-range detection results are used.
(10) De-duplication of overlap-region targets: the overlap region contains both long-range and short-range targets; these are traversed, and the bounding-box IOU and the Euclidean distance are calculated for fusion and de-duplication. If the IOU is greater than its threshold or the Euclidean distance is less than its threshold, the detections are treated as the same target: the MobileNet-SSD result is kept and the YOLOv5 result is deleted. If the IOU is less than its threshold and the Euclidean distance is greater than its threshold, the detections are treated as different targets and both results are kept (a code sketch follows the formulas below).
The IOU calculation formula is as follows:

$$IOU = \frac{|A \cap B|}{|A \cup B|}$$

where A is the bounding box of object 1 and B is the bounding box of object 2. The Euclidean distance calculation formula is as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ are the position coordinates of object 1 and $(x_2, y_2)$ are the position coordinates of object 2.
(11) Fusion of non-overlap-region targets: the long-range detection results are fused with the short-range detection results.
(12) Fusion of overlap-region and non-overlap-region targets: the detection results of the overlap region and the non-overlap region are fused.
(13) According to the fused detection result labels and position information, the three-dimensional size of each target model is assigned and output; a minimal sketch follows.
Example 2
The invention also provides 3D target detection terminal equipment, which comprises a processor, a memory and a computer program stored in the memory and running on the processor, wherein the steps of the 3D target detection method are realized when the processor executes the computer program.
Further, as an executable scheme, the 3D object detection terminal device may be a computing device such as a vehicle-mounted computer or a cloud server, and may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the above composition is merely an example of the 3D object detection terminal device and does not limit it; the device may include more or fewer components than described, combine certain components, or use different components. For example, it may further include input/output devices, network access devices, a bus, etc., which the embodiments of the invention do not limit.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the 3D object detection terminal device and connects the various parts of the whole device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the 3D object detection terminal device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described 3D object detection method.
The modules/units integrated in the 3D object detection terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as an independent product. Based on this understanding, the present invention may implement all or part of the flow of the above method embodiments by instructing the related hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, read-only memory (ROM), random access memory (RAM), a software distribution medium, and so on.
The foregoing is merely illustrative of specific embodiments of the present invention, but the design concept of the present invention is not limited thereto, and any insubstantial modification of the present invention by using the design concept shall fall within the scope of the present invention.

Claims (12)

1. A 3D object detection system, comprising:
an image reading module for reading video images around the vehicle and acquiring the target's left-right disparity information;
a long-range target detection module for detecting distant vehicles, pedestrians, riders and other obstacles, using a YOLOv5 neural network model trained on small-target samples of vehicles, pedestrians, riders and other obstacles;
a short-range target detection module for detecting nearby vehicles, pedestrians, riders and other obstacles, using a MobileNet-SSD neural network model trained on large-target samples of vehicles, pedestrians, riders and other obstacles;
a monocular ranging module for calculating the distance to distant vehicles, pedestrians, riders and other obstacles;
a binocular ranging module for calculating the distance to nearby vehicles, pedestrians, riders and other obstacles;
and a fusion module for fusing the far and near target detection frames, distances, labels and sizes, and accurately outputting target position, size and category state information.
2. A 3D object detection system according to claim 1, wherein: the image reading module comprises a middle RGB camera and left and right binocular cameras; the RGB camera acquires video images around the vehicle, and the left and right binocular cameras acquire the left-right disparity.
3. A 3D object detection system according to claim 1, wherein: the input image resolution of the YOLOv5 neural network model is 640 x 640, and that of the MobileNet-SSD neural network model is 384 x 384.
4. A 3D object detection system according to claim 1, wherein: the training labels of the MobileNet-SSD neural network model also include, but are not limited to, partial targets such as a pedestrian's feet, hands, partial body and head.
5. A 3D object detection system according to claim 1, wherein: the monocular ranging module measures distance using the pinhole camera imaging principle and coordinate transformation, calculating the distance to distant vehicles, pedestrians, riders and other obstacles.
6. A 3D object detection system according to claim 1, wherein: the binocular ranging module calculates depth from the target's left-right disparity combined with the camera baseline distance and focal length, obtaining the distance to nearby vehicles, pedestrians, riders and other obstacles.
7. A 3D object detection method employing the 3D object detection system according to claim 2, comprising the steps of:
(1) Three cameras are arranged at the front windshield of the vehicle: an RGB camera in the middle and two binocular cameras on the left and right; after the system starts, the three cameras collect video images around the vehicle;
(2) The middle RGB camera reads video frame images and transmits them to the long-range and short-range target detection modules, while the left and right binocular cameras read the left and right views and calculate the target's left-right disparity;
(3) Long-range target detection: a YOLOv5 neural network model, taking the input image at its set resolution and optimized and accelerated for inference, detects the target category and bounding-box information of vehicles, pedestrians and riders;
(4) Ranging of distant targets: a monocular ranging algorithm computes the distance to each YOLOv5 detection, and the distant target's position, category and bounding-box information are output;
(5) Short-range target detection: a MobileNet-SSD neural network model, taking the input image at its set resolution and optimized and accelerated for inference, detects the target category and bounding-box information of vehicles, pedestrians and riders;
(6) Ranging of near targets: a binocular ranging algorithm computes the distance to each MobileNet-SSD detection, and the near target's position, category and bounding-box information are output;
(7) Resolution and bounding-box conversion: the MobileNet-SSD output image is converted to the YOLOv5 output image resolution, and the corresponding bounding boxes are scaled proportionally;
(8) Label conversion: the labels of the two models' detection results differ, so they are converted into unified labels;
(9) An image boundary line and an overlap region are set; above the boundary and overlap region the long-range detection results are used, and below them the short-range detection results are used;
(10) De-duplication of overlap-region targets: the overlap region contains both long-range and short-range targets; these are traversed, and the bounding-box IOU and the Euclidean distance are calculated for fusion and de-duplication; if the IOU is greater than its threshold or the Euclidean distance is less than its threshold, the detections are treated as the same target, the MobileNet-SSD result is kept and the YOLOv5 result is deleted; if the IOU is less than its threshold and the Euclidean distance is greater than its threshold, the detections are treated as different targets and both results are kept;
(11) Fusion of non-overlap-region targets: the long-range detection results are fused with the short-range detection results;
(12) Fusion of overlap-region and non-overlap-region targets: the detection results of the overlap region and the non-overlap region are fused;
(13) According to the fused detection result labels and position information, the three-dimensional size of each target model is assigned and output.
8. The 3D object detection method according to claim 7, wherein the monocular ranging algorithm in step (4) is calculated using the following formula:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K P \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

$$K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = \begin{bmatrix} R & T \end{bmatrix}$$
where K is the intrinsic parameter matrix, P is the extrinsic parameter matrix, R and T are respectively the rotation and translation matrices from the world coordinate system to the camera coordinate system, fx and fy are the focal lengths in pixels in the camera's horizontal and vertical directions, (u0, v0) are the camera's optical center coordinates, (u, v) are the pixel coordinates of the target point on the image, Zc is the distance from the target point to the camera, and Xw, Yw and Zw are the distances of the target point along the X, Y and Z directions of the world coordinate system;
knowing the pixel coordinates (u, v) of the target point on the image and obtaining the intrinsic matrix K and extrinsic matrix P through camera calibration, the distances Xw, Yw and Zw between the target and the vehicle are solved, with Zw = 0 under the ground-plane assumption.
9. The 3D object detection method according to claim 7, wherein the binocular ranging algorithm in step (6) is calculated using the formula d = f × b / (xl − xr), where d is the depth distance of the detected target, f is the camera focal length, b is the baseline distance between the left and right cameras, and xl − xr is the disparity of the corresponding pixels.
10. The 3D object detection method of claim 7, wherein the IOU calculation formula is as follows:
$$IOU = \frac{|A \cap B|}{|A \cup B|}$$

where A is the bounding box of object 1 and B is the bounding box of object 2. The Euclidean distance calculation formula is as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ are the position coordinates of object 1 and $(x_2, y_2)$ are the position coordinates of object 2.
11. A 3D target detection terminal device, characterized in that it comprises a processor, a memory and a computer program stored in the memory and running on the processor, wherein the processor, when executing the computer program, implements the steps of the 3D target detection method according to any one of claims 7-10.
12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the 3D target detection method according to any one of claims 7-10.
CN202310132301.9A 2023-02-17 2023-02-17 3D target detection system, method, terminal equipment and storage medium Pending CN116343165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310132301.9A CN116343165A (en) 2023-02-17 2023-02-17 3D target detection system, method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310132301.9A CN116343165A (en) 2023-02-17 2023-02-17 3D target detection system, method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116343165A true CN116343165A (en) 2023-06-27

Family

ID=86879879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310132301.9A Pending CN116343165A (en) 2023-02-17 2023-02-17 3D target detection system, method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116343165A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116953680A (en) * 2023-09-15 2023-10-27 成都中轨轨道设备有限公司 Image-based real-time ranging method and system for target object
CN116953680B (en) * 2023-09-15 2023-11-24 成都中轨轨道设备有限公司 Image-based real-time ranging method and system for target object
CN119152478A (en) * 2024-11-18 2024-12-17 中国第一汽车股份有限公司 Target detection method, device, storage medium and equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination