CN115115978A - Object recognition method, device, storage medium and processor - Google Patents
- Publication number
- CN115115978A (application CN202210663391.XA)
- Authority
- CN (China)
- Prior art keywords
- detection
- frame
- frames
- detection frame
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V2201/07—Target detection

All classes fall under G (Physics), G06 (Computing; Calculating or Counting), G06V (Image or Video Recognition or Understanding).
Abstract
The invention discloses an object recognition method, apparatus, storage medium and processor. The method includes: acquiring a plurality of detection boxes of an object in a video, where each detection box represents the position of the object in the video; determining, for each detection box, its time-frame count, which characterizes the number of consecutive frames in which that detection box appears in the video; determining a target detection box among the plurality of detection boxes based on the time-frame counts and a frame-count difference threshold; and recognizing the object in the video based on the target detection box. The invention solves the technical problem of low object-recognition accuracy.
Description
Technical Field
The present invention relates to the field of vehicles, and in particular to an object recognition method, apparatus, storage medium and processor.
Background
At present, in autonomous driving, the results of detecting objects on the road allow information such as an object's position, size, orientation and category in space to be determined accurately, serving purposes such as 3D modeling and path planning.
In the related art, object detection is usually performed with non-maximum suppression (NMS). However, this method only applies to detection in two-dimensional scenes, so for detection in space there is a technical problem of low object-recognition accuracy.
No effective solution has yet been proposed for the above problem of low object-recognition accuracy in the related art.
Summary of the Invention
Embodiments of the present invention provide an object recognition method, apparatus, storage medium and processor, so as to at least solve the technical problem of low object-recognition accuracy.
According to one aspect of the embodiments of the present invention, an object recognition method is provided, including: acquiring a plurality of detection boxes of an object in a video, where each detection box represents the position of the object in the video; determining, for each detection box, its time-frame count in the video, where the time-frame count characterizes the number of consecutive frames in which that detection box appears; determining a target detection box among the plurality of detection boxes based on the time-frame counts and a frame-count difference threshold; and recognizing the object in the video based on the target detection box.
Optionally, determining the target detection box among the plurality of detection boxes based on the time-frame counts and the frame-count difference threshold includes: computing the intersection-over-union (IoU) of a first detection box and a second detection box among the plurality of detection boxes, where the first and second detection boxes are any two detection boxes at the same instant; in response to the IoU being greater than an IoU threshold, obtaining the time-frame counts of the first and second detection boxes; and determining the target detection box from the first and second detection boxes based on their time-frame counts and the frame-count difference threshold.
Optionally, determining the target detection box from the first and second detection boxes based on their time-frame counts and the frame-count difference threshold includes: computing the frame-count difference between the time-frame count of the first detection box and that of the second detection box; and determining the target detection box from the two boxes based on this difference and the frame-count difference threshold.
Optionally, determining the target detection box based on the frame-count difference and the frame-count difference threshold includes: in response to the absolute value of the difference being greater than the threshold, determining whichever of the first and second detection boxes has the larger time-frame count as the target detection box.
Optionally, determining the target detection box based on the frame-count difference and the frame-count difference threshold includes: in response to the absolute value of the difference being no greater than the threshold, determining whichever of the first and second detection boxes has the higher matching degree as the target detection box, where the matching degree characterizes how well the corresponding detection box matches the object.
Optionally, the matching degree of each detection box at each instant in the video is obtained, yielding at least one matching degree per detection box; the quotient of the sum of a detection box's matching degrees and the number of those matching degrees is determined as that box's target matching degree.
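The selection rules above, a frame-count gap decided against the threshold first and the averaged matching degree as the fallback, can be sketched as follows. This is an illustration, not the patent's implementation: the `Track` class and the field names `frame_cnt` and `match_scores` are invented for the sketch, with `time_threshold` as the frame-count difference threshold.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Track:
    frame_cnt: int                 # consecutive frames this box has appeared in
    match_scores: List[float] = field(default_factory=list)  # one matching degree per frame

    @property
    def target_match(self) -> float:
        # target matching degree = sum of per-frame matching degrees / their count
        return sum(self.match_scores) / len(self.match_scores)

def select_target(a: Track, b: Track, time_threshold: int) -> Track:
    """Pick the target box among two overlapping boxes at the same instant."""
    if abs(a.frame_cnt - b.frame_cnt) > time_threshold:
        # counts differ by more than the threshold: the long-lived box wins
        return a if a.frame_cnt > b.frame_cnt else b
    # counts are comparable: fall back to the averaged matching degree
    return a if a.target_match >= b.target_match else b

a = Track(frame_cnt=5, match_scores=[0.9, 0.8, 0.85])
b = Track(frame_cnt=3, match_scores=[0.95])
print(select_target(a, b, time_threshold=1) is a)  # gap of 2 > 1, so a wins -> True
```

With `time_threshold=5` the gap of 2 would no longer decide, and `b` would win on its higher averaged matching degree (0.95 against 0.85).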
Optionally, the frame-count difference threshold is determined based on historical detection data for the object.
According to another aspect of the embodiments of the present invention, an object recognition apparatus is provided, including: an acquisition unit for acquiring a plurality of detection boxes of an object in a video, where each detection box represents the position of the object in the video; a first determination unit for determining, for each detection box, its time-frame count in the video, where the time-frame count characterizes the number of consecutive frames in which that detection box appears; a second determination unit for determining a target detection box among the plurality of detection boxes based on the time-frame counts and a frame-count difference threshold; and a recognition unit for recognizing the object in the video based on the target detection box.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium includes a stored program which, when run, controls the device on which the storage medium resides to execute the object recognition method of the embodiments of the present invention.
According to another aspect of the embodiments of the present invention, a processor is provided. The processor is configured to run a program which, when running, executes the object recognition method of the embodiments of the present invention.
In the embodiments of the present invention, a plurality of detection boxes of an object in a video are acquired, where each detection box represents the position of the object in the video; the time-frame count of each box, characterizing the number of consecutive frames in which it appears, is determined; a target detection box is determined among the boxes based on the counts and a frame-count difference threshold; and the object is recognized in the video based on the target detection box. In other words, by comparing the time-frame counts of multiple detection boxes, the embodiments of the present invention filter out overlapping objects that appear only briefly, thereby achieving the technical effect of improving object-recognition accuracy and solving the technical problem of low object-recognition accuracy.
Brief Description of the Drawings
The drawings described here are provided to give a further understanding of the present invention and form a part of this application. The exemplary embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of an object recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for acquiring detection boxes according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a non-maximum suppression recognition process according to the related art;
FIG. 4(a) is a schematic diagram of a processing result without non-maximum suppression according to an embodiment of the present invention, and
FIG. 4(b) is a schematic diagram of a processing result after non-maximum suppression according to an embodiment of the present invention;
FIG. 5 is a flowchart of a non-maximum suppression processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an object recognition apparatus according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description, claims and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described here can be practiced in orders other than those illustrated or described. Furthermore, the terms "comprising" and "having", and any variants thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such processes, methods, products or devices.
Embodiment 1
According to an embodiment of the present invention, an embodiment of an object recognition method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
FIG. 1 is a flowchart of an object recognition method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102: acquire a plurality of detection boxes of an object in a video, where each detection box represents the position of the object in the video.
In the technical solution provided in step S102, images over a period of time can be captured by various sensors to obtain the video to be processed, and the video can be processed to obtain the plurality of detection boxes of the objects in it. The video may be a sequence of images over a period of time captured by an acquisition device such as a camera, for example a continuous sequence of pictures taken by a vehicle's on-board camera while driving. An object may be anything present in the video, for example a truck on a highway or a pedestrian at an intersection; no specific limits are placed here on how the video is acquired or on the types of objects in it. A detection box may be a rotated box (rbox): either a three-dimensional detection box (3D rbox), for example a cube or cuboid, or a two-dimensional detection box in a plane (2D rbox), for example a rectangle.
Optionally, the plurality of detection boxes of the objects in the video can be acquired by sensors mounted at different positions; for example, three-dimensional detection boxes can be obtained from lidar and camera sensors, and two-dimensional detection boxes from millimeter-wave radar.
For example, when a vehicle drives through roundabouts, construction zones, busy downtown streets or straight road sections, a video of the vehicle driving over a period of time is acquired, and each frame of the video can be processed using lidar, millimeter-wave radar, vision and other devices to obtain the detection box of each object in each frame, yielding the plurality of detection boxes.
Optionally, three-dimensional detection boxes can be obtained from camera and lidar, and a detected three-dimensional box can be projected onto the ground to obtain a two-dimensional detection box, thereby obtaining information such as the box's center point, size and orientation; this information can be used to represent the position of the object in the video.
In the related art, the non-maximum suppression method can only detect two-dimensional detection boxes, which limits its ability to filter objects in three-dimensional space within a scene. In an embodiment of the present invention, a three-dimensional detection box can be acquired for each object in each frame of the video and projected onto the plane, thereby obtaining a plurality of two-dimensional detection boxes of the objects in the video.
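The ground-plane projection described above can be sketched as follows. This is an illustrative assumption, not the patent's implementation: the 3D box is reduced to its footprint by dropping the height and rotating the planar half-extents by the yaw angle, and the parameter names (`cx`, `cy`, `length`, `width`, `yaw`) are mine.

```python
import math

def project_to_ground(cx, cy, length, width, yaw):
    """Return the 4 ground-plane corners of a 3D box as a 2D rotated box."""
    c, s = math.cos(yaw), math.sin(yaw)
    # footprint corners in the box's local frame (the height is simply dropped)
    half = [( length / 2,  width / 2),
            ( length / 2, -width / 2),
            (-length / 2, -width / 2),
            (-length / 2,  width / 2)]
    # rotate each local corner by the yaw angle, then translate to the center
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

corners = project_to_ground(cx=10.0, cy=2.0, length=4.0, width=2.0, yaw=0.0)
print(corners)  # -> [(12.0, 3.0), (12.0, 1.0), (8.0, 1.0), (8.0, 3.0)]
```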
Step S104: determine, for each of the plurality of detection boxes, its time-frame count in the video, where the time-frame count characterizes the number of consecutive frames in which each detection box appears.
In the technical solution provided in step S104, the number of consecutive frames in which each detection box appears in the video can be determined separately. The time-frame count, which may be denoted frame_cnt, characterizes the length of time each detection box appears continuously in the video, for example 1 second or 2 seconds; these values are merely illustrative and impose no specific limitation.
For example, if the detection box corresponding to object one appears for three consecutive frames starting from the second frame of the video, its time-frame count is three frames.
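A minimal sketch of how frame_cnt could be maintained across frames; the dict-based bookkeeping and the track IDs are assumptions for illustration, since the patent does not prescribe a data structure.

```python
def update_frame_counts(frame_cnt, ids_in_frame):
    """Advance per-object streaks for one video frame: boxes seen this frame
    are incremented, absent ones are reset to 0 so frame_cnt always measures
    *consecutive* appearances."""
    present = set(ids_in_frame)
    for tid in frame_cnt:
        if tid not in present:
            frame_cnt[tid] = 0          # the streak is broken
    for tid in present:
        frame_cnt[tid] = frame_cnt.get(tid, 0) + 1
    return frame_cnt

counts = {}
for detections in (["car1"], ["car1", "ped1"], ["car1", "ped1"], ["car1"]):
    counts = update_frame_counts(counts, detections)
print(counts)  # -> {'car1': 4, 'ped1': 0}
```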
Step S106: determine a target detection box among the plurality of detection boxes based on the time-frame counts and a frame-count difference threshold.
In the technical solution provided in step S106, the difference between the time-frame counts of two detection boxes is determined from the count of each box, and the counts are compared against the frame-count difference threshold; the comparison result can be used to determine the target detection box from the plurality of detection boxes. The target detection box may be the detection box that matches the object best, and the frame-count difference threshold, also called the time threshold and denoted time_threshold, may be a value set from experience or according to the actual situation.
For example, starting from the second frame of the video, the detection box of object one appears for three consecutive frames, so its time-frame count is three, while the detection box of object two appears for five consecutive frames, so its count is five. Assuming the frame-count difference threshold is one frame, objects one and two can be filtered against each other based on their time-frame counts and the set threshold, thereby determining the target detection box from the plurality of detection boxes.
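Putting the example together, the filtering of step S106 over all detection boxes at one instant might look like the sketch below. The dict layout (`rect`, `frame_cnt`, `score`) is an assumption for illustration, and `iou` here is axis-aligned for brevity, whereas the patent computes the IoU of rotated ground-projected boxes.

```python
def iou(a, b):
    """Axis-aligned IoU for rectangles (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_frame(boxes, iou_threshold, time_threshold):
    """Reduce every overlapping pair of boxes at one instant to one winner."""
    kept = []
    for box in boxes:
        dropped = False
        survivors = []
        for other in kept:
            if dropped or iou(box["rect"], other["rect"]) <= iou_threshold:
                survivors.append(other)          # no conflict, `other` stays
                continue
            if abs(box["frame_cnt"] - other["frame_cnt"]) > time_threshold:
                box_wins = box["frame_cnt"] > other["frame_cnt"]  # longer streak wins
            else:
                box_wins = box["score"] > other["score"]          # matching degree decides
            if box_wins:
                continue                         # `other` is filtered out
            survivors.append(other)
            dropped = True                       # `box` is filtered out
        kept = survivors
        if not dropped:
            kept.append(box)
    return kept

frame = [
    {"rect": (0.0, 0.0, 4.0, 2.0),     "frame_cnt": 5, "score": 0.8},
    {"rect": (0.2, 0.0, 4.2, 2.0),     "frame_cnt": 1, "score": 0.9},  # brief overlap
    {"rect": (10.0, 10.0, 12.0, 12.0), "frame_cnt": 2, "score": 0.7},
]
print(len(filter_frame(frame, iou_threshold=0.5, time_threshold=1)))  # -> 2
```

The briefly-appearing box overlapping the long-lived one is discarded, while the non-overlapping box elsewhere in the scene is kept.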
In the related art, detection is performed on single frames, ignoring the influence that information from the other frames within a period of time has on the perception result. In an embodiment of the present invention, the remaining frame information is introduced and a time threshold is set according to the number of frames in which an object appears, so that overlapping objects that appear only briefly are filtered out, which improves the accuracy of object recognition.
Step S108: recognize the object in the video based on the target detection box.
In the technical solution of step S108, information such as the position and orientation of the object in each frame of the video is recognized based on the target detection box.
Through steps S102 to S108 above, a plurality of detection boxes of an object in a video are acquired, each representing the object's position in the video; the time-frame count of each box, characterizing the number of consecutive frames in which it appears, is determined; a target detection box is determined among the boxes based on the counts and a frame-count difference threshold; and the object is recognized in the video based on the target detection box. This achieves the technical effect of improving object-recognition accuracy and solves the technical problem of low object-recognition accuracy.
The above method of this embodiment is further described below.
作为一种可选的实施例,基于时间帧数和帧数差阈值,在多个检测框中确定目标检测框,包括:确定多个检测框中第一检测框和多个检测框中第二检测框的交并比,其中,第一检测框和第二检测框为多个检测框中处于同一时刻任意两个检测框;响应于交并比大于交并比阈值,获取第一检测框的时间帧数和第二检测框的时间帧数;基于第一检测框的时间帧数、第二检测框的时间帧数和帧数差阈值,在第一检测框和第二检测框中确定目标检测框。As an optional embodiment, determining the target detection frame in multiple detection frames based on the time frame number and the frame number difference threshold includes: determining a first detection frame in the multiple detection frames and a second detection frame in the multiple detection frames The intersection ratio of the detection frame, wherein the first detection frame and the second detection frame are any two detection frames at the same time in the multiple detection frames; in response to the intersection ratio being greater than the intersection ratio threshold, obtain the first detection frame. The number of time frames and the number of time frames of the second detection frame; based on the number of time frames of the first detection frame, the number of time frames of the second detection frame and the frame number difference threshold, the target is determined in the first detection frame and the second detection frame Check box.
在该实施例中,可以在多个检测框中选择处于同一时刻任意的两个检测框,得到第一检测框和第二检测框,确定第一检测框和第二检测框二者之间的交并比,判断第一检测框和第二检测框二者之间的交并比是否大于交并比阈值,响应于交并比大于交并比阈值,说明第一检测框和第二检测框重叠,可以分别确定第一检测框和第二检测框的时间帧数,基于第一检测框的时间帧数、第二检测框的时间帧数和帧数差阈值,在第一检测框和第二检测框中确定目标检测框,其中,交并比可以用于表征检测框之间的重叠程度,可以通过IoU进行表示;交并比阈值可以为根据实际情况设定的值,也可以为根据经验得到的值,可以通过T进行表示;第一检测框和第二检测框可以为同一帧下的任意两个检测框。In this embodiment, any two detection frames at the same time can be selected from multiple detection frames to obtain the first detection frame and the second detection frame, and the difference between the first detection frame and the second detection frame can be determined. The intersection ratio, judges whether the intersection ratio between the first detection frame and the second detection frame is greater than the intersection ratio threshold, and in response to the intersection ratio being greater than the intersection ratio threshold, indicating that the first detection frame and the second detection frame Overlapping, the number of time frames of the first detection frame and the second detection frame can be determined respectively, based on the number of time frames of the first detection frame, the number of time frames of the second detection frame and the frame number difference threshold, The target detection frame is determined in the second detection frame, wherein the intersection ratio can be used to characterize the degree of overlap between detection frames, which can be represented by IoU; the intersection ratio threshold can be a value set according to the actual situation, or can be based on The value obtained by experience can be represented by T; the first detection frame and the second detection frame can be any two detection frames in the same frame.
Optionally, the first and second detection frames are selected from the multiple detection frames, their intersection is computed, and the IoU is determined from the area of the convex polygon formed by the set of intersection points. If the IoU exceeds the IoU threshold, the two detection frames overlap, so the frames are screened by time-frame count: the time-frame counts of the first and second detection frames are determined, and the target detection frame is chosen from the two based on those counts and the frame-count difference threshold.
Optionally, the first and second detection frames are selected from the multiple detection frames, an intersection operation on the two yields the corresponding set of intersection points, and the area of the convex polygon formed by those points gives the IoU between the two frames, which can be computed by the following formula:
IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)
where area_r1 and area_r2 are the areas of the first and second detection frames, respectively, and area_r1r2 is the area of the convex polygon formed by their intersection.
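One illustrative reading of the formula above: clip one convex box polygon against the other to obtain the intersection-point set, take the area of the resulting convex polygon with the shoelace formula, and combine the three areas. The function names and the Sutherland-Hodgman clipping choice below are assumptions for the sketch, not code from the patent:

```python
def shoelace_area(poly):
    # absolute polygon area via the shoelace formula
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_convex(subject, clip):
    # Sutherland-Hodgman: clip one convex polygon (CCW vertices) by another;
    # the output vertices are the "intersection point set" of the two boxes
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def cross_point(p1, p2, a, b):
        dx1, dy1 = p2[0] - p1[0], p2[1] - p1[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p1[0]) * dy2 - (a[1] - p1[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p1[0] + t * dx1, p1[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        if not output:
            break
        points, output = output, []
        prev = points[-1]
        for cur in points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(cross_point(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(cross_point(prev, cur, a, b))
            prev = cur
    return output


def rbox_iou(poly1, poly2):
    # IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)
    inter = clip_convex(poly1, poly2)
    area_r1r2 = shoelace_area(inter) if len(inter) >= 3 else 0.0
    return area_r1r2 / (shoelace_area(poly1) + shoelace_area(poly2) - area_r1r2)
```

For two unit squares shifted by half a side, the intersection area is 0.5 and the union is 1.5, so the IoU is 1/3.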
Optionally, if the IoU is greater than the threshold T, the time-frame counts of the first and second detection frames are determined, and the target detection frame is chosen from the two based on those counts and the frame-count difference threshold.
As an optional embodiment, determining the target detection frame from the first and second detection frames based on their time-frame counts and the frame-count difference threshold includes: determining a first frame-count difference between the time-frame count of the first detection frame and that of the second detection frame; and determining the target detection frame from the two based on the first frame-count difference and the frame-count difference threshold.
In this embodiment, the first and second detection frames are selected from the multiple detection frames, their time-frame counts are determined, and the difference between the two counts is taken as the first frame-count difference. The first frame-count difference is then judged against the frame-count difference threshold, thereby determining the target detection frame from the first and second detection frames.
As an optional embodiment, determining the target detection frame from the first and second detection frames based on the first frame-count difference and the frame-count difference threshold includes: in response to the absolute value of the first frame-count difference being greater than the frame-count difference threshold, determining as the target detection frame whichever of the two has the longer time-frame count.
In this embodiment, the first and second detection frames are selected from the multiple detection frames, their time-frame counts are determined, and the first frame-count difference is computed. If the absolute value of the first frame-count difference exceeds the frame-count difference threshold, the detection frame with the longer time-frame count is determined to be the target detection frame.
Optionally, the first and second detection frames are selected from the multiple detection frames, their intersection is computed, and the IoU is obtained from the area of the convex polygon formed by the intersection points. An IoU greater than the IoU threshold indicates that the two frames overlap, so they can be screened by time-frame count. Further, the time-frame counts of the two frames are determined and their difference is taken as the first frame-count difference; if its absolute value exceeds the frame-count difference threshold, the two frames not only overlap but have overlapped for a long time, and the one with the longer time-frame count is determined to be the target detection frame. Optionally, this procedure can be iterated over the multiple detection frames to determine the target detection frame among all of them.
As an optional embodiment, determining the target detection frame from the first and second detection frames based on the first frame-count difference and the frame-count difference threshold includes: in response to the absolute value of the first frame-count difference being no greater than the frame-count difference threshold, determining as the target detection frame whichever of the two has the higher target matching degree, where the target matching degree characterizes how well the corresponding detection frame matches the object.
In this embodiment, the first and second detection frames are selected from the multiple detection frames, their time-frame counts are determined, and the first frame-count difference is computed. If its absolute value is no greater than the frame-count difference threshold, the detection frame with the higher target matching degree is determined to be the target detection frame. The target matching degree characterizes how well a detection frame matches the object and may be the mean matching value between the detection frame and the object over a period of time, for example a mean confidence, which can be denoted as the score mean.
Optionally, the first and second detection frames are selected from the multiple detection frames, their intersection is computed, and the IoU is obtained from the area of the convex polygon formed by the intersection points. An IoU greater than the IoU threshold indicates overlap, so the frames can be screened by time-frame count. Further, the frame-count difference between the two is computed; if its absolute value is no greater than the frame-count difference threshold, the frames overlap but have not overlapped for long, and the one with the higher target matching degree is determined to be the target detection frame. Optionally, this procedure can be iterated over the multiple detection frames to determine the target detection frame among all of them.
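The two branches described above, long-lived overlap versus brief overlap, can be sketched as a single selection function. The dictionary keys `frame_cnt` and `score` are illustrative names, not identifiers from the patent:

```python
def select_target_box(box1, box2, frame_diff_thresh):
    """Pick the target detection frame from two overlapping frames.

    If the appearance frame counts differ by more than the threshold,
    keep the longer-lived frame; otherwise keep the one with the
    higher target matching degree (mean score).
    """
    diff = box1["frame_cnt"] - box2["frame_cnt"]
    if abs(diff) > frame_diff_thresh:
        return box1 if diff > 0 else box2
    return box1 if box1["score"] >= box2["score"] else box2
```

For example, with a threshold of 10, a frame seen for 40 frames beats a 3-frame frame regardless of score; with a threshold of 100 the same pair is decided by score instead.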
In the embodiments of the present invention, the overlap relationship between detection frames is determined and detection frames with a low target matching degree are filtered out, thereby filtering the corresponding objects in three-dimensional space. This effectively alleviates the duplication, jumping, and unstable association of detected objects that make detection results unstable, improving the fusion performance for multi-sensor three-dimensional targets and the accuracy of object detection.
As an optional embodiment, the matching degree of each detection frame at each moment in the video is obtained, yielding at least one matching degree per detection frame; the quotient of the sum of a detection frame's matching degrees and their number is determined as that detection frame's target matching degree.
In this embodiment, the matching degree of each detection frame at each moment in the video is obtained, yielding at least one matching degree per detection frame. All matching degrees of a detection frame are summed, and the quotient of that sum and the number of matching degrees is determined as the detection frame's target matching degree. The matching degree indicates how well the detection frame matches the object to be recognized in each frame and can be denoted as score.
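A minimal sketch of the averaging described above, assuming the per-frame scores of one detection frame are collected in a list:

```python
def target_matching_degree(frame_scores):
    # sum of all per-frame matching degrees divided by their number
    return sum(frame_scores) / len(frame_scores)


def smooth_track_scores(frame_scores):
    # replace every frame's score with the track-wide mean
    mean = target_matching_degree(frame_scores)
    return [mean] * len(frame_scores)
```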
In the related art, non-maximum suppression (NMS) is a post-processing module in object detection frameworks, used mainly to delete highly redundant detection frames. Optionally, detecting each object in an image frame produces multiple detection frames, and NMS removes the redundancy among each target's detection frames to obtain the final target detection frame. Detection frames are generally represented as axis-aligned rectangles, but the objects output by the sensors in three-dimensional space have orientation uncertainty. Therefore, in the embodiments of the present invention, Incline Non-Maximum Suppression (INMS) is applied to oriented rectangular boxes (rotated boxes, rbox).
Optionally, in the embodiments of the present invention, multiple detection frames for an object can be obtained from multiple sensors. The detections obtained by the millimeter-wave radar are filtered with a speed threshold, and the three-dimensional detection frames from the vision sensor and the lidar are projected onto the ground to obtain two-dimensional detection frames together with the object's center point, size, and orientation. For each detection frame, the number of frames in which it appears (frame_cnt) and its per-frame matching degree (score) are determined, yielding the matching degrees of the multiple detection frames. All matching degrees of a detection frame are summed, the quotient of that sum and the number of matching degrees is determined as the frame's target matching degree, and the per-frame matching degree of the same detection frame is replaced by this target matching degree.
Optionally, the detection frames of each frame can be sorted by target matching degree to obtain a detection-frame sequence (rbox_list), and the detection frames in the sequence are compared to obtain the target detection frame. Sorting here only ensures that the detection frames are processed in order so that none is missed; therefore, no specific restriction is placed on the sorting method.
In the embodiments of the present invention, the target matching degree of a detection frame is set to the mean of its matching degrees over all frames in which it appears, which effectively alleviates the jumping of associated targets and unstable association results during fusion, thereby improving the fusion of results in multi-sensor object detection.
As an optional embodiment, the frame-count difference threshold is determined based on historical detection data of object processing.
In this embodiment, the threshold can be selected through multiple trials, so as to determine an appropriate frame-count difference threshold.
By comparing the time-frame counts of multiple detection frames, the embodiments of the present invention filter out overlapping objects that appear only briefly, thereby achieving the technical effect of improving the accuracy of target recognition and solving the technical problem of low target-recognition accuracy.
Embodiment 2
The technical solutions of the embodiments of the present invention are illustrated below with reference to preferred implementations.
The core functional modules in the field of autonomous driving can be divided into perception, decision-making, and control modules. The perception module must perceive the surrounding environment accurately to provide the basis for analysis by the subsequent modules, so improving the accuracy of object detection is important. At the same time, different sensors differ in the object attributes they perceive and in their sensing range, and sensors of the same type installed at different positions also cover different sensing regions. The sensing results of the individual sensors therefore need to be fused to obtain more precise information about surrounding objects.
At present, three-dimensional object detection, one of the key technologies of the autonomous-driving perception module, provides the position, size, orientation, and category of objects in three-dimensional space, supporting three-dimensional modeling, path planning, and so on. However, since the perception of a single sensor has limitations, the processing and fusion of multi-sensor data (for example from lidar, millimeter-wave radar, and vision) has become the trend in object detection for autonomous driving.
In the related art, sensors detecting three-dimensional objects suffer from overlapping detection frames, which leads to false and missed detections and thus strongly affects the association accuracy and robustness of multi-sensor fusion. For this overlapping-frame problem, non-maximum suppression is a common detection-filtering method that extracts object detection frames with high confidence while suppressing false detections with low confidence. Standard non-maximum suppression applies to axis-aligned two-dimensional rectangles; soft NMS and adaptive NMS can be used to mitigate the missed detections that non-maximum suppression tends to produce when objects are dense or occluded.
To solve the above problem, an embodiment of the present invention proposes a detection method based on Incline Non-Maximum Suppression (INMS). Working on inclined boxes, the method intersects the currently traversed detection frame with each of the remaining detection frames to obtain the corresponding set of intersection points and, from the area of the convex polygon formed by those points, computes the IoU of each pair of detection frames. However, such methods mostly filter detections using single-frame observations and ignore the contribution of the remaining timestamps to the perception result, so the accuracy of object detection remains low.
One related art provides a sensor-data filtering and fusion method for autonomous driving, divided into a spatial filtering-and-fusion stage followed by a temporal one. In the spatial stage, after the perception system obtains a frame of raw sensor data, the data points are spatially clustered, valid clusters are identified, and noise is removed; the clusters are then tracked by association, yielding an object's position and velocity, history, and predictions; feature information is then estimated to compute the clustered object's length, width, and orientation; finally, the temporal filtering stage classifies each raw target as confirmed, suspicious, or false. The method is independent of sensor type and count and works for sensor data with large or small noise, with low computational cost and a simple, flexible, effective design. However, it too filters detections mostly from single-frame observations and ignores the remaining timestamp information, so the accuracy of object detection remains low.
Another related art provides a non-maximum suppression method for multi-target detection. The method obtains the category and confidence score of each detected target and sorts the targets by confidence to obtain a detection queue. The target with the highest confidence, denoted target A, is taken from the queue, and it is judged whether A's confidence satisfies the preset condition of its category; if so, it is judged whether A's detection frame overlaps those of other targets. Any target whose detection frame overlaps A's is denoted target B, it is judged whether A and B belong to the same category, and according to the result the corresponding suppression algorithm decides whether B is suppressed. Since this method also filters detections mostly from single-frame observations and ignores the remaining timestamp information, the accuracy of object detection remains low.
Another related art proposes a 3D object detection method based on multi-information fusion. It collects driving images and point-cloud information and preprocesses them; from the driving images, a 2D detector produces 2D boxes and their scores; from the point cloud, a 3D detector produces candidate 3D boxes; using the correspondence between the 2D boxes and the candidate 3D boxes together with the 2D scores, the candidate 3D boxes are screened to obtain the final 3D boxes. That is, the method compensates for the sparsity of point clouds by introducing visual information and constrains the 3D boxes with the 2D detections, improving the recall of 3D boxes and reducing the probability of false and missed detections. However, it does not consider the fusion of multi-sensor data, so the technical problem of low object-detection accuracy remains.
To solve the above problems, and aiming at the reduced fusion-association accuracy caused by overlapping frames in multi-sensor three-dimensional target detection, an embodiment of the present invention proposes an INMS-based multi-sensor 3D detection filtering and fusion method. Detection frames are obtained from multiple sensors including lidar, millimeter-wave radar, and vision sensors and are pre-filtered: the detections of the vision sensor and the millimeter-wave radar are filtered by a score threshold and a speed threshold, respectively, where both thresholds may be set in advance according to the actual situation or experience. The detected 3D rboxes are then projected onto the ground to obtain 2D rboxes, yielding each detection frame's center point, size, and orientation. Whereas the related art mostly filters detections from single-frame observations and ignores the remaining timestamp information, the INMS here introduces the information of the remaining timestamps when processing a single detection frame: a time threshold is set according to how long each detection frame has appeared, overlapping targets that appear only briefly are filtered out, and the matching degree (score) of a detection frame is set to the mean over all frames in which it appears. This effectively alleviates overlapping and jumping detections and unstable association results, improving the fusion performance of multi-sensor object detection.
The embodiments of the present invention are further described below.
The embodiments of the present invention mainly cover two aspects: the acquisition and the processing of detection frames.
Fig. 2 is a flowchart of a method for acquiring detection frames according to an embodiment of the present invention. As shown in Fig. 2, acquiring detection frames may include the following steps:
Step S201: acquire the data of each sensor.
In this embodiment, the sensors may include a lidar, a millimeter-wave radar, and a vision sensor (camera), and detection data is acquired with all three.
Step S202: filter the acquired detection data based on the score threshold or the speed threshold.
In this embodiment, the acquired targets are pre-filtered.
Optionally, the three-dimensional detection results (3D boxes) obtained by the lidar can be projected onto the ground to obtain two-dimensional detection frames (2D rboxes), from which each rbox's center point, size, and orientation are obtained.
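Dropping the height dimension of a 3D box leaves a rotated rectangle on the ground plane; its corner points, which the later intersection computation needs, follow from the center, size, and yaw. This is a generic geometric sketch under assumed field names, not code from the patent:

```python
import math


def rbox_corners(cx, cy, length, width, yaw):
    # counter-clockwise corners of a rotated 2D rbox on the ground plane
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(length / 2, width / 2), (-length / 2, width / 2),
            (-length / 2, -width / 2), (length / 2, -width / 2)]
    # rotate each half-extent offset by yaw, then translate to the center
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]
```

With zero yaw this reduces to the familiar axis-aligned corner layout.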
Optionally, the detection results obtained by the camera can be filtered by the score threshold: for example, the three-dimensional detection frames (3D rboxes) are filtered by the score threshold, and the remaining detections are projected onto the plane to obtain the corresponding planar detection frames together with their center point, size, and orientation. The score threshold may be an empirical value or one obtained from the actual situation or experimental inspection; its determination is not specifically restricted here.
Optionally, for the detection data acquired by the radar, the results can be filtered by the speed threshold to obtain the detection frames to be processed.
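The per-sensor pre-filtering can be sketched as follows. The field names, the threshold values, and the choice to discard near-static radar returns are illustrative assumptions; the patent states only that a score threshold and a speed threshold are applied:

```python
def prefilter(detections, score_thresh=0.3, speed_thresh=0.5):
    # keep camera detections above the score threshold and radar
    # detections above the speed threshold; pass other sensors through
    kept = []
    for det in detections:
        if det["sensor"] == "camera" and det["score"] < score_thresh:
            continue  # low-confidence vision detection
        if det["sensor"] == "radar" and abs(det.get("speed", 0.0)) < speed_thresh:
            continue  # e.g. discard near-static radar returns (assumption)
        kept.append(det)
    return kept
```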
Step S203: input the screened data from all sensors into inclined non-maximum suppression and process it.
The screened detection frames are input into the improved inclined non-maximum suppression, which processes the data.
In the related art, non-maximum suppression is a post-processing module in object detection frameworks that can be used to delete highly redundant detection frames. Fig. 3 is a schematic diagram of a non-maximum suppression recognition process according to the related art. As shown in Fig. 3, detecting object A and object B produces multiple detection frames; the essence of non-maximum suppression is to remove the redundancy among each object's detection frames to obtain the final detection result, i.e., the target detection frame with the highest matching degree. However, object detection usually uses axis-aligned rectangles, while the three-dimensional targets output by the sensors have orientation uncertainty. The embodiments of the present invention therefore apply inclined non-maximum suppression to oriented rectangles and, within it, introduce the information of the remaining timestamps for each single target: a time threshold is set according to how long the target has appeared, overlapping targets that appear only briefly are filtered out, and the target score is set to the mean of its scores over all frames in which it appears. Fig. 4(a) is a schematic diagram of a processing result without non-maximum suppression according to an embodiment of the present invention, and Fig. 4(b) is a schematic diagram of a processing result with non-maximum suppression according to an embodiment of the present invention. As shown in Figs. 4(a) and 4(b), the embodiments of the present invention can alleviate the jumping of associated targets and unstable association results during fusion while using the time-frame counts to filter out overlapping parts, effectively improving the fusion performance for multi-sensor three-dimensional targets.
FIG. 5 is a flowchart of a non-maximum suppression processing method according to an embodiment of the present invention. As shown in FIG. 5, the INMS processing proposed by the embodiment of the present invention may include the following steps:
Step S501: process the detection frames filtered by each sensor.
Optionally, the number of time frames (frame_cnt) in which each detection frame appears and the confidence of each frame are obtained, the mean confidence of each detection frame is determined, and the score of each frame of the detection frame is changed to the corresponding mean confidence.
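A minimal sketch of this re-scoring step (function and variable names are illustrative, not taken from the disclosure): each frame's score for a detection frame is replaced by the mean confidence over all frame_cnt frames in which it appeared.

```python
def rescore_by_mean_confidence(scores_per_frame):
    """scores_per_frame: the per-frame confidences of one detection frame,
    one entry per time frame in which it appeared (len == frame_cnt)."""
    mean_conf = sum(scores_per_frame) / len(scores_per_frame)
    # Every frame's score is replaced by the detection frame's mean confidence.
    return [mean_conf] * len(scores_per_frame)
```

This makes the score of a long-lived, consistently detected target stable across frames instead of fluctuating with single-frame confidence.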
Optionally, all detection frames in the same frame are sorted by confidence to obtain a sorted list (rbox_list). It should be noted that the sorting may be ascending or descending; its main purpose is to ensure that all detection frames are processed and that none is missed during processing. Therefore, the sorting manner is not specified here, and any method of obtaining a sorting order falls within the protection scope of the embodiments of the present invention.
Step S502: screen the detection frames using an intersection-over-union (IoU) threshold.
Optionally, starting from the first detection frame in rbox_list, the IoU between detection frames is determined in turn: the current detection frame is intersected with each other detection frame to obtain the corresponding set of intersection points, and the area of the convex polygon formed by the intersection points is computed, from which the IoU between every two detection frames is determined by the following formula:
IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2),
where area_r1 and area_r2 are the areas of the two detection frames, and area_r1r2 is the area of the convex polygon formed by their intersection.
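The IoU of two oriented (tilted) frames can be computed exactly as described: clip one convex polygon against the other to obtain the intersection polygon, take its area with the shoelace formula, and apply the formula above. A self-contained sketch (vertices assumed in counter-clockwise order; names illustrative):

```python
def _clip(subject, a, b):
    # Keep the part of `subject` on the left of directed edge a -> b
    # (Sutherland-Hodgman clipping for convex polygons).
    out = []
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    for i in range(len(subject)):
        p, q = subject[i], subject[(i + 1) % len(subject)]
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)
        if sp * sq < 0:  # edge p -> q crosses the clipping line
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def _area(poly):
    # Shoelace formula for a simple polygon.
    n = len(poly)
    if n < 3:
        return 0.0
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2

def rotated_iou(r1, r2):
    """r1, r2: counter-clockwise vertex lists of two convex (tilted) frames.
    Implements IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)."""
    inter = list(r1)
    for i in range(len(r2)):
        if not inter:
            break
        inter = _clip(inter, r2[i], r2[(i + 1) % len(r2)])
    inter_area = _area(inter)  # area_r1r2
    return inter_area / (_area(r1) + _area(r2) - inter_area)
```

For axis-aligned frames this reduces to the familiar rectangle IoU; for tilted frames the intersection is a general convex polygon, which is why the polygon-clipping route is needed.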
Optionally, if the IoU is greater than the IoU threshold (T), step S503 is performed; if the IoU is not greater than the IoU threshold (T), the detection frame is retained.
Step S503: calculate the frame-count difference between the detection frames, and filter overlapping detection frames using a frame-count difference threshold.
Optionally, if the IoU is greater than the IoU threshold, the two detection frames may overlap, so the frame-count difference between the two detection frames is calculated and judged.
Optionally, the frame-count difference between the detection frames is compared with the frame-count difference threshold (time_threshold). If the frame-count difference is not greater than the threshold, the detection frame with the larger matching degree is retained and the one with the smaller matching degree is removed; if the frame-count difference is greater than the threshold, the detection frame with the longer corresponding number of time frames is retained and the one with the shorter number is removed.
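For one overlapping pair, this decision can be sketched as follows (dictionary keys are assumed for illustration; "score" stands for the matching degree and "frame_cnt" for the number of time frames):

```python
def pick_survivor(box_a, box_b, time_threshold):
    """Return the detection frame retained by step S503 for one overlapping pair."""
    diff = abs(box_a["frame_cnt"] - box_b["frame_cnt"])
    if diff > time_threshold:
        # Observed durations differ a lot: keep the longer-lived frame.
        return box_a if box_a["frame_cnt"] > box_b["frame_cnt"] else box_b
    # Durations are comparable: keep the higher matching degree.
    return box_a if box_a["score"] >= box_b["score"] else box_b
```

Note how the temporal criterion takes priority over the score: a briefly appearing high-score frame loses to a persistently observed one.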
In the related art, detection is performed on images of a single time frame, ignoring the influence of information from other time frames on the perception result over a period of time. In the embodiment of the present invention, information from the remaining time frames is introduced, and a time threshold is set according to the number of time frames in which an object appears, so that overlapping objects that appear only briefly are filtered out, thereby improving the accuracy of object recognition.
Optionally, the multiple detection frames are processed through steps S502 and S503 iteratively until all detection frames in rbox_list have been screened.
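Putting steps S502 and S503 together, one possible form of the iteration is sketched below (field names "poly", "score", "frame_cnt" and the helper IoU are illustrative assumptions; iou_fn can be any pairwise IoU, such as a tilted-frame IoU):

```python
def aabb_iou(b1, b2):
    # Axis-aligned IoU for (x1, y1, x2, y2) frames; a simple stand-in
    # for the tilted-frame IoU used in the disclosure.
    ix = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def inms_filter(boxes, iou_fn, iou_threshold, time_threshold):
    # Sort by (mean) score to form rbox_list.
    boxes = sorted(boxes, key=lambda b: b["score"], reverse=True)
    removed = [False] * len(boxes)
    for i in range(len(boxes)):
        if removed[i]:
            continue
        for j in range(i + 1, len(boxes)):
            if removed[j]:
                continue
            # Step S502: screen pairs by the IoU threshold.
            if iou_fn(boxes[i]["poly"], boxes[j]["poly"]) <= iou_threshold:
                continue  # little overlap: both frames are retained
            # Step S503: resolve the overlap by frame-count difference.
            diff = abs(boxes[i]["frame_cnt"] - boxes[j]["frame_cnt"])
            if diff > time_threshold:
                loser = j if boxes[i]["frame_cnt"] >= boxes[j]["frame_cnt"] else i
            else:
                loser = j  # list is score-sorted, so boxes[i] scores higher
            removed[loser] = True
            if loser == i:
                break  # boxes[i] itself was filtered out
    return [b for b, gone in zip(boxes, removed) if not gone]
```

Unlike standard NMS, the current frame is not guaranteed to survive: a longer-observed overlapping frame can displace a higher-scoring but short-lived one.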
Step S504: obtain the filtered targets and perform multi-sensor data fusion.
The filtered targets are obtained, multi-sensor data fusion is performed, and the detection frames that have undergone multiple rounds of filtering are processed to obtain the object to be recognized.
In the related art, the NMS algorithm can only handle the detection of two-dimensional horizontal rectangular frames, which limits the filtering of spatial objects in a scene. The embodiment of the present invention proposes an INMS-based detection algorithm that implements target detection filtering in multi-sensor three-dimensional space: the three-dimensional detection frame generated for each target is projected onto the ground to generate a corresponding two-dimensional detection frame, detection frames with low matching degree are filtered according to the judgment of the overlap relationship between detection frames, and the corresponding overlapping objects are filtered accordingly. This effectively alleviates problems such as overlapping and jumping of detected objects and unstable association results, thereby improving the fusion performance for multi-sensor spatial objects.
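The ground projection of a three-dimensional detection frame can be sketched as follows (assuming an upright box whose height axis is z; names illustrative):

```python
def project_to_ground(corners_3d):
    """corners_3d: the 8 (x, y, z) corners of a 3D detection frame.
    Dropping the height coordinate projects the frame onto the ground;
    for an upright box the top and bottom faces collapse onto the same
    4 footprint vertices, giving the 2D detection frame."""
    footprint = []
    for x, y, _z in corners_3d:
        if (x, y) not in footprint:
            footprint.append((x, y))
    return footprint
```

The resulting footprint polygon is what the tilted-IoU screening of step S502 then operates on.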
Further, in the related art, detection filtering is performed based on single-frame observation information, ignoring the effect of the remaining timestamp information on the perception result. Addressing this, the embodiment of the present invention improves the existing NMS to obtain the INMS: for each single target, information from the remaining timestamps is introduced, a time threshold is set according to the duration for which the target appears, overlapping targets that appear only briefly are filtered out, and the matching degree of a detection frame is set to the mean matching degree over all frames in which it appears, thereby improving the stability of the filtering.
In the embodiment of the present invention, by comparing the numbers of time frames among multiple detection frames, overlapping objects that appear only briefly are filtered out, which achieves the technical effect of improving the accuracy of target recognition and solves the technical problem of low target recognition accuracy.
Embodiment 3
According to an embodiment of the present invention, an object recognition apparatus is further provided. It should be noted that the apparatus may be used to execute the object recognition method of Embodiment 1.
FIG. 6 is a schematic diagram of an object recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the object recognition apparatus 600 may include an acquisition unit 602, a first determination unit 604, a second determination unit 606, and a recognition unit 608.
The acquisition unit 602 is configured to acquire multiple detection frames of an object in a video, where each detection frame represents the position of the object in the video.
The first determination unit 604 is configured to determine the number of time frames of each of the multiple detection frames in the video, where the number of time frames represents the length of time for which each detection frame appears continuously in the video.
The second determination unit 606 is configured to determine a target detection frame among the multiple detection frames based on the numbers of time frames and a frame-count difference threshold.
The recognition unit 608 is configured to recognize the object in the video based on the target detection frame.
Optionally, the second determination unit 606 includes a first determination module configured to: determine the IoU of a first detection frame and a second detection frame among the multiple detection frames, where the first detection frame and the second detection frame are any two of the multiple detection frames at the same moment; in response to the IoU being greater than the IoU threshold, obtain the numbers of time frames of the first and second detection frames; and determine the target detection frame from the first and second detection frames based on their numbers of time frames and the frame-count difference threshold.
Optionally, the first determination module includes a first determination submodule configured to determine a first frame-count difference between the numbers of time frames of the first and second detection frames, and to determine the target detection frame from the first and second detection frames based on the first frame-count difference and the frame-count difference threshold.
Optionally, the first determination submodule is further configured to determine, in response to the absolute value of the first frame-count difference being greater than the frame-count difference threshold, the one of the first and second detection frames with the larger number of time frames as the target detection frame.
Optionally, the first determination submodule is further configured to determine, in response to the absolute value of the first frame-count difference being not greater than the frame-count difference threshold, the one of the first and second detection frames with the larger matching degree as the target detection frame, where the matching degree characterizes how well the corresponding detection frame matches the object.
Optionally, the apparatus further includes a third determination unit configured to obtain the matching degree of each detection frame at each moment in the video, yielding at least one matching degree per detection frame, and to determine the quotient of the sum of the at least one matching degree and the number of the at least one matching degree as the target matching degree of the detection frame.
Optionally, the apparatus further includes a fourth determination unit configured to determine the frame-count difference threshold based on historical detection data of object processing.
In the embodiment of the present invention, the acquisition unit acquires multiple detection frames of an object in a video, where each detection frame represents the position of the object in the video; the first determination unit determines the number of time frames of each detection frame in the video, where the number of time frames represents the length of time for which each detection frame appears continuously in the video; the second determination unit determines a target detection frame among the multiple detection frames based on the numbers of time frames and a frame-count difference threshold; and the recognition unit recognizes the object in the video based on the target detection frame. In other words, by comparing the numbers of time frames among multiple detection frames, the embodiment of the present invention filters out overlapping objects that appear only briefly, thereby achieving the technical effect of improving the accuracy of target recognition and solving the technical problem of low target recognition accuracy.
Embodiment 4
According to an embodiment of the present invention, a computer-readable storage medium is further provided. The storage medium includes a stored program, and the program, when executed, performs the object recognition method described in Embodiment 1.
Embodiment 5
According to an embodiment of the present invention, a processor is further provided. The processor is configured to run a program, and the program, when run, performs the object recognition method described in Embodiment 1.
Embodiment 6
According to an embodiment of the present invention, a vehicle is further provided. The vehicle is configured to run a program, and the program, when run, performs the object recognition method described in Embodiment 1.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210663391.XA CN115115978B (en) | 2022-06-13 | 2022-06-13 | Object recognition method, device, storage medium and processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115115978A true CN115115978A (en) | 2022-09-27 |
CN115115978B CN115115978B (en) | 2025-04-15 |