TWI855331B - Method for detecting three-dimensional object, electronic device and storage medium
- Publication number: TWI855331B
Description
This application relates to computer vision and deep learning technology, and in particular to a three-dimensional object detection method, an electronic device, and a computer-readable storage medium.
In the field of autonomous driving, sensors are used to detect objects in front of or near the vehicle so that the autonomous driving system can make corresponding decisions. The system therefore needs to detect the category and position of each object quickly and accurately to ensure driving safety. Most current three-dimensional object detection algorithms rely on regression operations to obtain an object's three-dimensional position, which makes prediction time-consuming. In addition, when measuring the distance between the vehicle and an object ahead, current algorithms obtain depth information from lidar or radar, but lidar and radar are expensive and have a relatively small field of view.
In view of the above, it is necessary to provide a three-dimensional object detection method, an electronic device, and a computer-readable storage medium that solve the problem of slow detection of object categories and three-dimensional positions while reducing detection cost.
An embodiment of the present application provides a three-dimensional object detection method, which includes: obtaining a detection image captured by a camera; inputting the detection image into a trained object detection model, and using the model to determine the category of an object in the detection image, the object's two-dimensional bounding box, and the object's rotation angle; according to the object category, searching a three-dimensional object model library to determine the object model corresponding to the object and the three-dimensional bounding box corresponding to that object model; determining the distance from the camera to the object model according to the size of the object's two-dimensional bounding box, the image information of the detection image, and the focal length of the camera; and determining the position of the object model in three-dimensional space according to the object's rotation angle, the camera-to-model distance, and the three-dimensional bounding box, taking the position of the object model in three-dimensional space as the position of the object in three-dimensional space.
In an optional implementation, the method further includes constructing the object detection model by modifying a Single Shot MultiBox Detector (SSD) network, where the SSD network includes a backbone network and a first head network. The modification adds a second head network after the backbone network, so that the resulting object detection model includes the backbone network, the first head network, and the second head network.
In an optional implementation, the method further includes: obtaining training sample images; extracting features from each training sample image with the backbone network to obtain multiple training feature maps of different scales; generating multiple first default boxes on these feature maps and feeding each feature map into the first head network for convolution, which outputs the category scores of the objects inside the first default boxes and the positions of those boxes; applying non-maximum suppression to the first default boxes to output the two-dimensional bounding box of each object in the training sample image, where that bounding box carries the object category and the box position; feeding each feature map into the second head network for convolution, which outputs the rotation angle of each object in the training sample image; and minimizing the loss values of the first and second head networks to obtain the trained object detection model.
In an optional implementation, the method further includes performing data augmentation on the training sample images to enlarge the training set, where the augmentation includes flipping, rotating, scaling, and shifting the images.
In an optional implementation, searching the three-dimensional object model library according to the object category includes establishing the library in advance, where the library contains multiple object models corresponding to different object categories and a three-dimensional bounding box for each object model, the three-dimensional bounding box recording the length, width, and height corresponding to the object category.
In an optional implementation, determining the distance from the camera to the object model includes calculating that distance from the width and/or length of the object's two-dimensional bounding box, the focal length of the camera, and the resolution and pixel width of the detection image.
In an optional implementation, determining the position of the object model in three-dimensional space includes: taking the rotation angle of the object as the rotation angle of the object model; determining the orientation of the object model in three-dimensional space from that rotation angle; and determining the position of the object model in three-dimensional space from its orientation, the camera-to-model distance, and the three-dimensional bounding box.
In an optional implementation, the method further includes outputting the object category and the object's position in three-dimensional space and displaying them on a display screen.
An embodiment of the present application further provides an electronic device including a processor and a memory, where the processor implements the three-dimensional object detection method when executing a computer program stored in the memory.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the three-dimensional object detection method.
With the technical solutions provided by the embodiments of the present application, no complex computation is required, which reduces labor cost, and the three-dimensional position of an object can be obtained quickly.
5: Electronic device
501: Memory
502: Processor
503: Computer program
504: Communication bus
X: Pixel width of the detection image
G: Optical center of the camera
Y: Width of the two-dimensional bounding box
f: Focal length of the camera
d: Distance from the camera to the object model
101-106: Steps
21-24: Steps
FIG. 1 is a flowchart of a three-dimensional object detection method provided in an embodiment of the present application.
FIG. 2 is a flowchart of the non-maximum suppression method provided in an embodiment of the present application.
FIG. 3 is a schematic diagram of the three-dimensional object model library provided in an embodiment of the present application.
FIG. 4 is a schematic diagram of the method for calculating the distance from the camera to an object model provided in an embodiment of the present application.
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
In order to more clearly understand the above objectives, features, and advantages of the present application, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the specific embodiments described here are only used to explain the application and are not intended to limit it.
Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the application.
In the field of autonomous driving, sensors detect objects in front of or near the vehicle so that the autonomous driving system can make corresponding decisions; the system therefore needs to detect the category and position of objects quickly and accurately to ensure driving safety. Most current three-dimensional object detection algorithms first detect the category and position of an object and then use regression operations to obtain its three-dimensional position, which makes prediction time-consuming. In addition, when measuring the distance between the vehicle and an object ahead, current algorithms obtain depth information from lidar or radar, which is expensive and has a relatively small field of view.
To address these problems, an embodiment of the present application provides a three-dimensional object detection method that obtains the category and three-dimensional position of an object quickly without regression operations, and that does not need lidar or radar to obtain depth information when measuring the distance between the vehicle and an object ahead, thereby reducing cost. The method is described in detail below.
Referring to FIG. 1, a flowchart of a three-dimensional object detection method provided in an embodiment of the present application is shown. The method is applied to an electronic device (for example, the electronic device 5 shown in FIG. 5). The electronic device can be any electronic product capable of human-computer interaction with a user, such as a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game console, Internet Protocol television (IPTV), or smart wearable device.
The electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and embedded devices.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a virtual private network (VPN).
The method specifically includes the following steps.
Step 101: obtain training sample images and a detection image captured by a camera.
In at least one embodiment of the present application, the training sample images include, but are not limited to, images of various scenes on urban and rural roads at different times of day. The detection image captured by the camera includes, but is not limited to, scene images of urban and rural roads captured by a monocular camera at different times of day. It should be noted that although both the training sample images and the images captured by the monocular camera are scene images of urban and rural roads at different times, the training sample images are not the same images as those captured by the camera.
In at least one embodiment of the present application, obtaining the training sample images further includes performing data augmentation on them to obtain more distinct training samples; the augmentation includes, but is not limited to, flipping, rotating, scaling, and cropping the images. Data augmentation effectively expands the sample data, so the object detection model can be trained and optimized with training images from more scenes, making it more robust. A minimal sketch of such an augmentation step is given below.
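As an illustration only, the following sketch implements such an augmentation step with OpenCV. The rotation, scale, and shift ranges are assumptions for illustration rather than values specified in this application, and in a real detection pipeline the two-dimensional bounding-box and rotation-angle annotations would have to be transformed together with the image.

```python
import cv2
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random flip, rotation, scale, and shift to one training image."""
    h, w = image.shape[:2]
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = cv2.flip(image, 1)
    # Random rotation and scale about the image center (illustrative ranges).
    angle = rng.uniform(-10.0, 10.0)   # degrees
    scale = rng.uniform(0.9, 1.1)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    # Random shift of up to 5% of the image size.
    m[0, 2] += rng.uniform(-0.05, 0.05) * w
    m[1, 2] += rng.uniform(-0.05, 0.05) * h
    return cv2.warpAffine(image, m, (w, h))
```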
Step 102: construct an object detection model and train it with the training sample images to obtain a trained object detection model.
In at least one embodiment of the present application, constructing the object detection model includes modifying a Single Shot MultiBox Detector (SSD) network, where the SSD network includes a backbone network and a first head network. In this embodiment, the modification adds a second head network after the backbone network of the SSD network. After the modification, the object detection model includes the backbone network, the first head network, and the second head network.
In at least one embodiment of the present application, training the object detection model with the training sample images to obtain the trained model includes: extracting features from each training sample image with the backbone network to obtain multiple training feature maps of different scales; generating multiple first default boxes on these feature maps and feeding each feature map into the first head network for convolution, which outputs the category scores of the objects inside the first default boxes and the positions of those boxes; applying non-maximum suppression to the first default boxes to output the two-dimensional bounding box of each object in the training sample image, where that bounding box carries the object category and the box position; feeding each feature map into the second head network for convolution, which outputs the rotation angle of each object in the training sample image; and minimizing the loss values of the first and second head networks to obtain the trained object detection model.
In at least one embodiment of the present application, the first head network includes multiple convolutional layers.
In at least one embodiment of the present application, the second head network consists of multiple convolutional layers and a fully connected layer. After each training feature map is convolved by the convolutional layers of the second head network, the fully connected layer outputs the rotation angle of the object in the training sample image.
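For concreteness, the following PyTorch sketch shows the overall structure described above: a backbone producing multi-scale feature maps, a first head of convolutional layers predicting per-default-box class scores and box positions, and an added second head of convolutional layers plus a fully connected layer predicting a rotation angle per default box. The tiny backbone, the channel counts, the input size, and the number of default boxes per location are illustrative assumptions; the application does not fix these details.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Illustrative stand-in for the SSD backbone; for a 64x64 input it
    returns feature maps at two scales (16x16 and 8x8)."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.stage2 = nn.Sequential(
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        f1 = self.stage1(x)
        return [f1, self.stage2(f1)]

class SecondHead(nn.Module):
    """The added head: convolutional layers followed by a fully connected
    layer that outputs one rotation angle per default box."""
    def __init__(self, in_ch, feat_hw, boxes_per_loc):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.fc = nn.Linear(64 * feat_hw * feat_hw,
                            feat_hw * feat_hw * boxes_per_loc)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class SSDWithAngle(nn.Module):
    """Backbone -> first head (class scores and box positions per default
    box) and second head (rotation angle per default box) at each scale."""
    def __init__(self, num_classes, feat_channels=(128, 256),
                 feat_sizes=(16, 8), boxes_per_loc=4):
        super().__init__()
        self.backbone = TinyBackbone()
        self.cls_convs = nn.ModuleList(
            nn.Conv2d(c, boxes_per_loc * num_classes, 3, padding=1)
            for c in feat_channels)
        self.box_convs = nn.ModuleList(
            nn.Conv2d(c, boxes_per_loc * 4, 3, padding=1)
            for c in feat_channels)
        self.angle_heads = nn.ModuleList(
            SecondHead(c, s, boxes_per_loc)
            for c, s in zip(feat_channels, feat_sizes))

    def forward(self, x):
        feats = self.backbone(x)
        scores = [conv(f) for conv, f in zip(self.cls_convs, feats)]
        boxes = [conv(f) for conv, f in zip(self.box_convs, feats)]
        angles = [head(f) for head, f in zip(self.angle_heads, feats)]
        return scores, boxes, angles
```

For example, `scores, boxes, angles = SSDWithAngle(num_classes=5)(torch.rand(1, 3, 64, 64))` yields, per scale, class scores and box positions from the first head and one rotation angle per default box from the second head.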
In at least one embodiment of the present application, the loss function of the SSD network is used as the loss function of the first head network. The SSD loss function belongs to the prior art and is not described further here.
In at least one embodiment of the present application, the loss function of the second head network is a regression loss between the rotation angle predicted by the second head network and the ground-truth rotation angle. (The formula given for this loss in the original specification is not reproduced in this text.)
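Purely as an illustrative assumption (the application's actual formula may differ), a common choice for such an angle-regression loss is a smooth-L1 (Huber) penalty between the predicted and ground-truth rotation angles:

```python
import torch
import torch.nn.functional as F

def angle_loss(pred_angle: torch.Tensor, gt_angle: torch.Tensor) -> torch.Tensor:
    """Hypothetical second-head loss: smooth-L1 between predicted and
    ground-truth rotation angles (radians), averaged over the default
    boxes that were matched to a ground-truth object."""
    return F.smooth_l1_loss(pred_angle, gt_angle, reduction="mean")
```

A common variant regresses the sine and cosine of the angle instead of the raw angle, which avoids the discontinuity at the angular wrap-around.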
In at least one embodiment of the present application, the object detection model is trained with the loss functions of the first and second head networks; training iterates until the model converges and the loss value is minimized, yielding the trained object detection model.
In at least one embodiment of the present application, applying non-maximum suppression to the multiple first default boxes to output the two-dimensional bounding boxes of the objects proceeds as follows. As described above, each first default box carries the category scores of the object it contains and the position of the box. The non-maximum suppression (NMS) operation, shown in the flowchart of FIG. 2, specifically includes:
Step 21: sort the first default boxes by score and select the first default box with the highest score.
Step 22: traverse the remaining first default boxes, compute the intersection over union (IoU) between each remaining box and the selected box, and delete every box whose IoU exceeds a preset threshold. In this embodiment, the IoU measures the degree of overlap between the selected (highest-scoring) first default box and each other first default box.
Step 23: determine whether any first default boxes remain besides those already selected. If so, the flow returns to step 21. If not, step 24 is executed: output the selected first default boxes as the two-dimensional bounding boxes of the objects in the training image.
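A minimal NumPy sketch of steps 21 through 24 follows, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the threshold value is illustrative.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Steps 21-24: repeatedly keep the highest-scoring box and drop
    every remaining box whose IoU with it exceeds the threshold."""
    order = scores.argsort()[::-1]            # step 21: sort by score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)                     # selected box
        rest = order[1:]
        # Step 22: IoU of the selected box with every remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # Step 23: continue with the boxes that were not suppressed.
        order = rest[iou <= iou_thresh]
    return keep                               # step 24: surviving boxes
```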
Step 103: input the detection image into the trained object detection model, and use the model to determine the category, two-dimensional bounding box, and rotation angle of each object in the detection image.
In at least one embodiment of the present application, the detection image is input into the trained object detection model, and the backbone network extracts its features to obtain multiple feature maps of different scales. Multiple second default boxes are generated on these feature maps, and each feature map is fed into the first head network for convolution, which outputs the category scores of the objects inside the second default boxes and the positions of those boxes. Non-maximum suppression is then applied to the second default boxes to output the two-dimensional bounding box of each object in the detection image, where the bounding box carries the object category and the box position. Each feature map is also fed into the second head network for convolution, which outputs the rotation angle of each object in the detection image. In this way, inputting the detection image into the trained object detection model yields the object categories, two-dimensional bounding boxes, and rotation angles of the objects in the detection image.
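Continuing the earlier sketches, inference might then look as follows; the decoding of the raw head outputs against the default-box grid is an assumed step and is omitted here.

```python
model = SSDWithAngle(num_classes=5)       # in practice, trained weights are loaded
model.eval()
frame = torch.rand(1, 3, 64, 64)          # placeholder for a camera frame
with torch.no_grad():
    scores, boxes, angles = model(frame)  # raw multi-scale head outputs
# Assumed post-processing: decode the box offsets against the default-box
# grid, then apply nms() from the sketch above to obtain, for each object,
# its category, two-dimensional bounding box, and rotation angle.
```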
Step 104: according to the object category, search the three-dimensional object model library to determine the object model corresponding to the object and the three-dimensional bounding box corresponding to that object model.
In at least one embodiment of the present application, the method further includes establishing a three-dimensional object model library, where the library contains multiple object models corresponding to different object categories and a three-dimensional bounding box for each object model; the three-dimensional bounding box records the length, width, and height corresponding to the object category.
In this embodiment, the object model is determined from the three-dimensional object model library according to the object category, and the three-dimensional bounding box of the object model is then determined from that object model. For example, FIG. 3 is a schematic diagram of determining the three-dimensional bounding box provided in an embodiment of this application. When the object category is a car, the car's object model is looked up in the three-dimensional object model library and the car's three-dimensional bounding box is looked up from that object model; the same lookup applies when the object category is a small truck, an electric vehicle, or a bus. In this embodiment, the object model includes, but is not limited to, a three-dimensional model. A minimal sketch of such a lookup follows.
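As an illustration, the library can be as simple as a mapping from category name to a prototype model and its three-dimensional bounding-box dimensions. The file paths and dimension values below are invented placeholders, not values from this application.

```python
from dataclasses import dataclass

@dataclass
class ObjectModel:
    mesh_path: str                      # path to the 3-D model file
    dims: tuple[float, float, float]    # (length, width, height) in meters

# Illustrative entries; real dimensions would be measured per category.
MODEL_LIBRARY = {
    "car":              ObjectModel("models/car.obj",         (4.5, 1.8, 1.5)),
    "small_truck":      ObjectModel("models/small_truck.obj", (6.0, 2.2, 2.5)),
    "electric_vehicle": ObjectModel("models/ev.obj",          (3.5, 1.5, 1.5)),
    "bus":              ObjectModel("models/bus.obj",         (12.0, 2.5, 3.2)),
}

def lookup(category: str) -> ObjectModel:
    """Step 104: object category -> object model and its 3-D bounding box."""
    return MODEL_LIBRARY[category]
```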
Step 105: determine the distance from the camera to the object model according to the size of the object's two-dimensional bounding box, the image information of the detection image, and the focal length of the camera.
In at least one embodiment of the present application, the size of the object's two-dimensional bounding box includes its width and/or length, and the image information of the detection image includes the resolution of the detection image and its pixel width.
In at least one embodiment of the present application, the distance from the camera to the object model is calculated from the width and/or length of the object's two-dimensional bounding box, the focal length of the camera, the resolution of the detection image, and the pixel width of the detection image. For example, FIG. 4 is a schematic diagram of the method for calculating the distance from the camera to the object model, where X denotes the pixel width of the detection image, f the focal length of the camera, G the optical center of the camera, d the distance from the camera to the object model, and Y the width of the two-dimensional bounding box. By the principle of similar triangles, a formula relating these quantities is obtained, and from that formula the distance from the camera to the object model can be solved.
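The formula itself is not reproduced in this text. Under the standard pinhole similar-triangles relation, and assuming the object's real-world width W is taken from the three-dimensional bounding box in the model library, one plausible form is d = f_px * W / Y, where f_px is the focal length expressed in pixels (this is where the image resolution and pixel width enter). The sketch below uses this assumed form.

```python
def camera_to_model_distance(
    bbox_width_px: float,    # Y: width of the 2-D bounding box in pixels
    real_width_m: float,     # W: real width from the model's 3-D bounding box
    focal_mm: float,         # f: camera focal length in millimeters
    sensor_width_mm: float,  # physical sensor width (assumed to be known)
    image_width_px: int,     # X: pixel width (resolution) of the detection image
) -> float:
    """Similar triangles: an object of width W at distance d projects to
    Y pixels, so d = f_px * W / Y with f_px = focal_mm * X / sensor_width_mm."""
    focal_px = focal_mm * image_width_px / sensor_width_mm
    return focal_px * real_width_m / bbox_width_px
```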
Step 106: determine the position of the object model in three-dimensional space according to the object's rotation angle, the distance from the camera to the object model, and the three-dimensional bounding box, and take the position of the object model in three-dimensional space as the position of the object in three-dimensional space.
In at least one embodiment of the present application, the rotation angle of the object is taken as the rotation angle of the object model.
In at least one embodiment of the present application, determining the position of the object model in three-dimensional space according to the object's rotation angle, the camera-to-model distance, and the three-dimensional bounding box includes: determining the orientation of the object model in three-dimensional space from its rotation angle, and then determining the position of the object model in three-dimensional space from that orientation, the camera-to-model distance, and the three-dimensional bounding box.
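A minimal sketch of this placement step follows, under the assumptions that the camera looks along the +Z axis, that the distance d is measured along the viewing ray through the center of the detected object, and that the rotation angle is a yaw about the vertical axis; none of these conventions is fixed by the text above.

```python
import numpy as np

def place_model(d, yaw, dims, ray_dir=np.array([0.0, 0.0, 1.0])):
    """Place the object model in camera coordinates.

    d       -- camera-to-model distance from step 105
    yaw     -- rotation angle (radians) from the second head network
    dims    -- (length, width, height) from the 3-D bounding box
    ray_dir -- unit viewing ray toward the object (assumed +Z here)
    """
    center = d * ray_dir                       # model center along the ray
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # yaw about Y (vertical)
    l, w, h = dims
    # Eight corners of the 3-D bounding box in the model's local frame.
    corners = np.array([[sx * l / 2, sy * h / 2, sz * w / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return center, corners @ R.T + center      # oriented box in camera frame
```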
In at least one embodiment of the present application, the position of the object model in three-dimensional space is taken as the position of the object in three-dimensional space. The object category and the object's position in three-dimensional space are then output and displayed on a display screen.
The above are only specific implementations of the present application, but the scope of protection of the application is not limited thereto. A person of ordinary skill in the art can make improvements without departing from the inventive concept of this application, and such improvements all fall within the scope of protection of this application.
As shown in FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 5 includes a memory 501, at least one processor 502, a computer program 503 stored in the memory 501 and executable on the at least one processor 502, and at least one communication bus 504.
Those skilled in the art will understand that the schematic diagram shown in FIG. 5 is only an example of the electronic device 5 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 5 may also include input/output devices, network access devices, and the like.
The at least one processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The at least one processor 502 may be a microprocessor or any conventional processor; it is the control center of the electronic device 5 and connects all parts of the entire device through various interfaces and lines.
The memory 501 can be used to store the computer program 503. The at least one processor 502 implements the various functions of the electronic device 5 by running or executing the computer program 503 stored in the memory 501 and by calling data stored in the memory 501. The memory 501 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the electronic device 5 (such as audio data). In addition, the memory 501 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the modules/units integrated in the electronic device 5 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments can also be completed by a computer program instructing related hardware; the computer program can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from the spirit or basic features of the application. Therefore, the embodiments should be regarded as exemplary and non-restrictive from every point of view, and the scope of the application is defined by the appended claims rather than by the above description; all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be included in the application. No reference sign in the claims shall be construed as limiting the claim concerned.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and are not limiting. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the application can be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.
Claims (8)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
TW111120337A | 2022-05-31 | 2022-05-31 | Method for detecting three-dimensional object, electronic device and storage medium
Publications (2)

Publication Number | Publication Date
---|---
TW202349347A (en) | 2023-12-16
TWI855331B (en) | 2024-09-11
Family
ID=90039173
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
TW111120337A | Method for detecting three-dimensional object, electronic device and storage medium | 2022-05-31 | 2022-05-31

Country Status (1)

Country | Link
---|---
TW (1) | TWI855331B (en)
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201926249A (en) * | 2017-11-30 | 2019-07-01 | 國家中山科學研究院 | Optical radar pedestrian detection method capable of enhancing image data obtained by an optical radar to distinguish distant pedestrians and environmental blocks and improve the pedestrian recognition capability |
US20200160033A1 (en) * | 2018-11-15 | 2020-05-21 | Toyota Research Institute, Inc. | System and method for lifting 3d representations from monocular images |
CN114078246A (en) * | 2020-08-11 | 2022-02-22 | 华为技术有限公司 | Method and device for determining three-dimensional information of detection object |
CN114495038A (en) * | 2022-01-12 | 2022-05-13 | 九识(苏州)智能科技有限公司 | Post-processing method for automatic driving detection marking data |
Also Published As
Publication number | Publication date |
---|---|
TW202349347A (en) | 2023-12-16 |
Similar Documents

Publication | Title
---|---
CN111144242B | Three-dimensional target detection method, device and terminal
CN111837158A | Image processing method and device, shooting device and movable platform
EP4307219A1 | Three-dimensional target detection method and apparatus
CN108280843A | A kind of video object detecting and tracking method and apparatus
US11804042B1 | Prelabeling of bounding boxes in video frames
EP3182369A1 | Stereo matching method, controller and system
CN108876706A | Thumbnail Generation from Panoramic Images
CN114037966A | High-precision map feature extraction method, device, medium and electronic equipment
Fried et al. | Finding distractors in images
CN114419058A | An Image Semantic Segmentation Model Training Method for Traffic Road Scenes
CN117333837A | Driving safety assistance methods, electronic devices and storage media
CN111105440A | Tracking method, device, device and storage medium for target object in video
TWI855331B | Method for detecting three-dimensional object, electronic device and storage medium
CN114331848A | Video image splicing method, device and equipment
Atapour-Abarghouei et al. | Extended patch prioritization for depth filling within constrained exemplar-based RGB-D image completion
US12154349B2 | Method for detecting three-dimensional objects in roadway and electronic device
CN117218364A | Three-dimensional target detection method, electronic equipment and storage medium
WO2024092590A1 | Image processing method and apparatus, model training method and apparatus, and terminal device
US12260655B2 | Method for detection of three-dimensional objects and electronic device
TWI855330B | Method for detecting three-dimensional object, electronic device and storage medium
US20230386230A1 | Method for detection of three-dimensional objects and electronic device
TWI798098B | Method for detecting three-dimensional target object, electronic device and storage medium
CN115115535A | Depth map denoising method, device, medium and device
CN117252912A | Depth image acquisition method, electronic device and storage medium
CN117333836A | Driving safety assistance methods, electronic devices and storage media