CN113378693B - Method and device for generating target detection system and detecting target - Google Patents
Method and device for generating target detection system and detecting target
- Publication number: CN113378693B
- Application number: CN202110635776.0A
- Authority
- CN
- China
- Prior art keywords
- point cloud
- sample
- target detection
- map
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Remote Sensing (AREA)
- Traffic Control Systems (AREA)
Abstract
The disclosure provides a method and device for generating a target detection system and for detecting targets. It relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to autonomous driving scenarios. The scheme is as follows: acquire a sample set and a high-precision map; select a sample from the sample set and perform the following training steps: extract fusion features from the point cloud data in the selected sample and the high-precision map; input the fusion features into a target detection model to obtain a prediction tag set; calculate a total loss value based on the prediction tag set and the sample tag set; if the total loss value is smaller than a predetermined threshold, construct the target detection system from the target detection model. This embodiment improves detection accuracy and helps guarantee the safety of autonomous driving.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of computer vision and deep learning, and more particularly, to a method and apparatus for generating a target detection system and detecting a target.
Background
In autonomous driving, the perception system is the "core" of the autonomous vehicle: accurate and reliable perception results determine subsequent object tracking, decision making, and path planning. The high-precision map plays an important auxiliary role. On the one hand, it provides accurate road structure information that helps the ego vehicle localize more precisely; on the other hand, its prior information can be used to filter out invalid detection results, improving the target detection results. Relying on the prior knowledge of the high-precision map can also greatly reduce the complexity of other modules, making real-time operation of the algorithm achievable.
Currently, high-precision maps are mainly used in a post-processing step that filters false positives out of the results of a target detection system.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium and computer program product for generating an object detection system and detecting an object.
According to a first aspect of the present disclosure, there is provided a method of generating an object detection system, comprising: acquiring a sample set and a high-precision map, wherein each sample in the sample set comprises a frame of point cloud data and a sample tag set corresponding to the point cloud data, and the high-precision map contains vectorized road elements; and selecting a sample from the sample set and performing the following training steps: extracting fusion features from the point cloud data in the selected sample and the high-precision map; inputting the fusion features into a target detection model to obtain a prediction tag set; calculating a total loss value based on the prediction tag set and the sample tag set; and if the total loss value is smaller than a predetermined threshold, constructing the target detection system from the target detection model.
According to a second aspect of the present disclosure, there is provided a method of detecting a target, comprising: acquiring point cloud data of a region to be detected and a high-precision map of the region to be detected; and inputting the point cloud data and the high-precision map into the target detection system generated by the method according to the first aspect, and outputting the detection result.
According to a third aspect of the present disclosure, there is provided an apparatus for generating an object detection system, comprising: an acquisition unit configured to acquire a sample set and a high-precision map, wherein each sample in the sample set comprises one frame of point cloud data and a sample tag set corresponding to the point cloud data, and the high-precision map contains vectorized road elements; and a training unit configured to select samples from the sample set and to perform the following training steps: extracting fusion features from the point cloud data in the selected sample and the high-precision map; inputting the fusion features into a target detection model to obtain a prediction tag set; calculating a total loss value based on the prediction tag set and the sample tag set; and if the total loss value is smaller than a predetermined threshold, constructing the target detection system from the target detection model.
According to a fourth aspect of the present disclosure, there is provided an apparatus for detecting an object, comprising: an acquisition unit configured to acquire point cloud data of an area to be detected and a high-precision map of the area to be detected; and a detection unit configured to input the point cloud data and the high-precision map into the target detection system generated using the apparatus of the third aspect, and output a detection result.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect or the second aspect.
According to a sixth aspect of the present disclosure there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the first or second aspect.
According to the method and device for generating a target detection system and detecting targets provided by the present disclosure, fusing the high-precision map into the target detection system directly improves target detection performance, which helps establish an advantage and a technical barrier in autonomous driving technology. At the same time, a more reliable target detection system improves the safety of the overall system.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method of generating an object detection system according to the present application;
FIG. 3 is a schematic illustration of one application scenario of a method of generating an object detection system according to the present application;
FIG. 4 is a flow chart of one embodiment of a method of detecting an object according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of an apparatus for generating an object detection system according to the present application;
FIG. 6 is a schematic structural view of one embodiment of an apparatus for detecting an object according to the present application;
FIG. 7 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which the methods of generating an object detection system, the apparatuses of generating an object detection system, the methods of detecting an object, or the apparatuses of detecting an object of embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include unmanned vehicles (also known as autonomous vehicles) 101, 102, a network 103, a database server 104, and a server 105. The network 103 provides a medium for communication links between the unmanned vehicles 101, 102, the database server 104, and the server 105. The network 103 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The unmanned vehicles 101 and 102 are equipped with driving control devices and with devices for acquiring point cloud data, such as lidar and millimeter-wave radar. The driving control device (also called the vehicle-mounted "brain") is responsible for intelligent control of the unmanned vehicle. It may be a separately provided controller, such as a programmable logic controller (PLC), a single-chip microcomputer, or an industrial controller; it may also be a device composed of other electronic components having input/output ports and arithmetic control functions; or it may be a computer device running a vehicle driving control application.
In practice, at least one sensor such as a camera, a gravity sensor, or a wheel speed sensor may be mounted in the unmanned vehicle. In some cases, GNSS (Global Navigation Satellite System) equipment and SINS (Strap-down Inertial Navigation System) equipment may also be installed in the unmanned vehicle.
Database server 104 may be a database server that provides various services. For example, a sample set may be stored in the database server. The sample set contains a large number of samples, where a sample may include point cloud data and sample tags corresponding to the point cloud data. Thus, a user may also select samples from the sample set stored by the database server 104 via the unmanned vehicles 101, 102.
The server 105 may also be a server that provides various services, such as a background server supporting applications running on the unmanned vehicles 101, 102. The background server may train the initial model using samples in the sample set collected by the unmanned vehicles 101, 102, and may send the training result (e.g., the generated target detection system) back to them. A user can then apply the generated target detection system to detect obstacles such as pedestrians and vehicles, so that the unmanned vehicle controls its driving state accordingly and driving safety is ensured.
The database server 104 and the server 105 may be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When they are software, they may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present invention is not particularly limited herein. Database server 104 and server 105 may also be servers of a distributed system or servers that incorporate blockchains. Database server 104 and server 105 may also be cloud servers, or intelligent cloud computing servers or intelligent cloud hosts with artificial intelligence technology.
It should be noted that, the method for generating the target detection system or the method for detecting the target provided in the embodiments of the present application is generally performed by the server 105. Accordingly, a device that generates an object detection system or a device that detects an object is also generally provided in the server 105. The method of detecting a target may also be performed by an unmanned vehicle.
It should be noted that the database server 104 may not be provided in the system architecture 100 in cases where the server 105 may implement the relevant functions of the database server 104.
It should be understood that the numbers of unmanned vehicles, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of unmanned vehicles, networks, database servers, and servers, as required by the implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of generating an object detection system according to the present application is shown. The method of generating a target detection system may comprise the steps of:
Step 201, a sample set and a high-precision map are acquired.
In the present embodiment, the execution subject of the method of generating the target detection system (e.g., the server 105 shown in fig. 1) may acquire a sample set in various ways. For example, it may obtain an existing sample set from a database server (e.g., database server 104 shown in fig. 1) through a wired or wireless connection. As another example, a user may collect samples via an unmanned vehicle (e.g., unmanned vehicles 101, 102 shown in fig. 1); the execution subject then receives the samples collected by the unmanned vehicle and stores them locally, thereby generating the sample set.
Each sample in the sample set includes a frame of point cloud data and a sample tag set corresponding to that point cloud data. Each frame of point cloud data is collected by a lidar or millimeter-wave radar in one scene, and the same type of point cloud data should be used throughout. The category and position of each point are labeled in advance, manually or automatically, as sample tags; for example, the points belonging to objects such as vehicles, pedestrians, and green belts in a frame of point cloud data can be labeled with cuboid bounding boxes.
When samples are collected in a given area, a high-precision map of that area is also acquired. The high-precision map contains vectorized road elements, which may include lane lines, drivable areas, intersection information, sidewalks, and so on. Vectorization encodes direction: for example, a lane can be vectorized together with its preceding and following lanes and its adjacency information. After vectorization, the road is represented by a graph structure in which each edge represents the relationship between two road elements and each vertex is a road element.
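As an illustration, such a road graph might be represented as in the following Python sketch. The RoadElement and RoadGraph names, their fields, and the relation labels are assumptions made for illustration, not a data model taken from the patent:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RoadElement:
    """A vertex of the road graph: one vectorized road element."""
    element_id: int
    kind: str                              # e.g. "lane", "crosswalk", "junction"
    polyline: List[Tuple[float, float]]    # ordered points giving the direction

@dataclass
class RoadGraph:
    """Vertices are road elements; edges are relationships between them."""
    vertices: Dict[int, RoadElement] = field(default_factory=dict)
    edges: List[Tuple[int, int, str]] = field(default_factory=list)

    def add_relation(self, src: int, dst: int, relation: str) -> None:
        # relation could be "successor", "predecessor", "left_neighbor", ...
        self.edges.append((src, dst, relation))

# A lane, its following lane, and its left neighbor.
g = RoadGraph()
g.vertices[0] = RoadElement(0, "lane", [(0.0, 0.0), (10.0, 0.0)])
g.vertices[1] = RoadElement(1, "lane", [(10.0, 0.0), (20.0, 0.0)])
g.vertices[2] = RoadElement(2, "lane", [(0.0, 3.5), (10.0, 3.5)])
g.add_relation(0, 1, "successor")
g.add_relation(0, 2, "left_neighbor")
```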
Step 202, selecting a sample from a sample set.
In this embodiment, the execution subject may select a sample from the sample set acquired in step 201 and execute the training steps of steps 203 to 207. The manner of selection and the number of samples selected are not limited in this application. For example, samples may be selected randomly, or the samples with the largest amount of labeled point cloud data may be preferred.
Step 203, extracting fusion features from the point cloud data and the high-precision map in the selected sample.
In this embodiment, fusion features may be extracted from the point cloud data in the selected sample and the high-precision map, either by hand-crafted methods or by a neural network. The input of the neural network is the point cloud data and the high-precision map, and its output is the fusion feature. The neural network may be trained with supervision, using pre-labeled point cloud data and pre-labeled high-precision maps as training samples. The training process itself is prior art and is not described in detail here.
Optionally, the fusion features can be obtained by separately extracting the features of the point cloud data and the features of the high-precision map and then fusing them. The specific steps are as follows:
step 2031, inputting the point cloud data in the selected sample to a point cloud feature extraction model to obtain a point cloud feature.
In this embodiment, the point cloud feature extraction model may be a neural network, such as a 3D version of ResNet-50. After the selected sample passes through the point cloud feature extraction model, it outputs point cloud features, which may be feature maps or feature vectors.
Step 2032, inputting the high-precision map into a map feature extraction model to obtain map features.
In this embodiment, the map feature extraction model may be a neural network, such as ResNet-101. The high-precision map may also be converted into a graph structure, in which case the map feature extraction model may be a GNN (Graph Neural Network).
Step 2033, fusing the point cloud features and the map features to obtain fusion features.
In this embodiment, the 3-dimensional point cloud features may be projected along the ground direction and thereby converted into 2-dimensional features, which are then fused with the map features. The specific fusion scheme may be any one of the following: weighted addition, 1x1 convolution, or superposition (concatenation) of the information. Feature fusion combines the point cloud features and the map features so that the two can exchange information, which ultimately improves the performance of the two downstream task ends.
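A minimal sketch of this fusion step, assuming PyTorch, mean pooling over the height axis as the ground-direction projection, and concatenation followed by a 1x1 convolution as the fusion scheme (the patent leaves all of these choices open); the channel sizes are illustrative:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse BEV point cloud features with map features: channel concatenation
    followed by a 1x1 convolution, one of the fusion options named above."""
    def __init__(self, pc_channels, map_channels, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(pc_channels + map_channels, out_channels, kernel_size=1)

    def forward(self, pc_feat_3d, map_feat):
        # pc_feat_3d: (B, C, D, H, W) voxel features; average over the height
        # axis D to project onto the ground plane, giving a 2D BEV feature map.
        bev = pc_feat_3d.mean(dim=2)               # (B, C, H, W)
        x = torch.cat([bev, map_feat], dim=1)      # information superposition
        return self.fuse(x)                        # (B, out_channels, H, W)

# Shape check with dummy tensors.
fusion = FeatureFusion(pc_channels=64, map_channels=32, out_channels=128)
fused = fusion(torch.randn(2, 64, 8, 100, 100), torch.randn(2, 32, 100, 100))
print(fused.shape)   # torch.Size([2, 128, 100, 100])
```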
Step 204, inputting the fusion features into the target detection model to obtain a prediction tag set.
In this embodiment, the object detection model is a neural network, e.g., an RPN (Region Proposal Network). The output of the target detection model is the detection result: certain point cloud data are enclosed by detection boxes, and a prediction tag set for those points is obtained, that is, the predicted obstacle categories.
Step 205, calculating a total loss value based on the predictive tag set and the sample tag set.
In this embodiment, the prediction tag set and the sample tag set may be passed as arguments to a specified loss function, and the total loss value between the two can then be calculated.
In this embodiment, the loss function is typically used to measure the degree of inconsistency between the predicted value (e.g., the predicted tag set) and the actual value (e.g., the sample tag set) of the model. It is a non-negative real-valued function. In general, the smaller the loss function, the better the robustness of the model. The loss function can be set according to actual requirements.
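For instance, when the tags are obstacle categories, cross-entropy is one common choice (the patent leaves the loss function open); a minimal sketch in PyTorch:

```python
import torch
import torch.nn.functional as F

# 16 predicted boxes, 4 hypothetical obstacle classes; the targets play the
# role of the sample tag set.
pred_logits = torch.randn(16, 4)
sample_tags = torch.randint(0, 4, (16,))

# A non-negative value; the smaller it is, the closer the prediction tag set
# is to the sample tag set.
total_loss = F.cross_entropy(pred_logits, sample_tags)
print(total_loss.item())
```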
Step 206, if the total loss value is less than the predetermined threshold, the target detection system is constructed according to the target detection model.
In this embodiment, the predetermined threshold generally represents an ideal degree of consistency between the predicted value (e.g., the prediction tag set) and the true value (e.g., the sample tag set). That is, when the total loss value falls below the predetermined threshold, the prediction may be considered close to, or an approximation of, the true value. The predetermined threshold can be set according to actual requirements. If the total loss value is smaller than the predetermined threshold, training of the target detection model is complete, and the trained model, together with the neural network used for extracting fusion features, can form the target detection system used for target detection.
Optionally, the target detection system may include a point cloud feature extraction model, a map feature extraction model, and a target detection model. The point cloud feature extraction model and the map feature extraction model may already be trained and usable directly; steps 202-207 then only need to train the target detection model. The point cloud feature extraction model and the map feature extraction model are still required when the target detection system is applied.
Step 207, if the total loss value is not less than the predetermined threshold, the relevant parameters of the target detection model are adjusted, and steps 202-207 are continued.
In this embodiment, if the total loss value is not smaller than the predetermined threshold, training of the target detection model is not complete. The relevant parameters of the target detection model are adjusted, for example by using back propagation to modify the weights of each convolution layer, and processing may return to step 202 to re-select samples from the sample set so that the training steps can continue.
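Steps 202 to 207 can be sketched as the following loop, assuming PyTorch; `sample_set.sample()`, `extract_fused_features`, and `loss_fn` are hypothetical helpers standing in for the steps above, not names from the patent:

```python
import torch

def train_target_detection_model(model, sample_set, extract_fused_features,
                                 loss_fn, threshold=0.1, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    while True:
        point_cloud, hd_map, sample_tags = sample_set.sample()   # step 202
        fused = extract_fused_features(point_cloud, hd_map)      # step 203
        pred_tags = model(fused)                                 # step 204
        loss = loss_fn(pred_tags, sample_tags)                   # step 205
        if loss.item() < threshold:                              # step 206
            return model    # training complete; build the system from the model
        optimizer.zero_grad()                                    # step 207
        loss.backward()     # back propagation adjusts the convolution weights
        optimizer.step()
```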
Optionally, the target detection system may include a point cloud feature extraction model, a map feature extraction model, and a target detection model, where the point cloud feature extraction model and the map feature extraction model are untrained models that must be jointly trained together with the target detection model. In that case, if the total loss value is not smaller than the predetermined threshold, none of the three models has finished training, and the relevant parameters of the point cloud feature extraction model, of the map feature extraction model, and of the target detection model are all adjusted.
According to the method and device for generating a target detection system provided by this embodiment, fusing the high-precision map into the target detection system directly improves target detection performance, which helps establish an advantage and a technical barrier in autonomous driving technology. At the same time, a more reliable target detection system improves the safety of the overall system.
In some optional implementations of this embodiment, inputting the point cloud data in the selected sample into the point cloud feature extraction model to obtain point cloud features includes: dividing the point cloud data in the selected sample into a three-dimensional grid set with fixed resolution; and inputting the three-dimensional grid set into the point cloud feature extraction model to obtain the point cloud features. The sparse three-dimensional radar point cloud is gridded, i.e., divided into three-dimensional cells of fixed resolution, finally yielding a tensor of size (H, W, C). To speed this up, GPU devices can process the points in parallel: each point is quantized and placed into its corresponding grid cell. Converting the point cloud data into a three-dimensional grid improves data processing speed.
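A CPU sketch of the gridding step using NumPy (the patent parallelizes the quantization on GPU devices); the detection range and voxel sizes are illustrative assumptions:

```python
import numpy as np

def voxelize(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
             z_range=(-3.0, 1.0), voxel=(0.2, 0.2, 0.5)):
    """Quantize an (N, 3) point cloud into a fixed-resolution binary
    occupancy grid of shape (H, W, C), with height as the channel axis."""
    mins = np.array([x_range[0], y_range[0], z_range[0]], dtype=np.float32)
    vox = np.asarray(voxel, dtype=np.float32)
    dims = np.round([(x_range[1] - x_range[0]) / voxel[0],
                     (y_range[1] - y_range[0]) / voxel[1],
                     (z_range[1] - z_range[0]) / voxel[2]]).astype(int)
    idx = np.floor((points[:, :3] - mins) / vox).astype(int)
    keep = np.all((idx >= 0) & (idx < dims), axis=1)   # drop out-of-range points
    grid = np.zeros(dims, dtype=np.float32)            # (H, W, C)
    grid[idx[keep, 0], idx[keep, 1], idx[keep, 2]] = 1.0
    return grid

pts = np.random.rand(1000, 3).astype(np.float32) * [70, 80, 4] + [0, -40, -3]
print(voxelize(pts).shape)   # (352, 400, 8)
```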
In some optional implementations of this embodiment, inputting the high-precision map into the map feature extraction model to obtain map features includes: establishing a graph structure from the road elements in the high-precision map, where each edge of the graph represents the relationship between two road elements and each vertex represents a road element; and inputting the graph structure into a graph neural network to obtain map features. The high-precision map contains information such as lane lines, drivable areas, intersection information, and sidewalks, and it generally contains vectorized information: a lane can be vectorized together with its preceding and following lanes and its adjacency information. After vectorization, the road is represented by a graph structure whose edges represent relationships between road elements and whose vertices are the road elements. The graph structure modeling the road elements of the high-precision map is fed into a graph neural network, which extracts the map features. Specifically, the graph neural network consists of multiple graph convolution layers; each layer extracts more abstract features, and after several layers the final features of the graph structure are obtained.
After the high-precision map is converted into the graph structure, the graph neural network can be used to extract features, which improves both the accuracy and the speed of feature extraction.
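A minimal sketch of such a multi-layer graph convolution in PyTorch, assuming a dense adjacency matrix with self-loops and mean aggregation; the patent does not specify the exact layer design:

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One graph-convolution layer: each vertex aggregates its neighbors'
    features, mirroring the multi-layer graph convolution described above."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (V, in_dim) road-element features; adj: (V, V) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear(adj @ x / deg))   # mean aggregation

# Three road elements, 16-d input features, two stacked layers.
x = torch.randn(3, 16)
adj = torch.eye(3) + torch.tensor([[0, 1, 1], [1, 0, 0], [1, 0, 0]],
                                  dtype=torch.float32)
h = GraphConvLayer(16, 32)(x, adj)
map_feature = GraphConvLayer(32, 32)(h, adj).mean(dim=0)  # pooled graph feature
print(map_feature.shape)   # torch.Size([32])
```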
In some optional implementations of this embodiment, the point cloud feature extraction model is a sparse convolution network or a three-dimensional convolution network. On the basis of the three-dimensional grid, deep learning is used for feature extraction; sparse convolution and dense three-dimensional convolution are the two common technical schemes. Through a multi-layer neural network, grid voxels can be converted into higher-dimensional feature data. Specifically, the three-dimensional tensor passes through sparse three-dimensional convolutions or a three-dimensional convolution network to extract more abstract features, and is finally converted into point cloud features by several layers of the neural network. This improves both the speed and the accuracy of feature extraction.
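For the dense three-dimensional convolution option, a minimal PyTorch sketch follows (the sparse convolution option would typically rely on a dedicated library such as spconv); the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Two strided 3D convolutions converting grid voxels into higher-dimensional,
# more abstract feature data.
pc_extractor = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

voxels = torch.randn(2, 1, 8, 352, 400)   # batch of (H, W, C) grids as (B, 1, C, H, W)
print(pc_extractor(voxels).shape)         # torch.Size([2, 64, 2, 88, 100])
```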
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method of generating a target detection system according to this embodiment. In the application scenario of fig. 3, a user randomly selects a sample from a sample set; the sample includes a frame of point cloud data whose sample tag is "vehicle". A high-precision map of the area where the sample was collected is also obtained, including road elements such as intersections and lanes. The point cloud data are input into the point cloud feature extraction model to obtain point cloud features, and the high-precision map is input into the map feature extraction model to obtain map features. The point cloud features and map features are then fused, and the fused features are input into the target detection model, which outputs a predicted label: the probability that the point cloud data represent a vehicle. If the loss value between this probability and the sample tag (1) is smaller than the predetermined threshold, training of the point cloud feature extraction model, the map feature extraction model, and the target detection model is complete, and the target detection system can be constructed. Otherwise, the relevant parameters of the three models are adjusted, samples are re-selected, and training continues, reducing the total loss value until it converges below the predetermined threshold.
Referring to fig. 4, a flow 400 of one embodiment of a method of detecting a target provided herein is shown. The method of detecting an object may include the steps of:
Step 401, acquiring point cloud data of a region to be detected and a high-precision map of the region to be detected.
In the present embodiment, the execution subject of the method of detecting a target (e.g., the server 105 shown in fig. 1) may acquire the point cloud data of the area to be detected in various ways. For example, it may obtain stored point cloud data from a database server (e.g., the database server 104 shown in fig. 1) through a wired or wireless connection. It may also receive point cloud data of the area to be detected acquired by an unmanned vehicle (e.g., the unmanned vehicles 101, 102 shown in fig. 1), whose lidar continuously scans and collects point cloud data while the vehicle is driving. Detecting the target means determining whether an obstacle exists in the area to be detected, and if so its position and category.
In addition, a high-precision map of the region to be detected needs to be acquired. It can be retrieved from a database after positioning via GPS, or obtained from a third-party map server.
Step 402, inputting the point cloud data and the high-precision map into the target detection system, and outputting a detection result.
In this embodiment, the execution subject may input the point cloud data acquired in step 401 into the point cloud feature extraction model of the target detection system and input the high-precision map into the map feature extraction model; after feature extraction and feature fusion, the fused features are input into the target detection model, which finally outputs the detection result for the region to be detected. The detection result may describe whether an obstacle exists in the region to be detected, as well as its position and category.
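Putting the pieces together, inference over an area to be detected might look like the following sketch, reusing the voxelize helper sketched earlier; all names here are assumptions tying together the components described above, not the patent's API:

```python
def detect(point_cloud, hd_map, pc_model, map_model, fuse, detector):
    """Run the generated target detection system on one area to be detected."""
    pc_feat = pc_model(voxelize(point_cloud))   # point cloud feature extraction
    map_feat = map_model(hd_map)                # map feature extraction (e.g. GNN)
    fused = fuse(pc_feat, map_feat)             # feature fusion
    return detector(fused)                      # obstacle boxes and categories
```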
In this embodiment, the target detection system may be generated using the method described above in connection with the embodiment of FIG. 2. The specific generation process may be referred to in the description of the embodiment of fig. 2, and will not be described herein.
It should be noted that the method for detecting a target according to this embodiment may be used to test the target detection systems generated by the above embodiments, and those systems can then be continuously optimized according to the test results. The method is also a practical application of the target detection systems generated by the above embodiments. Using them for target detection helps improve their performance, for example the types of obstacles found and the accuracy of obstacle positions.
With continued reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides one embodiment of an apparatus for generating an object detection system. This apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating an object detection system of this embodiment may include: an acquisition unit 501 and a training unit 502. The acquisition unit 501 is configured to acquire a sample set and a high-precision map, wherein each sample in the sample set comprises a frame of point cloud data and a sample tag set corresponding to the point cloud data, and the high-precision map contains vectorized road elements. The training unit 502 is configured to select samples from the sample set and to perform the following training steps: extracting fusion features from the point cloud data in the selected sample and the high-precision map; inputting the fusion features into a target detection model to obtain a prediction tag set; calculating a total loss value based on the prediction tag set and the sample tag set; and if the total loss value is smaller than a predetermined threshold, constructing the target detection system from the target detection model.
In some optional implementations of the present embodiment, training unit 502 is further configured to: if the total loss value is not smaller than the preset threshold value, the relevant parameters of the target detection model are adjusted, the sample is selected again from the sample set, and the training step is continuously executed.
In some optional implementations of the present embodiment, training unit 502 is further configured to: inputting the point cloud data in the selected sample into a point cloud feature extraction model to obtain point cloud features; inputting the high-precision map into a map feature extraction model to obtain map features; and fusing the point cloud features and the map features to obtain fused features.
In some optional implementations of the present embodiment, training unit 502 is further configured to: and dividing the point cloud data in the selected sample into a three-dimensional grid set with a fixed resolution. And inputting the three-dimensional grid set into a point cloud feature extraction model to obtain point cloud features.
In some optional implementations of the present embodiment, training unit 502 is further configured to: and establishing a graph structure according to the road elements in the high-precision map, wherein the edges of the graph structure represent the relationship between two road elements, and the vertexes of the graph structure represent the road elements. And inputting the graph structure into a graph neural network to obtain map features.
In some optional implementations of the present embodiment, the point cloud feature extraction model is a sparse convolution network or a three-dimensional convolution network.
In some optional implementations of this embodiment, the training unit is further configured to: if the total loss value is smaller than the preset threshold value, constructing a target detection system according to the point cloud feature extraction model, the map feature extraction model and the target detection model; and if the total loss value is not smaller than the preset threshold value, adjusting the relevant parameters of the point cloud feature extraction model, the relevant parameters of the map feature extraction model and the relevant parameters of the target detection model.
With continued reference to fig. 6, as an implementation of the method illustrated in the above figures, the present application provides one embodiment of an apparatus for detecting a target. The embodiment of the device corresponds to the embodiment of the method shown in fig. 4, and the device can be applied to various electronic devices.
As shown in fig. 6, the apparatus 600 for detecting a target of this embodiment may include: an acquisition unit 601 configured to acquire point cloud data of an area to be detected and a high-precision map of the area to be detected; and a detection unit 602 configured to input the point cloud data and the high-precision map into the target detection system generated by the apparatus 500 and output a detection result.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of flow 200 or 400.
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of flow 200 or 400.
A computer program product comprising a computer program that when executed by a processor implements the method of flow 200 or 400.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as a method of generating an object detection system. For example, in some embodiments, the method of generating an object detection system may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the method of generating an object detection system described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of generating the target detection system by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a server of a distributed system or a server that incorporates a blockchain; it may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (12)
1. A method of generating a target detection system, comprising:
obtaining a sample set and a high-precision map, wherein each sample in the sample set comprises a frame of point cloud data and a sample tag set corresponding to the point cloud data, and the high-precision map comprises vectorized road elements;
selecting a sample from the sample set and performing the training steps of: extracting fusion characteristics from the point cloud data in the selected sample and the high-precision map; inputting the fusion characteristics into a target detection model to obtain a prediction tag set; calculating a total loss value based on the prediction tag set and the sample tag set; if the total loss value is smaller than a preset threshold value, constructing a target detection system according to the target detection model;
the extracting the fusion feature from the point cloud data in the selected sample and the high-precision map includes:
inputting the point cloud data in the selected sample into a point cloud feature extraction model to obtain point cloud features;
inputting the high-precision map into a map feature extraction model to obtain map features;
fusing the point cloud features and the map features to obtain fused features;
the step of inputting the point cloud data in the selected sample into a point cloud feature extraction model to obtain the point cloud features comprises the following steps:
dividing the point cloud data in the selected sample into a three-dimensional grid set with fixed resolution;
inputting the three-dimensional grid set into a point cloud feature extraction model to obtain point cloud features;
the step of inputting the high-precision map into a map feature extraction model to obtain map features comprises the following steps:
establishing a graph structure according to road elements in the high-precision map, wherein the edges of the graph structure represent the relationship between two road elements, and the vertexes of the graph structure represent the road elements;
and inputting the graph structure into a graph neural network to obtain map features.
2. The method of claim 1, wherein the method further comprises:
and if the total loss value is not smaller than a preset threshold value, adjusting relevant parameters of the target detection model, and re-selecting samples from the sample set to continue to execute the training step.
3. The method of claim 1, wherein the point cloud feature extraction model is a sparse convolutional network or a three-dimensional convolutional network.
4. The method of claim 1, wherein the constructing an object detection system from the object detection model comprises:
constructing a target detection system according to the point cloud feature extraction model, the map feature extraction model and the target detection model; and
the method further comprises the steps of:
and if the total loss value is not smaller than a preset threshold value, adjusting the relevant parameters of the point cloud feature extraction model, the relevant parameters of the map feature extraction model and the relevant parameters of the target detection model.
5. A method of detecting a target, comprising:
acquiring point cloud data of an area to be detected and a high-precision map of the area to be detected;
inputting the point cloud data and the high-precision map into a target detection system generated by the method according to any one of claims 1-4, and outputting a detection result.
6. An apparatus for generating an object detection system, comprising:
an acquisition unit configured to acquire a sample set and a high-precision map, wherein each sample in the sample set includes one frame of point cloud data and a sample tag set corresponding to the point cloud data, the high-precision map containing vectorized road elements;
a training unit configured to select samples from the set of samples, and to perform the training steps of: extracting fusion characteristics from the point cloud data in the selected sample and the high-precision map; inputting the fusion characteristics into a target detection model to obtain a prediction tag set; calculating a total loss value based on the prediction tag set and the sample tag set; if the total loss value is smaller than a preset threshold value, constructing a target detection system according to the target detection model;
wherein the training unit is further configured to:
inputting the point cloud data in the selected sample into a point cloud feature extraction model to obtain point cloud features;
inputting the high-precision map into a map feature extraction model to obtain map features;
fusing the point cloud features and the map features to obtain fused features;
wherein the training unit is further configured to:
dividing the point cloud data in the selected sample into a three-dimensional grid set with fixed resolution;
inputting the three-dimensional grid set into a point cloud feature extraction model to obtain point cloud features;
wherein the training unit is further configured to:
establishing a graph structure according to road elements in the high-precision map, wherein the edges of the graph structure represent the relationship between two road elements, and the vertexes of the graph structure represent the road elements;
and inputting the graph structure into a graph neural network to obtain map features.
7. The apparatus of claim 6, wherein the training unit is further configured to:
and if the total loss value is not smaller than a preset threshold value, adjusting relevant parameters of the target detection model, and re-selecting samples from the sample set to continue to execute the training step.
8. The apparatus of claim 6, wherein the point cloud feature extraction model is a sparse convolutional network or a three-dimensional convolutional network.
9. The apparatus of claim 6, wherein the training unit is further configured to:
if the total loss value is smaller than a preset threshold value, constructing a target detection system according to the point cloud feature extraction model, the map feature extraction model and the target detection model; and
and if the total loss value is not smaller than a preset threshold value, adjusting the relevant parameters of the point cloud feature extraction model, the relevant parameters of the map feature extraction model and the relevant parameters of the target detection model.
10. An apparatus for detecting an object, comprising:
an acquisition unit configured to acquire point cloud data of an area to be detected and a high-precision map of the area to be detected;
a detection unit configured to input the point cloud data and the high-precision map into an object detection system generated using the apparatus according to any one of claims 6 to 9, and output a detection result.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110635776.0A CN113378693B (en) | 2021-06-08 | 2021-06-08 | Method and device for generating target detection system and detecting target |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110635776.0A CN113378693B (en) | 2021-06-08 | 2021-06-08 | Method and device for generating target detection system and detecting target |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113378693A CN113378693A (en) | 2021-09-10 |
CN113378693B true CN113378693B (en) | 2023-07-18 |
Family
ID=77576340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110635776.0A Active CN113378693B (en) | 2021-06-08 | 2021-06-08 | Method and device for generating target detection system and detecting target |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378693B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114036255B (en) * | 2022-01-07 | 2022-03-25 | 北京四维图新科技股份有限公司 | High-precision map data inspection method, device and equipment |
CN114519209A (en) * | 2022-02-08 | 2022-05-20 | 脸萌有限公司 | Method, apparatus, device and medium for protecting data |
CN114596552B (en) * | 2022-03-09 | 2023-06-23 | 阿波罗智能技术(北京)有限公司 | Information processing method, training method, device, equipment, vehicle and medium |
CN114821233B (en) * | 2022-04-26 | 2023-05-30 | 北京百度网讯科技有限公司 | Training method, device, equipment and medium of target detection model |
CN118609078A (en) * | 2024-07-05 | 2024-09-06 | 武汉大学 | 3D target detection method and device based on fusion of vector high-precision map and point cloud |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866887A (en) * | 2019-11-04 | 2020-03-06 | 深圳市唯特视科技有限公司 | Target situation fusion sensing method and system based on multiple sensors |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11494937B2 (en) * | 2018-11-16 | 2022-11-08 | Uatc, Llc | Multi-task multi-sensor fusion for three-dimensional object detection |
CN109783588A (en) * | 2018-12-10 | 2019-05-21 | 北京百度网讯科技有限公司 | Error message detection method, device, equipment, vehicle and the storage medium of map |
- 2021-06-08: Application CN202110635776.0A filed in China; granted as patent CN113378693B (status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866887A (en) * | 2019-11-04 | 2020-03-06 | 深圳市唯特视科技有限公司 | Target situation fusion sensing method and system based on multiple sensors |
Non-Patent Citations (2)
Title |
---|
MapFusion: A General Framework for 3D Object Detection with HD Maps; Jin Fang; arXiv:2103.05929v1; full text *
Research on obstacle detection based on laser and vision fusion; Liu Jingcan; Shi Mingjie; Li Hui; Huang Jianchang; Computer Products and Distribution (06); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113378693A (en) | 2021-09-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |