CN115205848A - Target detection method, target detection device, vehicle, storage medium and chip - Google Patents
Target detection method, target detection device, vehicle, storage medium and chip
- Publication number
- CN115205848A (application CN202210822745.0A)
- Authority
- CN
- China
- Prior art keywords
- sample
- parameter information
- target
- dimensional
- detection model
- Prior art date
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The present disclosure relates to a target detection method, apparatus, vehicle, storage medium and chip. The method includes: acquiring an environment image of the surrounding environment while the vehicle is traveling; and inputting the environment image into a three-dimensional target detection model to obtain target parameter information output by the model. The three-dimensional target detection model is pre-trained with a plurality of first sample images and the target sample parameter information corresponding to each first sample image. The target sample parameter information is determined by a pre-trained three-dimensional detection model together with a two-dimensional detection model: the three-dimensional detection model is trained with second sample images, in which the distance between each second sample object and the vehicle is less than or equal to a preset distance threshold, while the two-dimensional detection model obtains two-dimensional parameter information of the first sample objects. In this way, the accuracy of the three-dimensional target detection model is improved, and the safety of automatic driving of the vehicle is improved.
Description
Technical Field
The present disclosure relates to the field of vehicle technologies, and in particular, to a target detection method and apparatus, a vehicle, a storage medium, and a chip.
Background
With the progress of artificial intelligence, autonomous driving technology has developed rapidly. In this field, accurate environment perception is crucial to the safety of an autonomous vehicle, and three-dimensional target detection based on monocular vision has emerged to address it. Monocular three-dimensional target detection is generally based on deep learning and therefore requires large quantities of accurately labeled sample data.
In the related art, image data and point cloud data are acquired synchronously by a lidar and a camera mounted on a vehicle, and sample data are obtained by labeling the image data according to the point cloud data. However, because of the limited range of the lidar, very few points are collected for small, distant targets, which therefore cannot be labeled. The resulting shortage of sample data lowers the accuracy of the trained three-dimensional target detection model and affects the safety of automatic driving of the vehicle.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a target detection method, apparatus, vehicle, storage medium, and chip.
According to a first aspect of the embodiments of the present disclosure, there is provided an object detection method applied to a vehicle, including:
acquiring an environment image of the surrounding environment in the running process of the vehicle;
inputting the environment image into a three-dimensional target detection model to acquire target parameter information output by the three-dimensional target detection model, wherein the target parameter information comprises target position information, target size information and target angle information of a target object in the environment image;
the three-dimensional target detection model is obtained by pre-training with a plurality of first sample images and target sample parameter information corresponding to each first sample image, the target sample parameter information includes target sample position information, target sample size information and target sample angle information of a first sample object in the first sample image, the target sample parameter information is determined by a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model, the three-dimensional detection model is obtained by training with second sample images, the distance between the second sample object in each second sample image and the vehicle is smaller than or equal to a preset distance threshold, and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
Optionally, the three-dimensional target detection model is obtained by training in the following manner:
acquiring a plurality of first sample images;
inputting the first sample image into the three-dimensional detection model to obtain first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain two-dimensional parameter information output by the two-dimensional detection model, wherein the first sample parameter information comprises first sample position information, first sample size information and first sample angle information of the first sample object;
determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information;
and training a first target neural network model through a plurality of first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
Optionally, the determining, according to the plurality of first sample parameter information and the plurality of two-dimensional parameter information, the target sample parameter information corresponding to each first sample image includes:
for each piece of the two-dimensional parameter information: when it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists in a plurality of pieces of the first sample parameter information, the first target sample parameter information is used as the target sample parameter information; and when it is determined that the first target sample parameter information does not exist in the plurality of pieces of the first sample parameter information, a plurality of feature maps corresponding to a first target sample image are determined through the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image is determined according to the two-dimensional parameter information and the plurality of feature maps, wherein the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
Optionally, the determining that first target sample parameter information corresponding to the two-dimensional parameter information exists in the plurality of first sample parameter information includes:
determining a first detection frame corresponding to the first sample parameter information, determining an intersection ratio of a second detection frame corresponding to the two-dimensional parameter information and the first detection frame, determining that first target sample parameter information corresponding to the two-dimensional parameter information exists in the plurality of first sample parameter information under the condition that the intersection ratio is greater than or equal to a preset intersection ratio threshold, and taking the first sample parameter information as the first target sample parameter information.
Optionally, the determining the first detection frame corresponding to the first sample parameter information includes:
determining position information of a central point of a first detection frame corresponding to the first sample parameter information according to intrinsic parameter information of a camera that captured the first sample image and the first sample position information;
and determining the first detection frame according to the position information of the central point of the first detection frame and the first sample size information.
Optionally, the three-dimensional detection model comprises a plurality of detection modules, each detection module comprising a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, an output of the first convolutional layer being coupled to an input of the second convolutional layer, and an output of the second convolutional layer being coupled to an input of the prediction convolutional layer; the determining, by the three-dimensional detection model, a plurality of feature maps corresponding to the first target sample image includes:
inputting the first target sample image into the three-dimensional detection model to obtain a plurality of feature maps output by the prediction convolutional layers of the three-dimensional detection model;
determining, according to the two-dimensional parameter information and the plurality of feature maps, target sample parameter information corresponding to the first target sample image includes:
determining feature position information of a central point of a second detection frame corresponding to the two-dimensional parameter information on each feature map;
and determining target sample parameter information corresponding to the first target sample image according to the plurality of characteristic position information and the two-dimensional parameter information.
Optionally, the three-dimensional detection model is obtained by pre-training in the following manner:
acquiring a plurality of second sample images and second sample parameter information corresponding to each second sample image, wherein the second sample parameter information comprises second sample position information, second sample size information and second sample angle information of the second sample object, and the second sample parameter information is determined according to point cloud data corresponding to the second sample images;
and training a second target neural network model through a plurality of second sample images and second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
Optionally, the method further comprises:
determining a driving route of the vehicle according to the target parameter information;
and controlling the vehicle to automatically run according to the running route.
According to a second aspect of the embodiments of the present disclosure, there is provided an object detection apparatus applied to a vehicle, including:
the acquisition module is configured to acquire an environment image of the surrounding environment during the running process of the vehicle;
an information acquisition module configured to input the environment image into a three-dimensional target detection model to acquire target parameter information output by the three-dimensional target detection model, wherein the target parameter information includes target position information, target size information and target angle information of a target object in the environment image;
the three-dimensional target detection model is obtained by pre-training with a plurality of first sample images and target sample parameter information corresponding to each first sample image, the target sample parameter information comprises target sample position information, target sample size information and target sample angle information of a first sample object in the first sample images, the target sample parameter information is determined by a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model, the three-dimensional detection model is obtained by training with second sample images, the distance between the second sample object in each second sample image and the vehicle is smaller than or equal to a preset distance threshold, and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
Optionally, the three-dimensional target detection model is obtained by training in the following manner:
acquiring a plurality of first sample images;
inputting the first sample image into the three-dimensional detection model to obtain first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain two-dimensional parameter information output by the two-dimensional detection model, wherein the first sample parameter information comprises first sample position information, first sample size information and first sample angle information of the first sample object;
determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information;
and training a first target neural network model through a plurality of first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
Optionally, the determining, according to the plurality of first sample parameter information and the plurality of two-dimensional parameter information, the target sample parameter information corresponding to each first sample image includes:
for each piece of the two-dimensional parameter information: when it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists in a plurality of pieces of the first sample parameter information, the first target sample parameter information is used as the target sample parameter information; and when it is determined that the first target sample parameter information does not exist in the plurality of pieces of the first sample parameter information, a plurality of feature maps corresponding to a first target sample image are determined through the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image is determined according to the two-dimensional parameter information and the plurality of feature maps, wherein the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
Optionally, the determining that first target sample parameter information corresponding to the two-dimensional parameter information exists in the plurality of first sample parameter information includes:
determining a first detection frame corresponding to the first sample parameter information, determining an intersection ratio of a second detection frame corresponding to the two-dimensional parameter information and the first detection frame, determining that first target sample parameter information corresponding to the two-dimensional parameter information exists in the plurality of first sample parameter information under the condition that the intersection ratio is greater than or equal to a preset intersection ratio threshold, and taking the first sample parameter information as the first target sample parameter information.
Optionally, the determining the first detection frame corresponding to the first sample parameter information includes:
determining position information of a central point of a first detection frame corresponding to the first sample parameter information according to intrinsic parameter information of a camera that captured the first sample image and the first sample position information;
and determining the first detection frame according to the position information of the central point of the first detection frame and the first sample size information.
Optionally, the three-dimensional detection model comprises a plurality of detection modules, each detection module comprising a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, an output of the first convolutional layer being coupled to an input of the second convolutional layer, and an output of the second convolutional layer being coupled to an input of the prediction convolutional layer; the determining, by the three-dimensional detection model, a plurality of feature maps corresponding to the first target sample image includes:
inputting the first target sample image into the three-dimensional detection model to obtain a plurality of feature maps output by the prediction convolutional layers of the three-dimensional detection model;
determining, according to the two-dimensional parameter information and the plurality of feature maps, target sample parameter information corresponding to the first target sample image includes:
determining feature position information of a central point of a second detection frame corresponding to the two-dimensional parameter information on each feature map;
and determining target sample parameter information corresponding to the first target sample image according to the plurality of characteristic position information and the two-dimensional parameter information.
Optionally, the three-dimensional detection model is obtained by pre-training in the following manner:
acquiring a plurality of second sample images and second sample parameter information corresponding to each second sample image, wherein the second sample parameter information comprises second sample position information, second sample size information and second sample angle information of the second sample object, and the second sample parameter information is determined according to point cloud data corresponding to the second sample images;
and training a second target neural network model through a plurality of second sample images and second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
Optionally, the apparatus further comprises:
a route determination module configured to determine a driving route of the vehicle according to the target parameter information;
a control module configured to control the vehicle to automatically travel according to the travel route.
According to a third aspect of the embodiments of the present disclosure, there is provided a vehicle including:
a first processor;
a memory for storing processor-executable instructions;
wherein the first processor is configured to:
the steps of the method of the first aspect of the present disclosure are implemented.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of the first aspect of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a chip comprising a second processor and an interface; the second processor is configured to read instructions to perform the method of the first aspect of the disclosure.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects. An environment image of the surrounding environment is acquired while the vehicle is traveling, and the environment image is input into a three-dimensional target detection model to obtain target parameter information output by the model, the target parameter information including target position information, target size information and target angle information of a target object in the environment image. The three-dimensional target detection model is pre-trained with a plurality of first sample images and the target sample parameter information corresponding to each first sample image, where the target sample parameter information includes target sample position information, target sample size information and target sample angle information of a first sample object in the first sample image. The target sample parameter information is determined by a pre-trained three-dimensional detection model together with a pre-trained two-dimensional detection model: the three-dimensional detection model is trained with second sample images, in which the distance between each second sample object and the vehicle is less than or equal to a preset distance threshold, and the two-dimensional detection model obtains two-dimensional parameter information of the first sample objects. That is, the present disclosure determines the target sample parameter information corresponding to a sample image through the pre-trained three-dimensional detection model and the pre-trained two-dimensional detection model and can thereby label small, distant targets, so that the labeled sample data are richer, the accuracy of the three-dimensional target detection model is improved, and the safety of automatic driving of the vehicle is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method of object detection according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of training a three-dimensional object detection model in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of training a three-dimensional inspection model in accordance with an exemplary embodiment;
FIG. 4 is a flow chart illustrating another method of object detection according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating an object detection device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating another object detection device in accordance with an exemplary embodiment;
FIG. 7 is a functional block diagram schematic of a vehicle shown in an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that all acquisition of signals, information or data in the present application is performed in compliance with the applicable data protection laws and policies of the relevant jurisdiction and with the authorization of the owner of the corresponding device.
First, an application scenario of the present disclosure will be explained. Acquiring sample data for a three-dimensional target detection model places very high requirements on the intrinsic and extrinsic calibration and the time synchronization of the lidar and camera mounted on a vehicle, so such data are difficult to collect. Moreover, because labeling sample data currently requires manual work, labeling is inefficient; with little sample data, the accuracy of the trained three-dimensional target detection model is low. In addition, lidar performs poorly at long range: for small, distant targets it collects too few points to permit labeling, so no sample data can be obtained for such targets. This further lowers the accuracy of the three-dimensional target detection model and affects the safety of automatic driving of the vehicle.
To solve these technical problems, the present disclosure provides a target detection method, apparatus, vehicle, storage medium, and chip that determine the target sample parameter information corresponding to a sample image through a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model, so that small, distant targets can also be labeled. The labeled sample data are therefore richer, the accuracy of the three-dimensional target detection model is improved, and the safety of automatic driving of the vehicle is improved.
FIG. 1 is a flow chart illustrating a method of object detection according to an exemplary embodiment, as applied to a vehicle, as shown in FIG. 1, which may include:
S101, collecting an environment image of the surrounding environment during the running of the vehicle.
In this step, during the running of the vehicle, an environmental image of the environment around the vehicle may be acquired by a camera mounted on the vehicle.
S102, inputting the environment image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model.
The target parameter information may include target position information, target size information, and target angle information of a target object in the environment image. The target position information may be three-dimensional coordinate information of the target object in a body coordinate system of the vehicle, for example, the target position information may be (x, y, z), the target size information may be a size of a three-dimensional detection frame corresponding to the target object, for example, the target size information may be (w, h, l), and the target angle information may be an angle of the target object with respect to a camera that captures the environment image.
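For concreteness, the target parameter information described above can be pictured as the following Python structure; this is an illustrative sketch, and the field names are ours rather than the patent's:

```python
from dataclasses import dataclass

@dataclass
class TargetParams:
    """Illustrative container for target parameter information."""
    x: float      # target position in the vehicle body coordinate system
    y: float
    z: float
    w: float      # target size: width of the 3D detection frame
    h: float      # height
    l: float      # length
    angle: float  # target angle relative to the camera, e.g. in radians
```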
The three-dimensional target detection model can be obtained by pre-training with a plurality of first sample images and target sample parameter information corresponding to each first sample image, where the target sample parameter information includes target sample position information, target sample size information and target sample angle information of a first sample object in the first sample image; the target sample parameter information is determined by a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model; the three-dimensional detection model is obtained by training with second sample images, in which the distance between the second sample object and the vehicle is smaller than or equal to a preset distance threshold; and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
In this step, after the environmental image is collected, the environmental image may be input into the three-dimensional target detection model, and the environmental image is detected by the three-dimensional target detection model to determine target parameter information corresponding to the environmental image.
By adopting the above method, the target sample parameter information corresponding to a sample image is determined through the pre-trained three-dimensional detection model and the pre-trained two-dimensional detection model, so that small, distant targets can also be labeled. The labeled sample data are therefore richer, the accuracy of the three-dimensional target detection model is improved, and the safety of automatic driving of the vehicle is improved.
FIG. 2 is a flow diagram illustrating a method of training a three-dimensional object detection model, according to an example embodiment, which may include, as shown in FIG. 2:
and S21, acquiring a plurality of first sample images.
In this step, historical environment images acquired while the vehicle traveled during a historical time period may be used as the first sample images. The historical environment images may be captured under different road conditions and in different time periods, and the distances between the vehicle and the objects captured in different historical environment images may also differ; for example, a historical environment image may capture an object 10 meters from the vehicle, or an object 30 meters from the vehicle. The present disclosure does not limit the manner of acquiring the first sample images.
S22, for each first sample image, inputting the first sample image into the three-dimensional detection model to obtain first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain two-dimensional parameter information output by the two-dimensional detection model.
The first sample parameter information may include first sample position information, first sample size information, and first sample angle information of the first sample object, and it should be noted that the first sample position information, the first sample size information, and the first sample angle information are the same as the definitions of the target position information, the target size information, and the target angle information in step S102, and are not described herein again.
In this step, after a plurality of first sample images are acquired, the first sample images may be respectively input into the three-dimensional detection model and the two-dimensional detection model, first sample parameter information corresponding to the first sample images is determined by the three-dimensional detection model, and two-dimensional parameter information corresponding to the first sample images is determined by the two-dimensional detection model. Then, the first sample parameter information and the two-dimensional parameter information can be combined to determine the target sample parameter information corresponding to the first sample image. It should be noted that the present disclosure does not limit the order of determining the first sample parameter information and the two-dimensional parameter information.
S23, determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information.
In a possible implementation, for each piece of two-dimensional parameter information: when it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information, the first target sample parameter information is used as the target sample parameter information; and when it is determined that no such first target sample parameter information exists, a plurality of feature maps corresponding to the first target sample image are determined through the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image is determined according to the two-dimensional parameter information and the plurality of feature maps. Here, the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information. This matching logic is sketched in the code below.
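As a rough, non-authoritative illustration of this matching logic, the following Python sketch shows one way the per-image pseudo-labeling could look. The helper functions `match_by_iou` and `recover_from_feature_maps` and the threshold value are our assumptions, sketched further below:

```python
def pseudo_label(sample_image, det3d_model, det2d_model, iou_threshold=0.5):
    """Per-image pseudo-labeling sketch; 0.5 is an assumed threshold."""
    boxes3d = det3d_model(sample_image)  # first sample parameter information
    boxes2d = det2d_model(sample_image)  # two-dimensional parameter information
    labels = []
    for box2d in boxes2d:
        match = match_by_iou(box2d, boxes3d, iou_threshold)  # sketched below
        if match is not None:
            # Near target: the 3D detection already covers this object.
            labels.append(match)
        else:
            # Small, distant target missed by the 3D branch: recover its
            # parameters from the 3D model's feature maps (sketched below).
            labels.append(recover_from_feature_maps(sample_image, box2d, det3d_model))
    return labels
```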
Specifically, a first detection frame corresponding to the first sample parameter information is determined, and the intersection ratio of the second detection frame corresponding to the two-dimensional parameter information and the first detection frame is determined; when the intersection ratio is greater than or equal to a preset intersection ratio threshold, it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information, and the first sample parameter information is taken as the first target sample parameter information.
Taking any piece of first sample parameter information as an example, the position information of the center point of the first detection frame corresponding to the first sample parameter information may be determined according to the intrinsic parameter information of the camera that captured the first sample image and the first sample position information; the first detection frame is then determined according to the position information of its center point and the first sample size information. The intrinsic parameter information may include the focal lengths of the camera along the x-axis and the y-axis and the optical center position of the camera.
For example, the position information of the center point of the first detection frame can be calculated by formula (1):

$$x_{img} = \frac{f_x \cdot x_i}{z_i} + c_x, \qquad y_{img} = \frac{f_y \cdot y_i}{z_i} + c_y \tag{1}$$

where $(x_{img}, y_{img})$ is the position information of the center point of the first detection frame, $f_x$ is the focal length of the camera along the x-axis, $f_y$ is the focal length of the camera along the y-axis, $(c_x, c_y)$ is the optical center position of the camera, and $(x_i, y_i, z_i)$ is the first sample position information.
After the position information of the center point of the first detection frame is obtained through calculation, the boundary lines in the three directions of the x axis, the y axis and the z axis can be determined by combining the first sample size information, and the first detection frame is obtained.
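A minimal sketch of this computation, assuming the standard pinhole camera model that the listed intrinsics imply:

```python
def project_center(x_i, y_i, z_i, f_x, f_y, c_x, c_y):
    """Project the 3D center (x_i, y_i, z_i) onto the image plane per
    formula (1); f_x, f_y are the focal lengths and (c_x, c_y) the
    optical center of the camera."""
    x_img = f_x * x_i / z_i + c_x
    y_img = f_y * y_i / z_i + c_y
    return x_img, y_img
```

The first detection frame then follows by extending this center according to the first sample size information, as described above.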
After the first detection frame corresponding to the first sample parameter information is determined, the intersection ratio of the second detection frame corresponding to the two-dimensional parameter information and the first detection frame can be calculated by formula (2):

$$score_{iou} = \frac{|Bbox_a \cap Bbox_b|}{|Bbox_a \cup Bbox_b|} \tag{2}$$

where $score_{iou}$ is the intersection ratio, $Bbox_a$ is the first detection frame, and $Bbox_b$ is the second detection frame.
After the intersection ratio of the first detection frame and the second detection frame is calculated, it is compared with the preset intersection ratio threshold. When the intersection ratio is greater than or equal to the preset intersection ratio threshold, it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information; the first sample parameter information is taken as the first target sample parameter information, and the first target sample parameter information is used as the target sample parameter information.
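A sketch of this intersection-ratio test, assuming axis-aligned boxes in (x1, y1, x2, y2) form and an assumed `projected_box` attribute holding each 3D detection's first detection frame:

```python
def iou_2d(box_a, box_b):
    """Intersection ratio of two axis-aligned boxes per formula (2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_by_iou(box2d, boxes3d, iou_threshold):
    """Return the 3D detection whose first detection frame best overlaps
    box2d, if the overlap reaches the preset threshold; otherwise None."""
    best = max(boxes3d, key=lambda b: iou_2d(box2d, b.projected_box), default=None)
    if best is not None and iou_2d(box2d, best.projected_box) >= iou_threshold:
        return best
    return None
```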
When the intersection ratio is smaller than the preset intersection ratio threshold, it is determined that no first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information. In this case, a plurality of feature maps corresponding to the first target sample image are determined through the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image is determined according to the two-dimensional parameter information and the plurality of feature maps.
In a possible implementation, the three-dimensional detection model may include a plurality of detection modules, each including a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, where the output of the first convolutional layer is coupled to the input of the second convolutional layer and the output of the second convolutional layer is coupled to the input of the prediction convolutional layer. The first target sample image may be input into the three-dimensional detection model to obtain the plurality of feature maps output by the prediction convolutional layers; the feature position information of the center point of the second detection frame corresponding to the two-dimensional parameter information is determined on each feature map; and the target sample parameter information corresponding to the first target sample image is determined according to the plurality of pieces of feature position information and the two-dimensional parameter information.
For example, if the three-dimensional detection model includes a position estimation module, a depth estimation module, an angle estimation module, and a size estimation module, then after the first target sample image is input into the three-dimensional detection model, four feature maps output by the four prediction convolutional layers may be obtained. The feature position information of the center point of the second detection frame on each feature map may then be determined, and the target sample parameter information corresponding to the first target sample image may be determined by combining the plurality of pieces of feature position information with the two-dimensional parameter information. For example, the target sample size information in the target sample parameter information may be determined according to the feature map output by the depth estimation module and the two-dimensional parameter information.
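A sketch of this fallback follows; the `predict_feature_maps` API, the head names, the feature-map stride, and the `make_target_params` helper are all our assumptions, not the patent's:

```python
def recover_from_feature_maps(sample_image, box2d, det3d_model, stride=4):
    """Read each estimation head's prediction at the 2D box center."""
    feature_maps = det3d_model.predict_feature_maps(sample_image)  # assumed API
    # Center point of the second detection frame in image coordinates.
    u = (box2d[0] + box2d[2]) / 2.0
    v = (box2d[1] + box2d[3]) / 2.0
    # Feature position information of that center on each feature map,
    # assuming each map is downsampled by `stride`.
    fu, fv = int(u / stride), int(v / stride)
    depth = feature_maps["depth"][fv, fu]
    angle = feature_maps["angle"][fv, fu]
    size = feature_maps["size"][:, fv, fu]
    # Combine the readouts with the two-dimensional parameter information
    # to form the target sample parameter information.
    return make_target_params(box2d, depth, angle, size)  # hypothetical helper
```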
S24, training a first target neural network model through the plurality of first sample images and the target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
In this step, after the target sample parameter information corresponding to each first sample image is determined, the first target neural network model may be trained with the plurality of first sample images and the target sample parameter information corresponding to each first sample image, following conventional model training methods, to obtain the three-dimensional target detection model; the details are not repeated here.
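As a minimal sketch of such conventional training (PyTorch-style; the loss composition is not specified by the patent, so an assumed `compute_loss` method stands in for it):

```python
import torch

def train_detector(model, dataloader, epochs=10, lr=1e-4):
    """Supervised training over (first sample image, target sample
    parameter information) pairs; epochs and lr are assumed values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, target_params in dataloader:
            preds = model(images)
            loss = model.compute_loss(preds, target_params)  # assumed method
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```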
The first target neural network model may be the same as the second target neural network model, or may be the three-dimensional detection model obtained by training the second target neural network model; this is not limited in the present disclosure.
With this model training method, when first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information, that is, for a near target, the target sample parameter information corresponding to the sample image can be determined directly by the pre-trained three-dimensional detection model. When no such information exists, that is, for a small, distant target that the lidar cannot perceive, the two-dimensional detection model supplements the detection, and the target sample parameter information corresponding to the sample image is determined from it. The labeled sample data are therefore richer, a three-dimensional target detection model capable of perceiving small, distant targets can be trained, and the accuracy of the three-dimensional target detection model is improved.
Fig. 3 is a flowchart illustrating a method of training a three-dimensional inspection model according to an exemplary embodiment, which may include, as shown in fig. 3:
S31, obtaining a plurality of second sample images and second sample parameter information corresponding to each second sample image.
The second sample parameter information may include second sample position information, second sample size information, and second sample angle information of the second sample object, and the second sample parameter information is determined according to point cloud data corresponding to the second sample image.
In this step, with reference to the method for acquiring the first sample images in step S21, environment images within a preset distance around the vehicle are collected as the second sample images. Point cloud data may be collected by a lidar mounted on the vehicle while the second sample images are captured; each second sample image is then labeled according to the point cloud data to obtain the second sample parameter information corresponding to it.
S32, training a second target neural network model through the plurality of second sample images and the second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
In this step, after the second sample parameter information corresponding to each second sample image is determined, the second target neural network model may be trained with the plurality of second sample images and the second sample parameter information corresponding to each second sample image, following conventional model training methods, to obtain the three-dimensional detection model; the details are not repeated here.
With this model training method, the three-dimensional detection model is trained from the collected close-range second sample images, and automatic labeling of close-range targets is then achieved through the three-dimensional detection model, saving substantial labeling cost.
Fig. 4 is a flow chart illustrating another method of object detection according to an example embodiment, which may further include, as shown in fig. 4:
and S103, determining the driving route of the vehicle according to the target parameter information.
In this step, after the target parameter information is determined, obstacle avoidance processing may be performed for the target object with reference to methods in the prior art, and a driving route may be planned.
S104, controlling the vehicle to travel automatically according to the driving route.
In this step, after the travel route is determined, the travel route may be transmitted to an automatic driving system of the vehicle, by which the vehicle is controlled to automatically travel.
In summary, the three-dimensional detection model can be trained using only close-range environment images and lidar point cloud data; close-range first sample images are then labeled automatically by the three-dimensional detection model, and the sample data are supplemented with the small, distant targets detected by the two-dimensional detection model. Labeling of a large amount of sample data is thus achieved, and the labeled sample data are richer, so a three-dimensional target detection model capable of recognizing small, distant targets can be trained, improving the accuracy of the three-dimensional target detection model and, in turn, the safety of automatic driving of the vehicle.
FIG. 5 is a block diagram illustrating an object detection device according to an exemplary embodiment, the device being applied to a vehicle, as shown in FIG. 5, and may include:
an acquisition module 501 configured to acquire an environment image of a surrounding environment during the driving of the vehicle;
an information obtaining module 502 configured to input the environment image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model, where the target parameter information includes target position information, target size information, and target angle information of a target object in the environment image;
the three-dimensional target detection model is obtained by pre-training with a plurality of first sample images and target sample parameter information corresponding to each first sample image, the target sample parameter information comprises target sample position information, target sample size information and target sample angle information of a first sample object in the first sample image, the target sample parameter information is determined by the pre-trained three-dimensional detection model and the pre-trained two-dimensional detection model, the three-dimensional detection model is obtained by training with second sample images, the distance between the second sample object in each second sample image and the vehicle is smaller than or equal to a preset distance threshold, and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
Optionally, the three-dimensional target detection model is obtained by training in the following manner:
acquiring a plurality of the first sample images;
for each first sample image, inputting the first sample image into the three-dimensional detection model to obtain first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain two-dimensional parameter information output by the two-dimensional detection model, wherein the first sample parameter information comprises first sample position information, first sample size information and first sample angle information of the first sample object;
determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information;
and training the first target neural network model through a plurality of first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
Optionally, the determining, according to a plurality of the first sample parameter information and a plurality of the two-dimensional parameter information, the target sample parameter information corresponding to each of the first sample images includes:
for each piece of the two-dimensional parameter information: when it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists in a plurality of pieces of the first sample parameter information, the first target sample parameter information is used as the target sample parameter information; and when it is determined that the first target sample parameter information does not exist in the plurality of pieces of the first sample parameter information, a plurality of feature maps corresponding to a first target sample image are determined through the three-dimensional detection model, and the target sample parameter information corresponding to the first target sample image is determined according to the two-dimensional parameter information and the plurality of feature maps, wherein the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
Optionally, the determining that the first target sample parameter information corresponding to the two-dimensional parameter information exists in the plurality of first sample parameter information includes:
and determining a first detection frame corresponding to the first sample parameter information, determining a cross-over ratio of a second detection frame corresponding to the two-dimensional parameter information and the first detection frame, determining that first target sample parameter information corresponding to the two-dimensional parameter information exists in a plurality of pieces of first sample parameter information under the condition that the cross-over ratio is greater than or equal to a preset cross-over ratio threshold, and taking the first sample parameter information as the first target sample parameter information.
Optionally, the determining the first detection frame corresponding to the first sample parameter information includes:
determining the position information of the central point of the first detection frame corresponding to the first sample parameter information according to the intrinsic parameter information of the camera that captured the first sample image and the first sample position information;
and determining the first detection frame according to the position information of the central point of the first detection frame and the first sample size information.
Optionally, the three-dimensional detection model comprises a plurality of detection modules, each detection module comprising a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, an output of the first convolutional layer being coupled to an input of the second convolutional layer, and an output of the second convolutional layer being coupled to an input of the prediction convolutional layer; the determining, by the three-dimensional detection model, a plurality of feature maps corresponding to the first target sample image includes:
inputting the first target sample image into the three-dimensional detection model to obtain a plurality of feature maps output by the prediction convolutional layers of the three-dimensional detection model;
determining the target sample parameter information corresponding to the first target sample image according to the two-dimensional parameter information and the plurality of feature maps comprises:
determining feature position information of the central point of the second detection frame corresponding to the two-dimensional parameter information on each feature map;
and determining target sample parameter information corresponding to the first target sample image according to a plurality of pieces of feature position information and the two-dimensional parameter information.
Optionally, the three-dimensional detection model is obtained by pre-training in the following manner:
acquiring a plurality of second sample images and second sample parameter information corresponding to each second sample image, wherein the second sample parameter information comprises second sample position information, second sample size information and second sample angle information of the second sample object, and the second sample parameter information is determined according to point cloud data corresponding to the second sample images;
and training a second target neural network model through a plurality of second sample images and second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
Optionally, fig. 6 is a block diagram illustrating another object detection apparatus according to an exemplary embodiment, as shown in fig. 6, the apparatus further includes:
a route determination module 503 configured to determine a driving route of the vehicle according to the target parameter information;
and a control module 504 configured to control the vehicle to automatically travel according to the travel route.
With the above apparatus, the target sample parameter information corresponding to a sample image is determined through the pre-trained three-dimensional detection model and the pre-trained two-dimensional detection model, so that small, distant targets can also be labeled. The labeled sample data are therefore richer, the accuracy of the three-dimensional target detection model is improved, and the safety of automatic driving of the vehicle is improved.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the object detection method provided by the present disclosure.
The apparatus may be part of a stand-alone electronic device. For example, in an embodiment, the apparatus may be an Integrated Circuit (IC) or a chip, where the IC may be a single IC or a collection of multiple ICs, and the chip may include, but is not limited to: a GPU (Graphics Processing Unit), CPU (Central Processing Unit), FPGA (Field Programmable Gate Array), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), SOC (System on Chip), and the like. The integrated circuit or chip may execute executable instructions (or code) to implement the above target detection method. The executable instructions may be stored in the integrated circuit or chip, or may be obtained from another apparatus or device; for example, the integrated circuit or chip may include a second processor, a memory, and an interface for communicating with other apparatuses. The executable instructions may be stored in the memory and, when executed by the second processor, implement the above target detection method; alternatively, the integrated circuit or chip may receive the executable instructions through the interface and transmit them to the second processor for execution, so as to implement the target detection method.
Referring to fig. 7, fig. 7 is a functional block diagram of a vehicle 600 according to an exemplary embodiment. The vehicle 600 may be configured in a fully or partially autonomous driving mode. For example, the vehicle 600 may acquire environmental information of its surroundings through the sensing system 620 and derive an automatic driving strategy based on an analysis of the surrounding environmental information to implement full automatic driving, or present the analysis result to the user to implement partial automatic driving.
The vehicle 600 may include various subsystems such as an infotainment system 610, a perception system 620, a decision control system 630, a drive system 640, and a computing platform 650. Alternatively, vehicle 600 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, each of the sub-systems and components of the vehicle 600 may be interconnected by wire or wirelessly.
In some embodiments, the infotainment system 610 may include a communication system 611, an entertainment system 612, and a navigation system 613.
The communication system 611 may comprise a wireless communication system that may communicate wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system may use 3G cellular communication such as CDMA, EVDO, or GSM/GPRS; 4G cellular communication such as LTE; or 5G cellular communication. The wireless communication system may communicate with a Wireless Local Area Network (WLAN) using WiFi. In some embodiments, the wireless communication system may communicate directly with a device using an infrared link, Bluetooth, or ZigBee. The wireless communication system may also use other wireless protocols, such as various vehicular communication systems; for example, it may include one or more Dedicated Short Range Communications (DSRC) devices that support public and/or private data communication between vehicles and/or roadside stations.
The entertainment system 612 may include a display device, a microphone, and speakers. Based on the entertainment system, a user may listen to the radio or play music in the car; alternatively, a mobile phone may communicate with the vehicle to project the phone's screen onto the display device. The display device may be touch-controlled, and a user may operate it by touching the screen.
In some cases, the user's voice signal may be captured by the microphone, and certain control of the vehicle 600 by the user, such as adjusting the in-cabin temperature, may be carried out according to an analysis of the voice signal. In other cases, music may be played to the user through the speakers.
The navigation system 613 may include a map service provided by a map provider to provide route navigation for the vehicle 600, and the navigation system 613 may be used in conjunction with the vehicle's global positioning system 621 and inertial measurement unit 622. The map service provided by the map provider may be a two-dimensional map or a high-precision map.
The sensing system 620 may include several types of sensors that sense information about the environment surrounding the vehicle 600. For example, the sensing system 620 may include a global positioning system 621 (which may be a GPS system, a BeiDou system, or another positioning system), an Inertial Measurement Unit (IMU) 622, a lidar 623, a millimeter-wave radar 624, an ultrasonic radar 625, and a camera 626. The sensing system 620 may also include sensors that monitor internal systems of the vehicle 600 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors may be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function for the safe operation of the vehicle 600.
Global positioning system 621 is used to estimate the geographic location of vehicle 600.
The inertial measurement unit 622 is used to sense a pose change of the vehicle 600 based on the inertial acceleration. In some embodiments, the inertial measurement unit 622 may be a combination of an accelerometer and a gyroscope.
Lidar 623 utilizes laser light to sense objects in the environment in which vehicle 600 is located. In some embodiments, lidar 623 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
The millimeter-wave radar 624 utilizes radio signals to sense objects within the surrounding environment of the vehicle 600. In some embodiments, in addition to sensing objects, the millimeter-wave radar 624 may also be used to sense the speed and/or heading of objects.
The ultrasonic radar 625 may sense objects around the vehicle 600 using ultrasonic signals.
The camera 626 is used to capture image information of the surroundings of the vehicle 600. The camera 626 may include a monocular camera, a binocular camera, a structured-light camera, or a panoramic camera, and the image information it acquires may include still images or video streams.
The decision control system 630 includes a computing system 631 that makes analytical decisions based on information acquired by the sensing system 620. The decision control system 630 further includes a vehicle control unit 632 that controls the powertrain of the vehicle 600, as well as a steering system 633, a throttle 634, and a braking system 635 for controlling the vehicle 600.
The computing system 631 may operate to process and analyze the various information acquired by the perception system 620 in order to identify targets, objects, and/or features in the environment surrounding the vehicle 600. The targets may include pedestrians or animals, and the objects and/or features may include traffic signals, road boundaries, and obstacles. The computing system 631 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and the like. In some embodiments, the computing system 631 may be used to map an environment, track objects, estimate the speed of objects, and so forth. The computing system 631 may analyze the various information obtained and derive a control strategy for the vehicle.
The vehicle control unit 632 may be used to perform coordinated control of the vehicle's power battery and engine 641 to improve the power performance of the vehicle 600.
The steering system 633 is operable to adjust the heading of the vehicle 600. For example, in one embodiment, it may be a steering wheel system.
The throttle 634 is used to control the operating speed of the engine 641 and thus the speed of the vehicle 600.
The braking system 635 is used to control the deceleration of the vehicle 600. The braking system 635 may use friction to slow the wheels 644. In some embodiments, the braking system 635 may convert the kinetic energy of the wheels 644 into electric current. The braking system 635 may also take other forms to slow the rotational speed of the wheels 644 and thereby control the speed of the vehicle 600.
The drive system 640 may include components that provide powered motion to the vehicle 600. In one embodiment, the drive system 640 may include an engine 641, an energy source 642, a transmission 643, and wheels 644. The engine 641 may be an internal combustion engine, an electric motor, an air compression engine, or another type of engine or combination thereof, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 641 converts the energy source 642 into mechanical energy.
Examples of energy sources 642 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source 642 may also provide energy to other systems of the vehicle 600.
The transmission 643 may transmit mechanical power from the engine 641 to the wheels 644. The transmission 643 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 643 may also include other devices, such as clutches. Wherein the drive shaft may include one or more axles that may be coupled to one or more wheels 644.
Some or all of the functions of the vehicle 600 are controlled by the computing platform 650. The computing platform 650 may include at least one first processor 651, which may execute instructions 653 stored in a non-transitory computer-readable medium such as the memory 652. In some embodiments, the computing platform 650 may also be a plurality of computing devices that control individual components or subsystems of the vehicle 600 in a distributed manner.
The first processor 651 may be any conventional processor, such as a commercially available CPU. Alternatively, the first processor 651 may also include a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), a System on Chip (SoC), an Application Specific Integrated Circuit (ASIC), or a combination thereof. Although fig. 7 functionally illustrates a processor, memory, and other elements of a computer in the same block, those skilled in the art will appreciate that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical housing. For example, the memory may be a hard drive or other storage medium located in a housing different from that of the computer. Thus, a reference to a processor or computer will be understood to include a reference to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the deceleration component, may each have their own processor that performs only computations related to that component's function.
In the disclosed embodiment, the first processor 651 may perform the above-described target detection method.
In various aspects described herein, the first processor 651 may be located remotely from the vehicle and communicate with the vehicle wirelessly. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle, while others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, the memory 652 may contain instructions 653 (e.g., program logic) that can be executed by the first processor 651 to perform various functions of the vehicle 600. The memory 652 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the infotainment system 610, the perception system 620, the decision control system 630, and the drive system 640.
In addition to the instructions 653, the memory 652 may also store data such as road maps, route information, and the vehicle's location, direction, speed, and other vehicle data, as well as other information. Such information may be used by the vehicle 600 and the computing platform 650 during operation of the vehicle 600 in autonomous, semi-autonomous, and/or manual modes.
Computing platform 650 may control functions of vehicle 600 based on inputs received from various subsystems (e.g., drive system 640, perception system 620, and decision control system 630). For example, computing platform 650 may utilize input from decision control system 630 in order to control steering system 633 to avoid obstacles detected by perception system 620. In some embodiments, the computing platform 650 is operable to provide control over many aspects of the vehicle 600 and its subsystems.
Optionally, one or more of these components described above may be mounted or associated separately from the vehicle 600. For example, the memory 652 may exist partially or completely separate from the vehicle 600. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the above components are only an example; in a practical application, components in the above modules may be added or removed according to actual needs, and fig. 7 should not be construed as limiting the embodiments of the present disclosure.
An autonomous automobile traveling on a roadway, such as the vehicle 600 above, may identify objects within its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or objects of other types. In some examples, each identified object may be considered independently, and its characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed to which the autonomous vehicle should adjust.
Optionally, the vehicle 600 or a sensing and computing device associated with it (e.g., the computing system 631 or the computing platform 650) may predict the behavior of an identified object based on the object's characteristics and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.). Optionally, since the identified objects' behaviors may depend on one another, all of the identified objects may also be considered together to predict the behavior of a single identified object. The vehicle 600 can adjust its speed based on the predicted behavior of the identified objects. In other words, the autonomous vehicle can determine, based on the predicted behavior of an object, what steady state it will need to adjust to (e.g., accelerate, decelerate, or stop). Other factors may also be considered in determining the speed of the vehicle 600, such as its lateral position in the road being traveled, the curvature of the road, and the proximity of static and dynamic objects.
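Purely as an illustration of this decision step (not the disclosure's actual control logic), the following Python sketch maps predicted object states to a coarse speed action; the `PredictedObject` fields and the gap thresholds are assumptions of this example.

```python
from dataclasses import dataclass

@dataclass
class PredictedObject:
    distance_m: float          # predicted closest approach, in meters
    closing_speed_mps: float   # positive when the gap is shrinking

def choose_speed_action(objects, safe_gap_m=30.0, stop_gap_m=8.0):
    # Map the predicted behavior of surrounding objects to a coarse
    # steady-state action: accelerate, decelerate, or stop.
    if not objects:
        return "accelerate"
    nearest = min(objects, key=lambda o: o.distance_m)
    if nearest.distance_m < stop_gap_m:
        return "stop"
    if nearest.distance_m < safe_gap_m and nearest.closing_speed_mps > 0:
        return "decelerate"
    return "accelerate"
```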
In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 600 to cause the autonomous vehicle to follow a given trajectory and/or maintain a safe lateral and longitudinal distance from objects in the vicinity of the autonomous vehicle (e.g., vehicles in adjacent lanes on the road).
The vehicle 600 may be any type of vehicle, such as a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a recreational vehicle, a train, etc., and the disclosed embodiment is not particularly limited.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned object detection method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
1. An object detection method, applied to a vehicle, comprising:
collecting an environment image of the surrounding environment in the driving process of the vehicle;
inputting the environment image into a three-dimensional target detection model to obtain target parameter information output by the three-dimensional target detection model, wherein the target parameter information comprises target position information, target size information and target angle information of a target object in the environment image;
the three-dimensional target detection model is obtained by pre-training a plurality of first sample images and target sample parameter information corresponding to each first sample image, the target sample parameter information includes target sample position information, target sample size information and target sample angle information of a first sample object in the first sample image, the target sample parameter information is determined by a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model, the three-dimensional detection model is obtained by training a second sample image, the distance between the second sample object in the second sample image and the vehicle is smaller than or equal to a preset distance threshold, and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
2. The method of claim 1, wherein the three-dimensional object detection model is trained by:
acquiring a plurality of the first sample images;
for each first sample image, inputting the first sample image into the three-dimensional detection model to obtain first sample parameter information output by the three-dimensional detection model, and inputting the first sample image into the two-dimensional detection model to obtain two-dimensional parameter information output by the two-dimensional detection model, wherein the first sample parameter information comprises first sample position information, first sample size information and first sample angle information of the first sample object;
determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information;
and training a first target neural network model through a plurality of first sample images and target sample parameter information corresponding to each first sample image to obtain the three-dimensional target detection model.
3. The method according to claim 2, wherein the determining target sample parameter information corresponding to each first sample image according to the first sample parameter information and the two-dimensional parameter information comprises:
for each piece of the two-dimensional parameter information: when it is determined that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information, taking the first target sample parameter information as the target sample parameter information; and when it is determined that no such first target sample parameter information exists among the plurality of pieces of first sample parameter information, determining, through the three-dimensional detection model, a plurality of feature maps corresponding to a first target sample image, and determining the target sample parameter information corresponding to the first target sample image according to the two-dimensional parameter information and the plurality of feature maps; wherein the first sample object corresponding to the first target sample parameter information is the same as the first sample object corresponding to the two-dimensional parameter information, and the first target sample image is the first sample image corresponding to the two-dimensional parameter information.
4. The method according to claim 3, wherein the determining that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information comprises:
determining a first detection frame corresponding to the first sample parameter information; determining an intersection-over-union ratio of a second detection frame corresponding to the two-dimensional parameter information and the first detection frame; and, when the intersection-over-union ratio is greater than or equal to a preset threshold, determining that first target sample parameter information corresponding to the two-dimensional parameter information exists among the plurality of pieces of first sample parameter information, and taking the first sample parameter information as the first target sample parameter information.
5. The method according to claim 4, wherein the determining the first detection frame corresponding to the first sample parameter information includes:
determining position information of a central point of a first detection frame corresponding to the first sample parameter information according to internal reference information of a camera for shooting the first sample image and the first sample position information;
and determining the first detection frame according to the position information of the central point of the first detection frame and the first sample size information.
6. The method of claim 3, wherein the three-dimensional detection model comprises a plurality of detection modules, each detection module comprising a first convolutional layer, a second convolutional layer, and a prediction convolutional layer, an output of the first convolutional layer being coupled to an input of the second convolutional layer, and an output of the second convolutional layer being coupled to an input of the prediction convolutional layer; and the determining, by the three-dimensional detection model, a plurality of feature maps corresponding to the first target sample image comprises:
inputting the first target sample image into the three-dimensional detection model to obtain a plurality of feature maps output by the prediction convolutional layers of the three-dimensional detection model;
determining, according to the two-dimensional parameter information and the plurality of feature maps, target sample parameter information corresponding to the first target sample image includes:
determining feature position information of a central point of a second detection frame corresponding to the two-dimensional parameter information on each feature map;
and determining target sample parameter information corresponding to the first target sample image according to the plurality of characteristic position information and the two-dimensional parameter information.
7. The method of claim 1, wherein the three-dimensional detection model is pre-trained by:
acquiring a plurality of second sample images and second sample parameter information corresponding to each second sample image, wherein the second sample parameter information comprises second sample position information, second sample size information and second sample angle information of the second sample object, and the second sample parameter information is determined according to point cloud data corresponding to the second sample images;
and training a second target neural network model through a plurality of second sample images and second sample parameter information corresponding to each second sample image to obtain the three-dimensional detection model.
8. The method according to any one of claims 1-7, further comprising:
determining a driving route of the vehicle according to the target parameter information;
and controlling the vehicle to automatically run according to the running route.
9. An object detection device, applied to a vehicle, comprising:
the acquisition module is configured to acquire an environment image of the surrounding environment in the running process of the vehicle;
an information acquisition module configured to input the environment image into a three-dimensional target detection model to acquire target parameter information output by the three-dimensional target detection model, the target parameter information including target position information, target size information, and target angle information of a target object in the environment image;
the three-dimensional target detection model is obtained by pre-training a plurality of first sample images and target sample parameter information corresponding to each first sample image, the target sample parameter information comprises target sample position information, target sample size information and target sample angle information of a first sample object in the first sample images, the target sample parameter information is determined by a pre-trained three-dimensional detection model and a pre-trained two-dimensional detection model, the three-dimensional detection model is obtained by training a second sample image, the distance between the second sample object in the second sample image and the vehicle is smaller than or equal to a preset distance threshold, and the two-dimensional detection model is used for obtaining the two-dimensional parameter information of the first sample object.
10. A vehicle, characterized by comprising:
a first processor;
a memory for storing processor-executable instructions;
wherein the first processor is configured to:
the steps of carrying out the method of any one of claims 1 to 8.
11. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.
12. A chip comprising a second processor and an interface; the second processor is to read an instruction to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210822745.0A | 2022-07-12 | 2022-07-12 | Target detection method, target detection device, vehicle, storage medium and chip
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205848A true CN115205848A (en) | 2022-10-18 |
Family
ID=83580533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210822745.0A Pending CN115205848A (en) | 2022-07-12 | 2022-07-12 | Target detection method, target detection device, vehicle, storage medium and chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205848A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN115631478A * | 2022-12-02 | 2023-01-20 | 广汽埃安新能源汽车股份有限公司 | Road image detection method, device, equipment and computer readable medium
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |