Disclosure of Invention
In view of this, the present disclosure provides at least a neural network training method, an image detection method, and a driving control method, together with corresponding apparatuses, an electronic device, and a storage medium.
In a first aspect, the present disclosure provides a neural network training method, comprising:
acquiring a sample image and two-dimensional annotation data of the sample image, wherein the two-dimensional annotation data comprises detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing the three-dimensional pose of the target vehicle;
Training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional annotation data;
Wherein, after the sample image is input to the target neural network, each branch network outputs one of two-dimensional detection frame information of the target vehicle and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle, respectively.
According to the method, a sample image and two-dimensional annotation data of the sample image are obtained, wherein the two-dimensional annotation data comprise detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle, and the sample image and the two-dimensional annotation data are used to train the target neural network. Because the two-dimensional annotation data comprise the at least one attribute information capable of representing the three-dimensional pose of the target vehicle, the trained target neural network can predict and output the at least one attribute information, which enriches the data types available for determining the three-dimensional pose information of the target vehicle. In turn, the three-dimensional pose information of the target vehicle can be determined more accurately by using the at least one attribute information and the detection frame information predicted by the target neural network.
Meanwhile, the target neural network comprises a plurality of branch networks. After the sample image is input into the target neural network, each branch network respectively outputs one of the two-dimensional detection frame information of the target vehicle and the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. Because the two-dimensional detection frame information and the at least one attribute information are output in parallel, the information detection efficiency is improved.
In a possible implementation manner, the attribute information includes at least one of the following:
First position information of any demarcation point on a demarcation line between adjacent visible faces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and the ground, orientation information of the target vehicle;
the orientation information comprises first orientation information and/or second orientation information, and the second orientation indicated by the second orientation information is encompassed within the first orientation indicated by the first orientation information, i.e., each second orientation is a finer-grained subdivision of a first orientation.
Here, a plurality of attribute information are set, so that the content of the attribute information is enriched, and the three-dimensional pose information of the target vehicle can be accurately determined according to at least one attribute information.
In a possible embodiment, where the orientation information includes the first orientation information, the first orientation information includes a first category characterizing that neither the front nor the rear of the vehicle is visible, a second category characterizing that the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category characterizing that the front of the vehicle is visible and the rear of the vehicle is not visible.
In a possible embodiment, in a case where the orientation information comprises the second orientation information, the second orientation information comprises: a first intermediate category in which neither the front nor the rear of the vehicle is visible and the left side of the vehicle is visible; a second intermediate category in which neither the front nor the rear of the vehicle is visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
By adopting the method, the orientation of the target vehicle can be accurately represented through the set first orientation information and/or second orientation information.
In a possible implementation manner, in a case that the attribute information includes first position information of the demarcation point, acquiring two-dimensional labeling data of the sample image includes:
acquiring the coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
and determining the coordinate information of the target vehicle in the vertical direction indicated by the detection frame information as the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image.
By adopting the method, the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image can be determined from the detection frame information, while the coordinate information of the demarcation point in the horizontal direction is acquired. Further, when the target neural network determines the first position information of the demarcation point, only the coordinate information of the demarcation point in the horizontal direction needs to be determined, not the coordinate information in the vertical direction. This reduces the number of regression data types, and avoids the situation in which too many regression data types affect regression data other than the first position information of the demarcation point (such as the detection frame information) and thereby reduce their accuracy.
In a possible implementation manner, in a case that the attribute information includes the second position information of the contact point, acquiring two-dimensional labeling data of the sample image includes:
acquiring the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image; and/or,
acquiring the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image.
By adopting the method, one coordinate of the second position information of the contact point can be determined from the detection frame information while the other coordinate is acquired. This reduces the number of regression data types, and avoids the situation in which too many regression data types affect regression data other than the second position information of the contact point (such as the detection frame information and the detection frame category) and thereby reduce their accuracy.
In a second aspect, the present disclosure provides an image detection method, the method comprising:
acquiring an image to be detected;
Inputting the image to be detected into a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of a vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
And determining a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In the method, the target neural network is obtained based on the neural network training method in the first aspect, so that the trained target neural network can more accurately output the two-dimensional detection frame information and at least one attribute information of the vehicle, and further, the detection result of the image to be detected can be more accurately determined based on the two-dimensional detection frame information and the at least one attribute information of the vehicle.
In a possible implementation manner, determining a detection result of the image to be detected based on two-dimensional detection frame information of the vehicle and the at least one attribute information includes:
Depth information of the vehicle is determined based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a possible implementation manner, determining a detection result of the image to be detected based on two-dimensional detection frame information of the vehicle and the at least one attribute information includes:
And determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
In a possible embodiment, in a case where the at least one attribute information includes first position information of any demarcation point on a demarcation line between adjacent visible faces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, the determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle includes:
determining two-dimensional compact frame information characterizing a single plane of the vehicle based on the first location information of the demarcation point, the second location information of the contact point, and the two-dimensional detection frame information of the vehicle;
and determining three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
By adopting the method, because the spatial information contained in the two-dimensional compact frame information is more accurate, the three-dimensional detection data of the vehicle can be determined more accurately based on the two-dimensional compact frame information and the image to be detected.
In a third aspect, the present disclosure provides a running control method including:
acquiring a road image acquired by a running device in the running process;
detecting the road image by using the target neural network trained by the neural network training method according to the first aspect or any one of its implementation manners, to obtain target detection data of a target vehicle included in the road image;
the running apparatus is controlled based on target detection data of a target vehicle included in the road image.
In a fourth aspect, the present disclosure provides a neural network training device, comprising:
the first acquisition module is used for acquiring a sample image and two-dimensional annotation data of the sample image, wherein the two-dimensional annotation data comprises detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing the three-dimensional pose of the target vehicle;
the training module is used for training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional annotation data;
Wherein, after the sample image is input to the target neural network, each branch network outputs one of two-dimensional detection frame information of the target vehicle and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle, respectively.
In a fifth aspect, the present disclosure provides an image detection apparatus including:
The second acquisition module is used for acquiring an image to be detected;
the first generation module is used for inputting the image to be detected into a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of a vehicle in the image to be detected, output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and the determining module is used for determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a sixth aspect, the present disclosure provides a travel control apparatus including:
the third acquisition module is used for acquiring road images acquired by the driving device in the driving process;
the second generating module is configured to detect the road image by using the target neural network trained by the neural network training method according to the first aspect or any one of its implementation manners, so as to obtain target detection data of a target vehicle included in the road image;
and a control module for controlling the running apparatus based on target detection data of a target vehicle included in the road image.
In a seventh aspect, the present disclosure provides an electronic device comprising a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor. When the electronic device is in operation, the processor and the memory communicate over the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the neural network training method of the first aspect or any of the embodiments described above, or the steps of the image detection method of the second aspect or any of the embodiments described above, or the steps of the travel control method of the third aspect described above.
In an eighth aspect, the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the neural network training method according to the first aspect or any of the embodiments, or performs the steps of the image detection method according to the second aspect or any of the embodiments, or performs the steps of the travel control method according to the third aspect.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Generally, a camera may be disposed on a vehicle, an image is collected by the camera, and a running vehicle is detected according to the collected image to obtain a two-dimensional detection result of the running vehicle. The obtained two-dimensional detection result is then input into downstream modules, such as a tracking module and a ranging module, to obtain three-dimensional information of the running vehicle. However, because the collected image lacks spatial information, the accuracy of the three-dimensional information of the running vehicle obtained based on the two-dimensional detection result is low. In order to alleviate this problem, embodiments of the present disclosure provide a neural network training method.
The defects of the above scheme are results obtained by the inventors after practice and careful study; therefore, the discovery process of the above problems and the solutions to these problems set forth hereinafter should both be regarded as contributions made by the inventors to the present disclosure.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
For the convenience of understanding the embodiments of the present disclosure, a neural network training method, an image detection method, and a driving control method disclosed in the embodiments of the present disclosure will first be described in detail. The execution subject of the neural network training method, the image detection method, and the driving control method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability. The computer device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on. In some possible implementations, the neural network training method, the image detection method, and the driving control method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a neural network training method according to an embodiment of the disclosure is shown, where the method includes S101-S102, where:
S101, acquiring a sample image and two-dimensional annotation data of the sample image, wherein the two-dimensional annotation data comprises detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
S102, training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data.
Wherein, after the sample image is input to the target neural network, each branch network outputs one of two-dimensional detection frame information of the target vehicle and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle, respectively.
According to the method, a sample image and two-dimensional annotation data of the sample image are obtained, wherein the two-dimensional annotation data comprise detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle, and the sample image and the two-dimensional annotation data are used to train the target neural network. Because the two-dimensional annotation data comprise the at least one attribute information capable of representing the three-dimensional pose of the target vehicle, the trained target neural network can predict and output the at least one attribute information, which enriches the data types available for determining the three-dimensional pose information of the target vehicle. In turn, the three-dimensional pose information of the target vehicle can be determined more accurately by using the at least one attribute information and the detection frame information predicted by the target neural network.
Meanwhile, the target neural network comprises a plurality of branch networks. After the sample image is input into the target neural network, each branch network respectively outputs one of the two-dimensional detection frame information of the target vehicle and the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. Because the two-dimensional detection frame information and the at least one attribute information are output in parallel, the information detection efficiency is improved.
S101-S102 are specifically described below.
For S101:
The sample image may be any acquired image containing the target vehicle, in any scene; for example, the sample image may show the target vehicle traveling on a road or parked in a parking space. The target vehicle may be any motor vehicle, for example, a truck, a car, a minibus, or the like.
The two-dimensional annotation data comprises the detection frame information of the target vehicle in the sample image. The detection frame information may comprise position information of the detection frame, size information of the detection frame, the category of the detection frame, and the like; for example, the categories of the detection frame may comprise small vehicles, medium vehicles, large vehicles, and the like, or may comprise cars, minibuses, off-road vehicles, and the like, or may be a first category belonging to motor vehicles and a second category not belonging to motor vehicles.
For example, the detection frame information may be the position information of the four vertices of the detection frame together with the category of the detection frame, or the detection frame information may be the position information of the center point of the detection frame, the size information of the detection frame, the category of the detection frame, and the like.
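For concreteness, these two parameterizations are interchangeable; the following minimal Python sketch (function names are illustrative, not part of the disclosure) converts between them:

```python
def corners_to_center_size(x1, y1, x2, y2):
    """Convert a detection frame given by two opposite vertices
    (x1, y1) and (x2, y2) into (center_x, center_y, width, height)."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, abs(x2 - x1), abs(y2 - y1))


def center_size_to_corners(cx, cy, w, h):
    """Inverse conversion back to the two-vertex form."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```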
The at least one attribute information capable of characterizing the three-dimensional pose of the target vehicle can comprise at least one of: first position information of any demarcation point on a demarcation line between adjacent visible faces of the target vehicle; second position information of a contact point between at least one visible wheel of the target vehicle and the ground; and orientation information of the target vehicle, wherein the orientation information comprises first orientation information and/or second orientation information, and the second orientation indicated by the second orientation information is encompassed within the first orientation indicated by the first orientation information. Here, a plurality of attribute information are set, which enriches the content of the attribute information, so that the three-dimensional pose information of the target vehicle can be accurately determined according to the at least one attribute information.
Any demarcation point of the target vehicle may be any point on the demarcation line between two adjacent visible surfaces of the target vehicle in the sample image; for example, the demarcation line may be a line passing through a lamp of the target vehicle and perpendicular to the ground, and the demarcation point may be the intersection point of the demarcation line and the detection frame. A visible wheel is a wheel on the visible face of the target vehicle contained in the sample image; preferably, the number of contact points is two, i.e., the contact points of the two visible wheels with the ground.
Referring to fig. 2, a schematic diagram of a sample image is shown, which includes a demarcation point 21 and the contact points between two wheels and the ground, i.e., a first contact point 22 and a second contact point 23.
In an alternative embodiment, in the case where the attribute information includes the first location information of the demarcation point, in S101, acquiring the two-dimensional labeling data of the sample image may include:
s1011, acquiring the coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
S1012, determining the coordinate information of the target vehicle in the vertical direction indicated by the detection frame information as the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image.
Here, the first position information of the demarcation point includes abscissa information and ordinate information, that is, the abscissa information is coordinate information in the horizontal direction in the image coordinate system corresponding to the sample image, and the ordinate information is coordinate information in the vertical direction in the image coordinate system corresponding to the sample image.
Taking the demarcation point included in fig. 2 as an example, the detection frame information of the target vehicle included in the sample image of fig. 2 may include the position information of four vertices, i.e., (x1, y1), (x1, y2), (x2, y1), and (x2, y2). Further, the coordinate information xA of the demarcation point 21 in the horizontal direction of the image coordinate system corresponding to the sample image may be acquired, and the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle may be determined as the coordinate information y2 of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image, so that the first position information of the demarcation point is obtained as (xA, y2).
By adopting the method, the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image can be determined from the detection frame information, while the coordinate information of the demarcation point in the horizontal direction is acquired. Further, when the target neural network regresses the first position information of the demarcation point, only the coordinate information of the demarcation point in the horizontal direction needs to be determined, not the coordinate information in the vertical direction. This reduces the number of regression data types, and avoids the situation in which too many regression data types affect regression data other than the first position information of the demarcation point (such as the detection frame information) and thereby reduce their accuracy.
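For illustration, a minimal Python sketch of this labeling rule, assuming the detection frame is given by its vertices with y2 the bottom edge in image coordinates (names are hypothetical):

```python
def demarcation_point_label(frame, x_a):
    """Build the first position information of a demarcation point.

    frame: (x1, y1, x2, y2), the two-dimensional detection frame of the
           target vehicle, with y2 the bottom edge in image coordinates.
    x_a:   annotated horizontal coordinate of the demarcation point.

    Only x_a comes from manual annotation; the vertical coordinate is
    taken from the detection frame, so the network later regresses only
    the horizontal coordinate of the demarcation point.
    """
    x1, y1, x2, y2 = frame
    return (x_a, y2)
```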
In an alternative embodiment, in the case where the attribute information includes the second position information of the contact point, in S101, acquiring the two-dimensional labeling data of the sample image may include:
in a first mode, acquiring the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image; and/or,
in a second mode, acquiring the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image.
In the first mode, the abscissa information of the contact point is acquired, and the coordinate information in the vertical direction indicated by the detection frame information is determined as the ordinate information of the contact point. In the second mode, the ordinate information of the contact point is acquired, and the coordinate information in the horizontal direction indicated by the detection frame information is determined as the abscissa information of the contact point.
When the number of contact points is two, the second position information of one contact point may be determined in the first mode, and the second position information of the other contact point may be determined in the second mode.
Taking the contact points included in fig. 2 as an example, the coordinate information xB1 of the first contact point 22 in the horizontal direction of the image coordinate system corresponding to the sample image may be acquired, and the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle may be determined as the coordinate information y2 of the first contact point 22 in the vertical direction of the image coordinate system corresponding to the sample image, i.e., the second position information of the first contact point 22 may be (xB1, y2).
For the second contact point 23, the coordinate information yB2 of the second contact point 23 in the vertical direction of the image coordinate system corresponding to the sample image may be acquired, and the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle may be determined as the coordinate information x1 of the second contact point 23 in the horizontal direction of the image coordinate system corresponding to the sample image, i.e., the second position information of the second contact point 23 may be (x1, yB2).
Whether x1 or x2 is selected may be determined according to the orientation information of the target vehicle in the sample image, or according to the position of the second contact point relative to the first contact point in the sample image. For example, x2 is selected if the second contact point is to the right of the first contact point, and x1 is selected if the second contact point is to the left of the first contact point. Alternatively, if the orientation information of the target vehicle is the fourth intermediate category or the seventh intermediate category, x1 is selected, and if the orientation information of the target vehicle is the fifth intermediate category or the eighth intermediate category, x2 is selected.
By adopting the method, one coordinate of the second position information of the contact point can be determined from the detection frame information while the other coordinate is acquired. This reduces the number of regression data types, and avoids the situation in which too many regression data types affect regression data other than the second position information of the contact point (such as the detection frame information and the detection frame category) and thereby reduce their accuracy.
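A minimal Python sketch of the two labeling modes and the edge-selection rule described above, under the same frame convention as before (all names are hypothetical):

```python
def contact_point_labels(frame, x_b1, y_b2, second_is_left):
    """Build the second position information of two wheel contact points.

    First mode (first contact point): the horizontal coordinate x_b1 is
    annotated, and the vertical coordinate is the frame bottom edge y2.
    Second mode (second contact point): the vertical coordinate y_b2 is
    annotated, and the horizontal coordinate is a frame side edge: x1 if
    the second contact point lies to the left of the first one, else x2.
    """
    x1, y1, x2, y2 = frame
    first = (x_b1, y2)
    second = (x1 if second_is_left else x2, y_b2)
    return first, second
```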
In an alternative embodiment, where the orientation information includes the first orientation information, the first orientation information includes a first category characterizing that neither the front nor the rear of the vehicle is visible, a second category characterizing that the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category characterizing that the front of the vehicle is visible and the rear of the vehicle is not visible.
In an alternative embodiment, where the orientation information includes the second orientation information, the second orientation information includes: a first intermediate category in which neither the front nor the rear of the vehicle is visible and the left side of the vehicle is visible; a second intermediate category in which neither the front nor the rear of the vehicle is visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
The second orientation indicated by the second orientation information is encompassed within the first orientation indicated by the first orientation information; that is, the first category of the first orientation information may comprise the first intermediate category and the second intermediate category of the second orientation information, the second category of the first orientation information may comprise the third intermediate category, the fourth intermediate category, and the fifth intermediate category of the second orientation information, and the third category of the first orientation information may comprise the sixth intermediate category, the seventh intermediate category, and the eighth intermediate category of the second orientation information.
It can be seen that the first orientation information of the target vehicle included in fig. 2 is of the third category, and the second orientation information is of the seventh intermediate category.
Referring to fig. 3, a schematic diagram of target vehicles is shown. For example, in fig. 3, the first orientation information corresponding to the target vehicle 31 may be the first category and the second orientation information the first intermediate category; the first orientation information corresponding to the target vehicle 32 may be the third category and the second orientation information the eighth intermediate category; the first orientation information corresponding to the target vehicle 33 may be the second category and the second orientation information the third intermediate category; and the first orientation information corresponding to the target vehicle 34 may be the second category and the second orientation information the fourth intermediate category.
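Since each intermediate category of the second orientation information belongs to exactly one category of the first orientation information, the coverage relationship described above can be encoded as a simple lookup; a Python sketch (the category labels are illustrative):

```python
# Intermediate (second orientation) categories, indexed 1..8 as above,
# mapped to the coarse (first orientation) category that covers them.
SECOND_TO_FIRST = {
    1: "neither_front_nor_rear_visible",  # first category
    2: "neither_front_nor_rear_visible",
    3: "rear_visible",                    # second category
    4: "rear_visible",
    5: "rear_visible",
    6: "front_visible",                   # third category
    7: "front_visible",
    8: "front_visible",
}


def first_orientation(second_category: int) -> str:
    """Map a second (intermediate) orientation category to the first
    (coarse) orientation category that covers it."""
    return SECOND_TO_FIRST[second_category]
```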
For S102:
The sample image with its two-dimensional annotation data can be input into the target neural network to be trained, wherein the target neural network comprises a plurality of branch networks. The target neural network is trained over a plurality of iterations until the accuracy of the trained target neural network is greater than a set accuracy threshold, or until the loss value of the trained target neural network is less than a set loss threshold, so as to obtain the trained target neural network.
After the sample image is input into the target neural network, each branch network respectively outputs one of the two-dimensional detection frame information of the target vehicle and the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. That is, for any image to be detected, the plurality of branch networks in the trained target neural network output, in parallel, the two-dimensional detection frame information of the vehicle and the at least one attribute information capable of representing the three-dimensional pose of the vehicle.
Referring to fig. 4, a schematic diagram of a target neural network is shown. The sample image 41, the backbone network 42, and the plurality of branch networks 43 are included in fig. 4, and the plurality of branch networks may include a branch network corresponding to the detection frame information, a branch network corresponding to the category of the target vehicle, a branch network corresponding to the first position information of the demarcation point, a branch network corresponding to the second position information of the first contact point, a branch network corresponding to the second position information of the second contact point, and a branch network corresponding to the orientation information.
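As one possible realization of such a shared-backbone, multi-branch architecture, a PyTorch sketch is given below; the backbone, head shapes, and output dimensions are illustrative assumptions rather than anything prescribed by the disclosure, and for simplicity one vehicle per image is regressed:

```python
import torch
import torch.nn as nn


class MultiBranchDetector(nn.Module):
    """Shared backbone with one branch network per output, so that the
    detection frame and the pose-related attributes are predicted in
    parallel from the same features."""

    def __init__(self, num_classes: int = 3, num_orientations: int = 8):
        super().__init__()
        # Illustrative tiny backbone; any feature extractor would do.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.box_head = nn.Linear(64, 4)            # detection frame (x1, y1, x2, y2)
        self.cls_head = nn.Linear(64, num_classes)  # detection frame category
        self.demarcation_head = nn.Linear(64, 1)    # horizontal coord of demarcation point
        self.contact1_head = nn.Linear(64, 1)       # horizontal coord of first contact point
        self.contact2_head = nn.Linear(64, 1)       # vertical coord of second contact point
        self.orient_head = nn.Linear(64, num_orientations)  # orientation category

    def forward(self, image: torch.Tensor) -> dict:
        feat = self.backbone(image)
        # Every head consumes the same features, so all outputs are
        # produced by one forward pass, i.e. in parallel.
        return {
            "box": self.box_head(feat),
            "category": self.cls_head(feat),
            "demarcation_x": self.demarcation_head(feat),
            "contact1_x": self.contact1_head(feat),
            "contact2_y": self.contact2_head(feat),
            "orientation": self.orient_head(feat),
        }
```

During training, each output would be supervised by the corresponding piece of two-dimensional annotation data (for example, a regression loss on the frame and point coordinates and a classification loss on the categories), with training stopped once the accuracy or loss threshold mentioned above is reached.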
Referring to fig. 5, a flowchart of an image detection method according to an embodiment of the disclosure is shown, where the method includes S501-S503, where:
s501, acquiring an image to be detected;
S502, inputting an image to be detected into a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of a vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
S503, determining a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and at least one attribute information.
In the method, the target neural network is obtained based on the neural network training method in the first aspect, so that the trained target neural network can more accurately output the two-dimensional detection frame information and at least one attribute information of the vehicle, and further, the detection result of the image to be detected can be more accurately determined based on the two-dimensional detection frame information and the at least one attribute information of the vehicle.
For S501 and S502:
The image to be detected can be any image. The acquired image to be detected is input into the trained target neural network comprising the plurality of branch networks, so as to obtain the two-dimensional detection frame information of the vehicle included in the image to be detected and the at least one attribute information capable of representing the three-dimensional pose of the vehicle, output in parallel by the plurality of branch networks. For example, the two-dimensional detection frame information, the first position information of the demarcation point, the second position information of the contact points, the orientation information, and the like corresponding to the vehicle included in the image to be detected can be obtained.
For S503:
in an alternative embodiment, determining a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and at least one attribute information includes:
depth information of the vehicle is determined based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
The depth information of the vehicle may be the distance between the center of the vehicle and the image capturing device that captured the image to be detected. For example, a bird's-eye view corresponding to the vehicle may be determined from the two-dimensional detection frame information and the at least one attribute information of the vehicle through a coordinate transformation operation, and the depth information of the vehicle may then be determined from the bird's-eye view; alternatively, a trained neural network for determining depth information may be used to determine the depth information of the vehicle from the two-dimensional detection frame information and the at least one attribute information of the vehicle.
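As a sketch of what such a coordinate transformation could involve: under a pinhole camera with known intrinsics, zero pitch, and a flat ground plane (assumptions made here purely for illustration, not stated in the disclosure), the depth of a wheel-ground contact point follows from similar triangles:

```python
def depth_from_contact_point(v: float, f_y: float, c_y: float,
                             camera_height: float) -> float:
    """Estimate the depth of a ground contact point from its image row.

    v:             vertical pixel coordinate of the contact point.
    f_y, c_y:      focal length and principal point of the camera
                   along the vertical axis, in pixels.
    camera_height: height of the camera above the ground plane, in meters.

    Under a flat-ground, zero-pitch pinhole model, a ground point at
    depth Z projects to row v = c_y + f_y * camera_height / Z, hence:
    """
    return f_y * camera_height / (v - c_y)
```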
In an alternative embodiment, determining a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and at least one attribute information includes:
and determining three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle.
For example, the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle may be input into a trained three-dimensional detection neural network, and three-dimensional detection data of the vehicle may be determined. The three-dimensional detection data of the vehicle may include position information of a three-dimensional detection frame of the vehicle, size information of the three-dimensional detection frame, a category of the three-dimensional detection frame, and the like.
In an alternative embodiment, in the case that the at least one attribute information includes first position information of any demarcation point on a demarcation line between adjacent visible faces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle, includes:
determining two-dimensional compact frame information representing a single plane of a vehicle based on first position information of a demarcation point, second position information of a contact point and two-dimensional detection frame information of the vehicle;
And step two, determining three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
When the at least one attribute information includes the first position information of a demarcation point, the two-dimensional detection frame of the vehicle may be divided into two detection frames by using the first position information of the demarcation point corresponding to the vehicle, and the detection frame containing the contact points may be determined as the two-dimensional compact frame information characterizing a single plane of the vehicle. Referring to fig. 6, a schematic diagram of a two-dimensional compact frame is shown; the detection frame on the right side of the demarcation line is the determined two-dimensional compact frame, where 21 in fig. 6 is the demarcation point, and 22 and 23 are the two contact points.
And then the two-dimensional compact frame information and the image to be detected can be input into a trained three-dimensional detection neural network to determine three-dimensional detection data of the vehicle.
By adopting the method, because the spatial information contained in the two-dimensional compact frame information is more accurate, the three-dimensional detection data of the vehicle can be determined more accurately based on the two-dimensional compact frame information and the image to be detected.
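A minimal Python sketch of this splitting rule, assuming the detection frame is (x1, y1, x2, y2) and the demarcation point's horizontal coordinate lies inside it (names are hypothetical):

```python
def compact_frame(frame, x_a, contact_xs):
    """Split a two-dimensional detection frame at the demarcation line
    x = x_a and keep the half that contains the wheel contact points.

    frame:      (x1, y1, x2, y2) detection frame of the vehicle.
    x_a:        horizontal coordinate of the demarcation point.
    contact_xs: horizontal coordinates of the visible contact points.
    """
    x1, y1, x2, y2 = frame
    left_half = (x1, y1, x_a, y2)
    right_half = (x_a, y1, x2, y2)
    # The wheels touch the ground on the side face of the vehicle, so the
    # half containing the contact points characterizes that single plane.
    mean_x = sum(contact_xs) / len(contact_xs)
    return right_half if mean_x >= x_a else left_half
```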
Referring to fig. 7, a flow chart of a driving control method according to an embodiment of the disclosure is shown, where the method includes S701-S703, where:
S701, acquiring a road image acquired by a running device in the running process;
S702, detecting a road image by using the target neural network trained by the neural network training method described in the embodiment to obtain target detection data of a target vehicle included in the road image;
S703, controlling the traveling apparatus based on the target detection data of the target vehicle included in the road image.
By way of example, the running device may be an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistance System (ADAS), a robot, or the like. The road image may be an image acquired by the running device in real time during running.
When the running device is controlled, the running device can be controlled to accelerate, decelerate, turn, brake and the like, or voice prompt information can be played to prompt a driver to control the running device to accelerate, decelerate, turn, brake and the like.
In a specific implementation, the road image can be input into the trained target neural network, and the road image is detected to obtain the target detection data of the target vehicle included in the road image; the running device can then be controlled based on the target detection data of the target vehicle. Controlling the running device based on the target detection data of the target vehicle may include inputting the target detection data of the target vehicle into a trained three-dimensional detection neural network to obtain three-dimensional detection information of the target vehicle, and controlling the running device based on the detected three-dimensional detection information of the target vehicle.
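An illustrative Python sketch of one such control cycle, in which the detector, the depth estimation step, the device interface, and the braking threshold are all stand-in assumptions rather than components named by the disclosure:

```python
BRAKE_DISTANCE_M = 10.0  # hypothetical safety threshold


def control_step(road_image, detector, estimate_depth, device):
    """One control cycle: detect the target vehicle in the road image,
    estimate how far away it is, and issue a simple longitudinal command."""
    outputs = detector(road_image)    # parallel branch-network outputs
    depth = estimate_depth(outputs)   # e.g. ground-plane back-projection
    if depth < BRAKE_DISTANCE_M:
        device.brake()                # or play a voice prompt to the driver
    else:
        device.keep_speed()
```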
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same concept, the embodiment of the present disclosure further provides a neural network training device. Referring to fig. 8, a schematic architecture diagram of the neural network training device provided by the embodiment of the present disclosure is shown, including a first acquisition module 801 and a training module 802; specifically:
The first acquisition module 801 is configured to acquire a sample image and two-dimensional annotation data of the sample image, where the two-dimensional annotation data includes detection frame information of a target vehicle in the sample image and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle;
A training module 802 for training a target neural network including a plurality of branch networks based on the sample image and the two-dimensional annotation data;
Wherein, after the sample image is input to the target neural network, each branch network outputs one of two-dimensional detection frame information of the target vehicle and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle, respectively.
In a possible implementation manner, the attribute information includes at least one of the following:
First position information of any demarcation point on a demarcation line between adjacent visible faces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and the ground, orientation information of the target vehicle;
the orientation information comprises first orientation information and/or second orientation information, and the second orientation indicated by the second orientation information is encompassed within the first orientation indicated by the first orientation information, i.e., each second orientation is a finer-grained subdivision of a first orientation.
In a possible embodiment, where the orientation information includes the first orientation information, the first orientation information includes a first category characterizing that neither the front nor the rear of the vehicle is visible, a second category characterizing that the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category characterizing that the front of the vehicle is visible and the rear of the vehicle is not visible.
In a possible embodiment, in a case where the orientation information comprises the second orientation information, the second orientation information comprises: a first intermediate category in which neither the front nor the rear of the vehicle is visible and the left side of the vehicle is visible; a second intermediate category in which neither the front nor the rear of the vehicle is visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and neither the left nor the right side of the vehicle is visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
In a possible implementation manner, in a case where the attribute information includes the first position information of the demarcation point, the first obtaining module 801 is configured to, when obtaining the two-dimensional labeling data of the sample image:
acquiring the coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
and determining the coordinate information of the target vehicle in the vertical direction indicated by the detection frame information as the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image.
In a possible implementation manner, in a case where the attribute information includes the second position information of the contact point, the first obtaining module 801 is configured to, when obtaining the two-dimensional labeling data of the sample image:
acquiring the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image; and/or,
acquiring the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image.
Based on the same concept, the embodiment of the present disclosure further provides an image detection apparatus, referring to fig. 9, which is a schematic structural diagram of the image detection apparatus provided by the embodiment of the present disclosure, including a second obtaining module 901, a first generating module 902, and a determining module 903, specifically:
a second acquiring module 901, configured to acquire an image to be detected;
A first generating module 902, configured to input the image to be detected to a trained target neural network that includes a plurality of branch networks, obtain two-dimensional detection frame information of a vehicle in the image to be detected that is output by the plurality of branch networks in parallel, and at least one attribute information that can represent a three-dimensional pose of the vehicle;
A determining module 903, configured to determine a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a possible implementation manner, the determining module 903 is configured to, when determining a detection result of the image to be detected based on two-dimensional detection frame information of the vehicle and the at least one attribute information,:
Depth information of the vehicle is determined based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a possible implementation manner, the determining module 903 is configured to, when determining a detection result of the image to be detected based on two-dimensional detection frame information of the vehicle and the at least one attribute information,:
And determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
In a possible implementation manner, in a case where the at least one attribute information includes first position information of any demarcation point on a demarcation line between adjacent visible faces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, the determining module 903 is configured, when determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle, to:
determining two-dimensional compact frame information characterizing a single plane of the vehicle based on the first location information of the demarcation point, the second location information of the contact point, and the two-dimensional detection frame information of the vehicle;
and determining three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
Based on the same concept, the embodiment of the present disclosure further provides a running control apparatus, referring to fig. 10, which is a schematic structural diagram of the running control apparatus provided by the embodiment of the present disclosure, including a third obtaining module 1001, a second generating module 1002, and a control module 1003, specifically:
a third acquiring module 1001, configured to acquire a road image acquired by a driving device during driving;
A second generating module 1002, configured to detect the road image by using the target neural network trained by the neural network training method described in the foregoing embodiment, to obtain target detection data of a target vehicle included in the road image;
A control module 1003 for controlling the traveling apparatus based on target detection data of a target vehicle included in the road image.
In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to perform the methods described in the foregoing method embodiments; for specific implementations, reference may be made to the descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
Based on the same technical concept, the embodiment of the disclosure also provides an electronic device. Referring to fig. 11, a schematic structural diagram of an electronic device 1100 provided by an embodiment of the disclosure includes a processor 1101, a memory 1102, and a bus 1103. The memory 1102 is used for storing execution instructions and includes a memory 11021 and an external memory 11022; the memory 11021, also called an internal memory, is used for temporarily storing operation data in the processor 1101 and data exchanged with the external memory 11022, such as a hard disk. The processor 1101 exchanges data with the external memory 11022 through the memory 11021. When the electronic device 1100 operates, the processor 1101 and the memory 1102 communicate with each other through the bus 1103, causing the processor 1101 to execute the following instructions:
acquiring a sample image and two-dimensional annotation data of the sample image, wherein the two-dimensional annotation data includes detection frame information of a target vehicle in the sample image and at least one attribute information capable of characterizing a three-dimensional pose of the target vehicle;
training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional annotation data;
wherein, after the sample image is input to the target neural network, each branch network respectively outputs one of the two-dimensional detection frame information of the target vehicle and the at least one attribute information capable of characterizing the three-dimensional pose of the target vehicle.
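By way of a non-limiting illustration, the following PyTorch-style sketch shows one possible form of such a target neural network with parallel branch networks and a single training step against the two-dimensional annotation data. The backbone, the branch output shapes, the number of orientation classes, and the unweighted loss sum are assumptions made for brevity and are not prescribed by this embodiment.

```python
import torch
import torch.nn as nn

class TargetNetwork(nn.Module):
    """Illustrative multi-branch network: one shared backbone, one branch per
    output (the 2D detection frame and each pose-related attribute). All
    layer sizes are arbitrary assumptions."""
    def __init__(self, num_orientations: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.bbox_branch = nn.Linear(32, 4)          # detection frame (x1, y1, x2, y2)
        self.demarcation_branch = nn.Linear(32, 2)   # demarcation point (x, y)
        self.contact_branch = nn.Linear(32, 2)       # wheel-ground contact point (x, y)
        self.orientation_branch = nn.Linear(32, num_orientations)  # orientation class

    def forward(self, image):
        feat = self.backbone(image)
        # Each branch produces one output in parallel from the shared features.
        return (self.bbox_branch(feat), self.demarcation_branch(feat),
                self.contact_branch(feat), self.orientation_branch(feat))

# One training step against 2D annotation data (all tensors are dummy shapes).
net = TargetNetwork()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
image = torch.randn(8, 3, 128, 128)
gt_bbox, gt_dem, gt_con = torch.randn(8, 4), torch.randn(8, 2), torch.randn(8, 2)
gt_orient = torch.randint(0, 8, (8,))
bbox, dem, con, orient = net(image)
loss = (nn.functional.l1_loss(bbox, gt_bbox)
        + nn.functional.l1_loss(dem, gt_dem)
        + nn.functional.l1_loss(con, gt_con)
        + nn.functional.cross_entropy(orient, gt_orient))  # unweighted sum; an assumption
opt.zero_grad(); loss.backward(); opt.step()
```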
Based on the same technical concept, an embodiment of the present disclosure further provides an electronic device. Referring to fig. 12, which is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure, the electronic device 1200 includes a processor 1201, a memory 1202, and a bus 1203. The memory 1202 is configured to store execution instructions and includes an internal memory 12021 and an external memory 12022; the internal memory 12021 is configured to temporarily store operation data of the processor 1201 and data exchanged with the external memory 12022, such as a hard disk, and the processor 1201 exchanges data with the external memory 12022 through the internal memory 12021. When the electronic device 1200 runs, the processor 1201 and the memory 1202 communicate through the bus 1203, so that the processor 1201 executes the following instructions:
acquiring an image to be detected;
inputting the image to be detected into a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of a vehicle in the image to be detected and at least one attribute information capable of characterizing a three-dimensional pose of the vehicle, which are output in parallel by the plurality of branch networks;
determining a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
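Continuing the non-limiting sketch above and reusing the hypothetical TargetNetwork, inference and assembly of a detection result might look as follows; the dictionary packaging of the result is an illustrative assumption, not part of the disclosure.

```python
# Illustrative inference with the hypothetical TargetNetwork defined above.
net.eval()
with torch.no_grad():
    bbox, dem, con, orient = net(torch.randn(1, 3, 128, 128))  # image to be detected

# Assumed packaging of the parallel branch outputs into a detection result.
detection_result = {
    "detection_frame": bbox[0].tolist(),      # 2D detection frame information
    "demarcation_point": dem[0].tolist(),     # first position information
    "contact_point": con[0].tolist(),         # second position information
    "orientation": int(orient[0].argmax()),   # orientation class index
}
print(detection_result)
```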
Based on the same technical concept, an embodiment of the present disclosure further provides an electronic device. Referring to fig. 13, which is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure, the electronic device 1300 includes a processor 1301, a memory 1302, and a bus 1303. The memory 1302 is configured to store execution instructions and includes an internal memory 13021 and an external memory 13022; the internal memory 13021 is configured to temporarily store operation data of the processor 1301 and data exchanged with the external memory 13022, such as a hard disk, and the processor 1301 exchanges data with the external memory 13022 through the internal memory 13021. When the electronic device 1300 runs, the processor 1301 and the memory 1302 communicate through the bus 1303, so that the processor 1301 executes the following instructions:
acquiring a road image captured by a driving device during driving;
detecting the road image by using the target neural network trained with the neural network training method described in the foregoing embodiments, to obtain target detection data of a target vehicle included in the road image;
controlling the driving device based on the target detection data of the target vehicle included in the road image.
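The disclosure does not prescribe a concrete control policy. Purely as an illustrative placeholder continuing the sketches above, a control step might derive a coarse action from the detected frame; the area heuristic and threshold below are invented for illustration only.

```python
def control_step(road_image, net, brake_area_threshold=0.25):
    """Placeholder control policy (invented for illustration): decelerate when
    the detected target vehicle's frame occupies a large share of the image,
    which loosely suggests a nearby vehicle."""
    with torch.no_grad():
        bbox, _, _, _ = net(road_image)
    x1, y1, x2, y2 = bbox[0].tolist()
    _, _, h, w = road_image.shape
    if abs(x2 - x1) * abs(y2 - y1) / (h * w) > brake_area_threshold:
        return "decelerate"
    return "keep_speed"

print(control_step(torch.randn(1, 3, 128, 128), net))
```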
In addition, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, performs the steps of the neural network training method, the image detection method, and the driving control method described in the foregoing method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code, where the instructions included in the program code may be used to execute the steps of the neural network training method, the image detection method, and the driving control method described in the foregoing method embodiments; for details, reference may be made to the foregoing method embodiments, which are not repeated herein.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
It will be clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated herein. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation; for another example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between components may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
The foregoing is merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that can be readily conceived by a person skilled in the art within the technical scope of the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.