CN113807315B - Method, device, equipment and medium for constructing a recognition model for objects to be recognized - Google Patents
Method, device, equipment and medium for constructing a recognition model for objects to be recognized
- Publication number
- CN113807315B (application number CN202111171015.0A)
- Authority
- CN
- China
- Prior art keywords
- identified
- picture
- sample picture
- graph
- position coordinates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/5838—Information retrieval of still image data; retrieval characterised by using metadata automatically derived from the content, using colour
- G06F16/5854—Information retrieval of still image data; retrieval characterised by using metadata automatically derived from the content, using shape and object relationship
- G06F16/587—Information retrieval of still image data; retrieval characterised by using metadata automatically derived from the content, using geographical or spatial information, e.g. location
- G06F18/214—Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045—Neural networks; combinations of networks
- G06N3/08—Neural networks; learning methods
Abstract
The application provides a method, a device, equipment and a medium for constructing a recognition model for objects to be recognized, wherein the method comprises the following steps: acquiring sample pictures; for each sample picture that contains an object to be recognized, acquiring first position coordinates of the object graphic in the sample picture; for each sample picture, inputting the sample picture into an initial recognition model to obtain second position coordinates of a predicted graphic of the object to be recognized; and training the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic, so as to obtain the recognition model. The method and device solve the problem that recognition models trained in the prior art have low recognition accuracy.
Description
Technical Field
The present application relates to the field of computer information technology, and in particular to a method, an apparatus, a device and a medium for constructing a recognition model for objects to be recognized.
Background
With the rapid development of automation technology in recent years, the demand for automatic detection and recognition of pictures keeps growing. For example, traffic signs are an important component of road infrastructure and an important carrier of road traffic information: they carry key traffic information such as speed limits, vehicle restrictions and road conditions, provide road information to drivers, and issue timely safety warnings that prompt careful driving, so traffic sign recognition in the autonomous driving field needs to become faster and more accurate.
In the prior art there are many methods for recognizing pictures. A common one is to construct a recognition model and input the picture to be recognized into it, which yields whether the input picture contains the desired object to be recognized. In this method, however, the recognition model is constructed from a training set of sample pictures that do or do not contain the object, and training is completed by comparing the model's prediction of whether a sample picture contains the object against the actual answer. Because training relies only on whether the sample picture contains the object, the recognition accuracy of the trained recognition model is not high.
Disclosure of Invention
In view of the above, the present application aims to provide a method, apparatus, device and medium for constructing a recognition model for objects to be recognized, so as to solve the problem that recognition models trained in the prior art have low recognition accuracy.
In a first aspect, an embodiment of the present application provides a method for constructing a recognition model for objects to be recognized, where the method includes:
acquiring a sample picture;
for each sample picture that contains an object to be recognized, acquiring first position coordinates of the object graphic in the sample picture;
for each sample picture, inputting the sample picture into an initial recognition model to obtain second position coordinates of a predicted graphic of the object to be recognized;
and training the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic, so as to obtain the recognition model.
Further, the first position coordinates are contour position coordinates of the object to be recognized in the sample picture, and training the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic includes:
if the sample picture corresponding to the predicted graphic is a picture without the object to be recognized, adjusting the training parameters of the initial recognition model until the predicted graphic output by the initial recognition model is empty;
if the sample picture corresponding to the predicted graphic is a picture with the object to be recognized, acquiring from the predicted graphic the first pixel points marked as the object to be recognized;
acquiring a first pixel count marked as the object to be recognized in the sample picture, and acquiring a second pixel count marked as the object to be recognized from the predicted graphic;
calculating a loss value based on the second position coordinates of the first pixel points, the first position coordinates corresponding to the first pixel points, and the first and second pixel counts;
and if the loss value is larger than a preset loss threshold, adjusting the training parameters of the initial recognition model until the loss value of the initial recognition model is no larger than the loss threshold.
Further, the method further comprises:
adjusting the acquired sample picture to the input picture size required by the recognition model;
performing data enhancement processing on the resized sample picture to obtain an enhanced picture;
selecting a random number of enhanced pictures and stitching them to obtain a stitched picture;
adjusting the stitched picture to the input picture size, and acquiring the position coordinates of each object to be recognized in the resized stitched picture;
and expanding the set of resized sample pictures with the resized stitched picture.
Further, the data enhancement includes: random scaling, color gamut variation, flipping.
Further, the data enhancement includes random amplification, and performing data enhancement processing on the resized sample picture to obtain an enhanced picture includes:
adding additional bars around the resized sample picture to obtain an enhanced picture with additional bars.
Further, the method further comprises:
acquiring a picture to be recognized, and adjusting it to the input picture size required by the recognition model;
and inputting the resized picture into the recognition model to obtain the object graphic.
Further, the object to be recognized is a traffic sign, and the method further comprises:
querying a preset mapping library of traffic sign template graphs and traffic sign types, and identifying the traffic sign type of the object graphic.
In a second aspect, an embodiment of the present application provides an apparatus for constructing a recognition model for objects to be recognized, where the apparatus includes:
a sample picture acquisition module, configured to acquire sample pictures;
a first position coordinate acquisition module, configured to acquire, for each sample picture that contains an object to be recognized, first position coordinates of the object graphic in the sample picture;
a second position coordinate acquisition module, configured to input each sample picture into an initial recognition model to obtain second position coordinates of a predicted graphic of the object to be recognized;
a recognition model determination module, configured to train the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic to obtain the recognition model.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for constructing a recognition model for objects to be recognized described above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for constructing a recognition model for objects to be recognized described above.
The method and device for constructing a recognition model for objects to be recognized first acquire sample pictures; then, for each sample picture that contains an object to be recognized, acquire first position coordinates of the object graphic in the sample picture; for each sample picture, input the sample picture into the initial recognition model to obtain second position coordinates of the predicted graphic; and finally, train the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic to obtain the recognition model.
In the method and device for constructing a recognition model for objects to be recognized provided by the application, when the initial recognition model is trained, the position coordinates of the object graphic in the sample picture are compared with the position coordinates of the predicted graphic, and the pixel marks are further compared. Because this training scheme trains the model on coordinate positions, the training is more precise and fine-grained, which in turn improves the prediction accuracy of the resulting recognition model. The higher the prediction accuracy of the recognition model, the more accurately the object to be recognized is identified.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a recognition model for objects to be recognized according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training an initial recognition model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for constructing a recognition model for objects to be recognized according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
With the rapid development of automation technology in recent years, the demand for automatic detection and recognition of pictures keeps growing. For example, traffic signs are an important component of road infrastructure and an important carrier of road traffic information: they carry key traffic information such as speed limits, vehicle restrictions and road conditions, provide road information to drivers, and issue timely safety warnings that prompt careful driving, so traffic sign recognition in the autonomous driving field needs to become faster and more accurate.
It has been found that the prior art offers many methods for detecting objects to be recognized, such as color-based detection, shape-based detection, detection based on multi-feature fusion, and candidate-region-based target detection algorithms. However, each of these methods has notable disadvantages.
Color-based detection methods fall into two types. The first is the RGB color model method, which segments the acquired RGB picture directly; this reduces the amount of calculation, greatly improves speed and meets the real-time requirement of the algorithm, but it also has a defect: when a traffic sign sits in a complex environment, it mixes with background noise and the algorithm cannot achieve a good detection result. The second is the HSI color model method; because the HSI color space is invariant to illumination, it is more robust, but converting RGB to HSI costs a certain amount of calculation, so real-time performance must be improved by means of hardware processing.
The basic idea of the shape-based detection method is to divide an image into cells, accumulate histograms of edge directions within each cell, and finally combine the histogram entries into a feature that describes the object. This approach is invariant to rotation and scaling, but is computationally expensive.
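The cell-and-histogram construction described in this paragraph can be sketched in a few lines. The following is a rough illustration only, not the patent's method; the cell size, bin count and gradient operator are assumptions.

```python
# Rough sketch of edge-direction histograms over cells (the basis of
# HOG-style features); cell size and bin count are assumptions.
import numpy as np

def edge_direction_histograms(gray: np.ndarray, cell: int = 8, bins: int = 9):
    gy, gx = np.gradient(gray.astype(np.float32))       # image gradients
    mag = np.hypot(gx, gy)                              # edge strength
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180          # unsigned orientation
    h, w = gray.shape
    feats = []
    for r in range(0, h - cell + 1, cell):              # divide into cells
        for c in range(0, w - cell + 1, cell):
            a = ang[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)                        # combined descriptor

gray = np.random.rand(32, 32)
print(edge_direction_histograms(gray).shape)            # (144,) = 16 cells x 9 bins
```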
The detection method based on multi-feature fusion combines the information of the RGB and HSI color channels to segment traffic signs. By merging the segmentation results of the RGB and HSI color spaces, the algorithm overcomes the loss of image information caused by segmenting only the S channel of the HSI space and improves detection accuracy, but its detection speed is extremely low and cannot meet the requirement of real-time application.
The candidate-region-based target detection algorithm uses a rich hierarchy of feature layers for accurate target detection and semantic segmentation, and achieves excellent detection accuracy by classifying object proposals with a deep convolutional neural network. Its detection speed is slow, however, because it repeatedly extracts and stores the features of every candidate region, which consumes a great deal of computation time and storage.
Whether the detection method is based on color, shape, multi-feature fusion or candidate regions, each has a corresponding recognition model, and in the prior art these models are basically trained on the whole picture to be recognized, for example on the color of the whole object or the shape of the whole object. The accuracy of this training scheme is not high, so the prediction accuracy of the resulting recognition model is also not high, and recognition errors may occur. When a model with low recognition accuracy is applied in the autonomous driving field, it can not only send wrong recognition signals to an autonomous vehicle; a misread traffic sign may even cause a traffic accident, posing a serious safety hazard.
Based on the above, an embodiment of the present application provides a method for constructing a recognition model for objects to be recognized, which aims to solve the low recognition accuracy of recognition models trained in the prior art and to improve the recognition accuracy of the trained model.
Referring to fig. 1, fig. 1 is a flowchart of a method for constructing a recognition model for objects to be recognized according to an embodiment of the present application. As shown in fig. 1, the method provided by the embodiment of the present application includes:
S101, acquiring a sample picture.
It should be noted that a sample picture is a training sample in the model training set used to train the prediction model. A sample picture may or may not contain the object to be recognized. As an alternative implementation, the sample picture may be a picture with or without a traffic sign. Traffic signs are facilities that convey guidance, restriction, warning or instruction information with words or symbols; they are an important means of implementing traffic management and ensuring the safety and smoothness of road traffic. A sample picture with traffic signs may contain signs of various types, which can be distinguished in several ways: primary and auxiliary signs; movable and fixed signs; illuminated, luminous and reflective signs; and variable-message signs that reflect changes in the driving environment. After a sample picture is obtained, the traffic signs in it must also be annotated. There are many ways to annotate them, for example manually, or with the existing color-based, shape-based, multi-feature-fusion-based and candidate-region-based target detection algorithms. How these algorithms perform traffic sign recognition is described in detail in the prior art and is not repeated here. As an alternative implementation, the sample picture may be a picture taken by a camera or uploaded by a user; the application is not particularly limited in this respect.
Here, it should be noted that the above examples for sample pictures are merely examples, and in practice, sample pictures are not limited to the above examples.
When sample pictures are used to train the recognition model, different sample pictures may differ in size, so the acquired sample pictures are adjusted to the same size, which speeds up the construction of the recognition model. As an alternative embodiment, a sample picture is obtained by the following steps:
Step 1011, adjusting the acquired sample picture to the input picture size required by the recognition model.
It should be noted that the recognition model is the model used to recognize objects to be recognized in pictures, and the input picture size is the preset picture size that the recognition model requires.
For step 1011, in implementation, the sample picture obtained in step S101 is adjusted to the input picture size required by the recognition model, yielding a sample picture of that size. First it is judged whether the sample picture is larger than the input picture size; if so, the sample picture is shrunk to the input picture size, giving a picture of the required size. If the sample picture is smaller than the input picture size, additional bars are added around it so that it matches the input picture size. Here, an additional bar is a border of uniform color added in a ring around the original picture, outside its actual content. As an alternative embodiment, the additional bars may be black or gray; the application is not particularly limited in this respect. In implementation, after it is judged that the sample picture is smaller than the input picture size, additional bars are added around the sample picture; the padded picture then matches the input picture size. For example, if the acquired sample picture has an aspect ratio of 16:9 and the input picture size is 4:3, additional bars are needed around the original sample picture to bring the adjusted picture to 4:3.
Here, it should be noted that the above selection of the color of the additional bar is merely an example, and in practice, the color of the additional bar is not limited to the above example.
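The resizing-with-additional-bars step can be sketched as follows. This is a minimal illustration under assumed conventions (HWC uint8 arrays, a gray padding value, nearest-neighbour resizing); the function name and defaults are not from the patent.

```python
# Illustrative sketch of "additional bar" resizing (letterbox padding);
# names and defaults are assumptions, not from the patent.
import numpy as np

def letterbox(image: np.ndarray, target_hw: tuple, pad_value: int = 114) -> np.ndarray:
    """Scale `image` to fit inside `target_hw` and pad the borders with a
    uniform color so the output matches the required input picture size."""
    th, tw = target_hw
    h, w = image.shape[:2]
    scale = min(th / h, tw / w)                  # shrink (or grow) to fit
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize to keep the sketch dependency-free
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[ys][:, xs]
    canvas = np.full((th, tw, image.shape[2]), pad_value, dtype=image.dtype)
    top, left = (th - nh) // 2, (tw - nw) // 2   # center the picture
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

sample = np.random.randint(0, 255, (900, 1600, 3), dtype=np.uint8)  # 16:9
padded = letterbox(sample, (600, 800))                              # 4:3
print(padded.shape)  # (600, 800, 3)
```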
In this way, when the recognition model is constructed, the sample pictures are all adjusted to the same size, and every sample picture can be brought to the picture size the traffic sign recognition model requires, so the size of the sample pictures no longer needs to be considered during construction. Because every processed sample picture matches the input picture size, the recognition model is constructed faster.
Step 1012, performing data enhancement processing on the resized sample picture to obtain an enhanced picture.
As an alternative embodiment, the data enhancement includes: random scaling, color gamut variation, flipping.
An enhanced picture is the picture obtained by applying data enhancement processing to the resized sample picture. Random scaling scales the resized sample picture; color gamut variation changes its brightness, saturation and hue; flipping mirrors it left to right.
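A minimal sketch of the three enhancement operations just defined, assuming HWC uint8 RGB arrays; the parameter ranges and the crude saturation adjustment are illustrative assumptions, not values from the patent.

```python
# Sketch of random scaling, color gamut variation and flipping; the
# ranges below are assumptions, not values from the patent.
import numpy as np

rng = np.random.default_rng(0)

def random_scale(img: np.ndarray, lo: float = 0.5, hi: float = 1.5) -> np.ndarray:
    s = rng.uniform(lo, hi)
    h, w = img.shape[:2]
    ys = (np.arange(int(h * s)) / s).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * s)) / s).astype(int).clip(0, w - 1)
    return img[ys][:, xs]                        # nearest-neighbour rescale

def color_jitter(img: np.ndarray) -> np.ndarray:
    # crude brightness/saturation jitter standing in for a full HSV shift
    out = img.astype(np.float32) * rng.uniform(0.7, 1.3)   # brightness
    mean = out.mean(axis=2, keepdims=True)
    out = mean + (out - mean) * rng.uniform(0.7, 1.3)      # saturation
    return out.clip(0, 255).astype(np.uint8)

def random_flip(img: np.ndarray) -> np.ndarray:
    return img[:, ::-1] if rng.random() < 0.5 else img     # left-right mirror

img = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
enhanced = random_flip(color_jitter(random_scale(img)))
```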
As an optional implementation, the data enhancement includes random amplification, and performing data enhancement processing on the resized sample picture to obtain an enhanced picture includes:
adding additional bars around the resized sample picture to obtain an enhanced picture with additional bars.
It should be noted that random amplification is an operation that randomly magnifies the resized sample picture. The additional bars are, as before, a border of uniform color added in a ring around the picture content. As an alternative embodiment, the additional bars may be black or gray; the application is not particularly limited in this respect. In a specific implementation, when the data enhancement operation is random amplification, additional bars may be added around the resized sample picture to obtain an enhanced picture with additional bars. Because the color inside the additional bars is uniform, when the initial recognition model processes such an enhanced picture and finds that a pixel has the same color as the preset additional bars, the position of that pixel is taken not to contain the object to be recognized; the initial recognition model therefore only examines the image outside the additional bars.
Step 1013, selecting a random number of enhanced pictures and stitching them to obtain a stitched picture.
It should be noted that stitching combines at least two enhanced pictures into one stitched picture. As an alternative embodiment, four enhanced pictures may be randomly selected and stitched, and the stitching may use Mosaic data enhancement. Specifically, Mosaic data enhancement randomly selects four enhanced pictures and splices them in a random layout to obtain a stitched picture. In a specific implementation, the four enhanced pictures are read at random and placed together in a random arrangement, for example with the first in the upper left corner, the second in the upper right, the third in the lower left and the fourth in the lower right. After the four enhanced pictures are placed, a fixed rectangular region is cut from each, and the crops are spliced into a new picture that serves as the stitched picture; a sketch follows the note below.
This stitching greatly enriches the model training set; random scaling in particular adds many small targets, which makes the prediction model more robust. Stitching several pictures before prediction and passing the stitched picture to the initial recognition model for learning is equivalent to feeding four enhanced samples to the neural network at once; it enriches the background of the detected objects, and the data of several sample pictures can be computed at once, so the GPU is used to better effect.
Here, it should be noted that the above selection of the splicing manner of the enhanced pictures and the selection of the splicing number of the enhanced pictures are merely examples, and in practice, the splicing manner of the enhanced pictures and the splicing number of the enhanced pictures are not limited to the above examples.
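The four-picture Mosaic stitching might look like the sketch below; the random placement described above is simplified to a fixed top-left/top-right/bottom-left/bottom-right layout, and the output size and crop rule are assumptions.

```python
# Sketch of Mosaic-style stitching of four enhanced pictures into one;
# output size, layout and crop rule are assumptions.
import numpy as np

def mosaic(pics: list, out_hw: tuple = (512, 512)) -> np.ndarray:
    assert len(pics) == 4
    oh, ow = out_hw
    ch, cw = oh // 2, ow // 2                     # each cell is a quadrant
    canvas = np.zeros((oh, ow, 3), dtype=np.uint8)
    slots = [(0, 0), (0, cw), (ch, 0), (ch, cw)]  # TL, TR, BL, BR
    for pic, (y, x) in zip(pics, slots):
        crop = pic[:ch, :cw]                      # cut a fixed region
        canvas[y:y + crop.shape[0], x:x + crop.shape[1]] = crop
    return canvas

pics = [np.random.randint(0, 255, (300, 300, 3), dtype=np.uint8) for _ in range(4)]
stitched = mosaic(pics)
print(stitched.shape)  # (512, 512, 3)
```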
Step 1014, adjusting the stitched picture to the input picture size, and acquiring the position coordinates of each object to be recognized in the resized stitched picture.
For step 1014, after the stitched picture is obtained, it is adjusted to the input picture size; the resizing method is the same as the one used in step 1011 to adjust the acquired sample picture to the input picture size, and is not repeated here. After resizing, the position coordinates of each object to be recognized in the resized stitched picture must also be acquired, because random scaling and stitching change the object's position coordinates. For example, suppose a sample picture is 500 pixels by 500 pixels and the position coordinates of the object in it are (100, 50); during stitching the picture is shrunk to 50% of its original size, giving a picture of 250 pixels by 250 pixels in which the position coordinates of the object become (50, 25).
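The coordinate bookkeeping in the example above (a 50% shrink maps (100, 50) to (50, 25)) generalizes to a uniform scale plus the paste offset used during stitching. The helper below is hypothetical, introduced only for illustration:

```python
# Hypothetical helper: map (x, y) points through a uniform scale plus a
# paste offset, as happens when shrunk pictures are placed in a mosaic.
def transform_coords(coords, scale, dx=0, dy=0):
    return [(x * scale + dx, y * scale + dy) for x, y in coords]

print(transform_coords([(100, 50)], scale=0.5))           # [(50.0, 25.0)]
# after pasting the shrunk picture at offset (256, 0) in a stitched canvas:
print(transform_coords([(100, 50)], scale=0.5, dx=256))   # [(306.0, 25.0)]
```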
Step 1015, expanding the set of resized sample pictures with the resized stitched picture.
For step 1015, after the resized stitched picture is obtained, it is also used as a resized sample picture, so that the training data for constructing the initial recognition model is richer and the constructed recognition model is more accurate.
S102, for each sample picture that contains an object to be recognized, acquiring first position coordinates of the object graphic in the sample picture.
It should be noted that the object to be recognized is the object that exists in the sample picture and is to be picked out of it. The first position coordinates characterize the contour position coordinates of the object graphic in the sample picture. Continuing the earlier example, when the sample picture contains a traffic sign, the graphic to be recognized is the traffic sign in the sample picture, and the first position coordinates are those of the traffic sign in the sample picture.
For step S102, for each sample picture that contains an object to be recognized, the first position coordinates of the object graphic in that picture are acquired. Specifically, after a sample picture containing the object is annotated, the contour of the object is obtained, and the contour pixels of the object are marked in the sample picture according to the pixels on the contour. Once the contour pixels are obtained, a coordinate system can be established with the lower-left corner of the sample picture as the origin, and the first position coordinates of the object graphic are determined in that coordinate system.
Here, it should be noted that the above manner of acquiring the first position coordinates of the object graphic in the sample picture is merely an example; in practice, the manner of acquiring them is not limited to the above example.
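As one possible reading of the contour-marking step, the sketch below extracts boundary pixels from a hypothetical binary mask and reports them in the lower-left-origin coordinate system described above; the mask representation and the 3x3 boundary test are assumptions.

```python
# Sketch: read off contour coordinates from a binary mask, with the origin
# at the lower-left corner as described above. Mask and test are assumptions.
import numpy as np

def contour_coords(mask: np.ndarray) -> list:
    """Return (x, y) coordinates of object pixels that touch at least one
    background pixel (a crude 3x3-neighbourhood contour test)."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    coords = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and padded[r:r + 3, c:c + 3].min() == 0:
                coords.append((c, h - 1 - r))   # flip row index: y grows upward
    return coords

mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:5, 1:4] = 1                              # a 3x3 "object"
print(contour_coords(mask))                     # its 8 boundary pixels
```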
S103, for each sample picture, inputting the sample picture into the initial recognition model to obtain second position coordinates of a predicted graphic of the object to be recognized.
It should be noted that the initial recognition model is used to recognize the object to be recognized in the sample picture, and the predicted graphic is the graphic the initial recognition model outputs for the sample picture. Because a sample picture may or may not contain the object, the second position coordinates of the predicted graphic may not exist.
For step S103, in implementation, each sample picture is input into the initial recognition model, and the neural network inside the model determines the second position coordinates of the predicted graphic in the sample picture.
Specifically, after the initial recognition model determines the predicted graphic in the sample picture, the predicted graphic must also be marked to obtain its second position coordinates in the sample picture. After the predicted graphic is identified, its contour can be obtained; its contour pixels in the sample picture are marked according to the pixels on the contour, a coordinate system is established with the lower-left corner of the sample picture as the origin, and the second position coordinates of the predicted graphic in the sample picture are determined in that coordinate system.
S104, training the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic, to obtain the recognition model.
For step S104, after the second position coordinates of the predicted graphic and the first position coordinates of the object graphic are determined, these two parameters are used to train the initial recognition model and obtain the recognition model.
The first position coordinates are the contour position coordinates of the object to be recognized in the resized sample picture.
Referring to fig. 2, fig. 2 is a flowchart of a method for training the initial recognition model according to an embodiment of the present application. As shown in fig. 2, training the initial recognition model based on the second position coordinates of the predicted graphic and the first position coordinates of the object graphic includes:
S201, if the sample picture corresponding to the predicted graphic is a picture without the object to be recognized, adjusting the training parameters of the initial recognition model until the predicted graphic output by the initial recognition model is empty.
For step S201, the sample pictures include pictures with the object to be recognized and pictures without it. When the initial recognition model outputs a predicted graphic for a picture that does not contain the object, the model's recognition is considered wrong and its training parameters must be modified; the training parameters may be the learning rate, the network parameters and the like. The training parameters are adjusted iteratively: at each iteration the predicted graphic is output again, and while it is not empty the training parameters continue to be adjusted and a new predicted graphic is obtained, until the predicted graphic output by the initial recognition model is empty and the model recognizes such pictures correctly.
S202, if the sample picture corresponding to the predicted graphic is a picture with the object to be recognized, acquiring from the predicted graphic the first pixel points marked as the object to be recognized.
For step S202, when the initial recognition model processes a picture that contains the object, it outputs a predicted graphic. The predicted graphic contains both a region covering the object to be recognized and a region not covering it, so the first pixel points marked as the object must be acquired. Here a pixel is one of the small squares into which an image is divided. In the embodiment provided by the application, the obtained predicted graphic is divided into such small squares, and the pixels marked as the object to be recognized are taken as the first pixel points.
S203, acquiring the first pixel count marked as the object to be recognized in the sample picture, and acquiring the second pixel count marked as the object to be recognized from the predicted graphic.
The pixel count is the total number of pixels marking the object to be recognized. For step S203, in implementation, the total number of pixels marked as the object in the sample picture is taken as the first pixel count, and the total number of first pixel points marked as the object in the predicted graphic output by the initial recognition model is taken as the second pixel count.
S204, calculating a loss value based on the second position coordinates of the first pixel points, the first position coordinates corresponding to the first pixel points, and the first and second pixel counts.
The loss value is given by a loss function, which maps the values of a random event or of random variables related to it to non-negative real numbers representing the "risk" or "loss" of the event. In applications, the loss function is usually tied to an optimization problem as the learning criterion, i.e. the model is solved and evaluated by minimizing the loss function.
For step S204, the loss value of the initial recognition model has two parts: one part is calculated from the error between the first position coordinates and the second position coordinates of the first pixel points, and the other part, which judges the accuracy of the model's recognition, is calculated from the first and second pixel counts.
When the loss is calculated from the error between the first and second position coordinates of the first pixel points, whether the prediction of the initial recognition model is accurate is judged by comparing the two; when they differ, the prediction is considered inaccurate. For example, if the determined first position coordinates of a first pixel point are (250, 250) and its second position coordinates are (100, 50), there is an error between them, i.e. the prediction of the initial recognition model is inaccurate, and the loss value of the model in its current state must be calculated. How the loss value is calculated is described in detail in the prior art and is not repeated here.
When the loss value is calculated from the pixel counts, whether the prediction of the initial recognition model is accurate is judged by comparing the first pixel count with the second; when they differ, the prediction is considered inaccurate and the loss value of the model in its current state must be calculated. Again, how the loss value is calculated is described in detail in the prior art and is not repeated here.
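The patent does not give the exact loss formula, so the sketch below only illustrates the two-part structure described in S204: a coordinate-error term over the first pixel points plus a pixel-count term. The squared error, the normalization and the weighting are assumptions.

```python
# Sketch of the two-part loss of S204; the exact form and weighting are
# assumptions, not taken from the patent.
import numpy as np

def two_part_loss(pred_xy: np.ndarray, true_xy: np.ndarray,
                  n_true: int, n_pred: int, w_count: float = 1.0) -> float:
    # part 1: mean squared error between second and first position coordinates
    coord_loss = float(np.mean((pred_xy - true_xy) ** 2))
    # part 2: penalty for predicting too few / too many object pixels
    count_loss = abs(n_pred - n_true) / max(n_true, 1)
    return coord_loss + w_count * count_loss

pred = np.array([[100.0, 50.0]])    # second position coordinates
true = np.array([[250.0, 250.0]])   # first position coordinates
print(two_part_loss(pred, true, n_true=400, n_pred=380))
```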
When the sample picture is a stitched picture, i.e. it may comprise several sample pictures, there may also be several groups of first pixel points marked as the object to be recognized, each with its corresponding first position coordinates, second position coordinates, first pixel count and second pixel count. In that case the parameters corresponding to each object must be compared separately. For example, suppose the sample picture is stitched from two pictures, sample picture A containing object A and sample picture B containing object B. After the stitched picture is input, the initial recognition model correspondingly outputs two predicted graphics: predicted graphic A for the object in sample picture A and predicted graphic B for the object in sample picture B. The two predicted graphics are then compared separately, predicted graphic A against object A in sample picture A and predicted graphic B against object B in sample picture B, to judge whether the predictions of the initial recognition model are accurate.
S205, if the loss value is larger than a preset loss threshold, adjusting the training parameters of the initial recognition model until the loss value of the initial recognition model is no larger than the loss threshold.
In the embodiment provided by the application, the loss threshold is a standard set in advance. As an alternative implementation, the threshold may be set near the point where the loss curve flattens out, i.e. where the change in the loss value between two iterations becomes small; when the loss value approaches the loss threshold, the initial recognition model is considered to have reached convergence, and its predictions at that point are relatively accurate.
For step S205, after the loss value of the initial recognition model in its current state is calculated in step S204, the training parameters of the model are adjusted continuously; they may be the learning rate, the network parameters and the like. Specifically, the loss of the initial recognition model is minimized iteratively: at each iteration the loss value is calculated, and while it has not come down to the loss threshold the training parameters keep being updated and new loss values are computed from the new parameters, so that the loss value trends downward with fluctuations over the iterations. Finally, when the loss value levels off, that is, when the loss value of the trained model is no larger than the loss threshold and no longer drops noticeably from the previously calculated value, the initial recognition model is considered converged, training ends, and the recognition model is obtained.
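The iterate-until-threshold behaviour of S205 can be illustrated with a toy one-parameter model; the quadratic loss, plain gradient descent and all values below are stand-ins, not the patent's training procedure.

```python
# Toy illustration of S205: keep adjusting the training parameter and
# recomputing the loss until it no longer exceeds the loss threshold.
def train_until_converged(threshold: float = 1e-4, lr: float = 0.1):
    w = 0.0                        # the single training parameter of the toy model
    target = 3.0                   # stands in for the first position coordinate
    loss = float("inf")
    while loss > threshold:        # stop once loss <= loss threshold
        pred = w                   # toy "second position coordinate"
        loss = (pred - target) ** 2
        grad = 2 * (pred - target)
        w -= lr * grad             # adjust the training parameter
    return w, loss

print(train_until_converged())     # w converges near 3.0, loss below threshold
```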
In the method and device for constructing a recognition model for objects to be recognized provided by the application, when the initial recognition model is trained, the position coordinates of the object graphic in the sample picture are compared with the position coordinates of the predicted graphic, and the pixel marks are further compared. Because this training scheme trains the model on coordinate positions, the training is more precise and fine-grained, which in turn improves the prediction accuracy of the resulting recognition model. The higher the prediction accuracy of the recognition model, the more accurately the object to be recognized is identified.
After the recognition model is constructed, it is used to recognize the object to be recognized in a picture; specifically, the method further comprises:
A: acquiring a picture to be recognized, and adjusting it to the input picture size required by the recognition model.
It should be noted that the picture to be recognized is a picture, possibly containing the object to be recognized, that is to be processed. As an alternative implementation, it may be a picture taken by a camera or uploaded by a user; the application is not particularly limited in this respect.
For the above step, in implementation, after the picture to be recognized is acquired, it is adjusted to the input picture size required by the recognition model. The resizing method is the same as the one used in step 1011 to adjust the acquired sample picture to the input picture size and is not repeated here.
B: inputting the resized picture into the recognition model to obtain the object graphic.
For the above step, in a specific implementation, the resized picture to be recognized is input into the recognition model to obtain the object graphic, i.e. a picture of the object to be recognized. Specifically, the pixels of the object in the resized picture are determined first; they are marked, and the position coordinates of the object in the resized picture are acquired. The determined position coordinates are then drawn in the resized picture, that is, the positions they describe are connected with lines to obtain a position frame. The picture content inside the position frame is the object to be recognized, so that content is taken as the object graphic of the picture to be recognized.
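A sketch of step B under the assumption that the model outputs corner coordinates of the position frame; `coords` here is fabricated model output, and cropping by the bounding box is one plausible reading of "connecting the positions with lines".

```python
# Sketch: connect the predicted coordinates into a position frame and
# return its contents as the object graphic. Coordinates are fabricated.
import numpy as np

def crop_object(picture: np.ndarray, coords: list) -> np.ndarray:
    xs = [int(x) for x, _ in coords]
    ys = [int(y) for _, y in coords]
    return picture[min(ys):max(ys) + 1, min(xs):max(xs) + 1]

picture = np.random.randint(0, 255, (416, 416, 3), dtype=np.uint8)
coords = [(120, 80), (200, 80), (200, 150), (120, 150)]  # assumed model output
object_graphic = crop_object(picture, coords)
print(object_graphic.shape)  # (71, 81, 3)
```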
As an alternative embodiment, the object to be identified is a traffic sign, and the method further comprises:
Querying a preset mapping library of traffic sign template diagrams and traffic sign types to identify the traffic sign type of the to-be-identified object graph.
It should be noted that a traffic sign template diagram is a pre-stored template picture used to distinguish traffic sign types. The mapping library is a database that stores mapping relationships between objects, that is, a database representing information in the form of objects; the mapping here generally refers to object-relational mapping, which object-oriented programming languages use to convert between data of otherwise incompatible type systems. In the embodiment provided by the present application, the preset traffic sign template diagrams and traffic sign types may be stored in the mapping library, with each traffic sign template diagram corresponding to one traffic sign type.
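A sketch of such a library and the lookup, assuming simple template matching as the comparison; the file names, sign types, and the normalized cross-correlation score are all illustrative assumptions:

```python
import cv2

# Hypothetical template-to-type mapping library: one type per template diagram.
TEMPLATE_LIBRARY = {
    "stop_template.png": "prohibition sign",
    "speed_limit_template.png": "prohibition sign",
    "crosswalk_template.png": "indication sign",
}

def identify_sign_type(object_graph):
    """Return the traffic sign type whose template best matches the object graph."""
    gray = cv2.cvtColor(object_graph, cv2.COLOR_BGR2GRAY)
    best_type, best_score = None, -1.0
    for template_path, sign_type in TEMPLATE_LIBRARY.items():
        template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
        resized = cv2.resize(gray, (template.shape[1], template.shape[0]))
        score = cv2.matchTemplate(resized, template, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_type, best_score = sign_type, score
    return best_type
```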
Traffic sign types can be distinguished in various ways: primary signs and auxiliary signs; movable signs and fixed signs; illuminated signs, luminous signs, and retroreflective signs; and variable-message signs that reflect changes in the driving environment. The primary signs can comprise the following four major categories: road traffic warning signs, which warn drivers and pedestrians of danger and prompt timely measures; road traffic indication signs, which direct drivers and pedestrians to travel in the specified direction and at the specified places; road traffic guide signs, which indicate the direction of the road; and road traffic prohibition signs, which impose restrictions on part of the traffic behavior of vehicles and pedestrians.
Here, it should be noted that the above description of the traffic sign types in the mapping library is merely an example; in practice, the traffic sign types in the mapping library are not limited to this example.
As an optional implementation, after the traffic sign graph in the picture to be identified is obtained, the traffic sign type of that graph can be identified by querying the preset mapping library of traffic sign template diagrams and traffic sign types.
In the embodiment provided by the present application, the picture to be identified can be input into the to-be-identified object recognition model to quickly obtain the traffic sign graph in the picture, and the traffic sign type of that graph is then identified by querying the preset mapping library of traffic sign template diagrams and traffic sign types, so that road information is provided to the vehicle in time and an unmanned vehicle can be helped to choose the correct road to travel.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for constructing an object recognition model to be recognized according to an embodiment of the present application. As shown in fig. 3, the apparatus 300 for constructing an object recognition model to be recognized includes:
A sample picture obtaining module 301, configured to obtain a sample picture;
The first position coordinate acquiring module 302 is configured to acquire, for each sample picture with an object to be identified in the sample pictures, a first position coordinate of the object graph to be identified in the sample picture;
The second position coordinate obtaining module 303 is configured to input each sample picture into the to-be-identified object recognition initial model to obtain second position coordinates of the to-be-identified object prediction graph;
The to-be-identified object identifying model determining module 304 is configured to train the to-be-identified object identifying initial model based on the second position coordinate of the to-be-identified object prediction graph and the first position coordinate of the to-be-identified object graph, to obtain the to-be-identified object identifying model.
Further, the first position coordinate is a contour position coordinate of the object to be identified in the sample picture, and training the initial model to be identified based on the second position coordinate of the predicted graph of the object to be identified and the first position coordinate of the graph of the object to be identified includes:
If the sample picture corresponding to the object to be identified prediction graph is a picture without the object to be identified, adjusting training parameters of the initial model to be identified until the object to be identified prediction graph output by the initial model to be identified is empty;
If the sample picture corresponding to the object prediction graph to be identified is a picture with the object to be identified, acquiring a first pixel point of the object prediction graph to be identified from the object prediction graph to be identified;
acquiring a first pixel count, the number of pixels marked as the object to be identified in the sample picture, and a second pixel count, the number of pixels marked as the object to be identified in the to-be-identified object prediction graph;
Calculating a loss value based on the second position coordinates of the first pixel points, the first position coordinates corresponding to the first pixel points, and the first and second pixel counts;
And if the loss value is larger than a preset loss threshold, adjusting the training parameters of the to-be-identified object recognition initial model until its loss value is not larger than the loss threshold, as sketched below.
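A hedged sketch of such a loss, with a coordinate term comparing the predicted second position coordinates against the labeled first position coordinates and a pixel-count term comparing the two counts; the specific distance measures and the weights alpha and beta are assumptions, as the patent does not fix a formula:

```python
import numpy as np

def loss_value(pred_coords, label_coords, first_pixel_count, second_pixel_count,
               alpha=1.0, beta=1.0):
    """pred_coords and label_coords: (N, 2) arrays of matched pixel coordinates."""
    coord_term = np.mean(np.linalg.norm(pred_coords - label_coords, axis=1))
    count_term = (abs(first_pixel_count - second_pixel_count)
                  / max(first_pixel_count, 1))
    return alpha * coord_term + beta * count_term
```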
Further, the apparatus 300 for constructing the object recognition model to be recognized is further configured to:
adjusting the acquired sample picture to the input picture size required by the object identification model to be identified;
performing data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture;
selecting a random number of the enhanced pictures for splicing to obtain a spliced picture;
adjusting the spliced picture to the size of the input picture, and acquiring the position coordinates of each object to be identified in the spliced picture with the adjusted size;
and expanding the size-adjusted sample pictures according to the size-adjusted spliced picture, as sketched below.
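A sketch of this expansion under the assumption of a 2×2 mosaic-style splice (the patent only specifies a random number of enhanced pictures; four pictures, and the 640×640 input size, are illustrative choices):

```python
import random
import cv2
import numpy as np

def expand_samples(enhanced_pictures, input_size=(640, 640)):
    """Splice four enhanced pictures into one and resize it to the input size."""
    picks = random.sample(enhanced_pictures, 4)
    half = (input_size[0] // 2, input_size[1] // 2)
    tiles = [cv2.resize(p, half) for p in picks]
    top = np.hstack([tiles[0], tiles[1]])
    bottom = np.hstack([tiles[2], tiles[3]])
    spliced = np.vstack([top, bottom])
    # The labeled position coordinates of each object to be identified must be
    # rescaled by the same factors applied to its source tile (omitted here).
    return cv2.resize(spliced, input_size)
```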
Further, the data enhancement includes: random scaling, color gamut variation, flipping.
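Illustrative versions of the three listed enhancements, assuming OpenCV; all parameter ranges are assumptions:

```python
import random
import cv2

def random_scale(image, lo=0.8, hi=1.2):
    s = random.uniform(lo, hi)
    return cv2.resize(image, None, fx=s, fy=s)

def gamut_shift(image, max_shift=15):
    # Shift the hue channel to vary the color gamut.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hsv[..., 0] = (hsv[..., 0].astype(int)
                   + random.randint(-max_shift, max_shift)) % 180
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def horizontal_flip(image):
    return cv2.flip(image, 1)
```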
Further, the data enhancement includes random amplification, and the data enhancement processing is performed on the sample picture with the adjusted size to obtain an enhanced picture, including:
And adding additional bars around the size-adjusted sample picture to obtain an enhanced picture containing the additional bars.
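A sketch of this step, assuming the additional bars are constant-color padding of random width on each side (the bar color and width range are assumptions):

```python
import random
import cv2

def add_additional_bars(image, max_bar=40, color=(114, 114, 114)):
    top, bottom = random.randint(0, max_bar), random.randint(0, max_bar)
    left, right = random.randint(0, max_bar), random.randint(0, max_bar)
    return cv2.copyMakeBorder(image, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)
```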
Further, the apparatus 300 for constructing the object recognition model to be recognized is further configured to:
Acquiring a picture to be identified, and adjusting the acquired picture to be identified to the size of an input picture required by the object identification model to be identified;
and inputting the size-adjusted picture to be identified into the object identification model to be identified, so as to obtain the object picture to be identified.
Further, the object to be identified is a traffic sign, and the apparatus 300 for constructing an object identification model to be identified is further configured to:
And inquiring a preset mapping relation library of each traffic sign template diagram and traffic sign types, and identifying the traffic sign type of the object diagram to be identified.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410. When the electronic device 400 runs, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the method for constructing a to-be-identified object recognition model in the method embodiments shown in fig. 1 and fig. 2 can be executed, thereby addressing the low recognition accuracy of models trained in the prior art; for specific implementations, refer to the method embodiments, which are not repeated here.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program can perform the steps of the method for constructing a to-be-identified object recognition model in the method embodiments shown in fig. 1 and fig. 2, thereby addressing the low recognition accuracy of models trained in the prior art; for specific implementations, refer to the method embodiments, which are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a non-volatile, processor-executable computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that like reference numerals and letters denote like items in the figures, so once an item is defined in one figure it need not be further defined or explained in subsequent figures. Furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions of some technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included in its protection scope. Therefore, the protection scope of the present application is subject to the protection scope of the claims.
Claims (10)
1. A method of constructing an object recognition model to be recognized, the method comprising:
acquiring a sample picture;
Aiming at each sample picture with an object to be identified in the sample pictures, acquiring a first position coordinate of the object graph to be identified in the sample picture;
for each sample picture, inputting the sample picture into a to-be-identified object recognition initial model to obtain second position coordinates of a to-be-identified object prediction graph;
Training the initial model of the object to be identified based on the second position coordinates of the predicted graph of the object to be identified and the first position coordinates of the graph of the object to be identified to obtain an object identification model of the object to be identified;
The first position coordinates are contour position coordinates of the object to be identified in the sample picture, and the training of the initial model for object identification to be identified based on the second position coordinates of the predicted graph of the object to be identified and the first position coordinates of the graph of the object to be identified comprises the following steps:
If the sample picture corresponding to the object prediction graph to be identified is a picture with the object to be identified, acquiring a first pixel point of the object prediction graph to be identified from the object prediction graph to be identified;
acquiring a first pixel count, the number of pixels marked as the object to be identified in the sample picture, and a second pixel count, the number of pixels marked as the object to be identified in the to-be-identified object prediction graph;
Calculating a loss value based on the second position coordinates of the first pixel points, the first position coordinates corresponding to the first pixel points, and the first and second pixel counts; the loss value is obtained by first comparing the first and second position coordinates of the first pixel points and then comparing the first and second pixel counts;
And if the loss value is larger than a preset loss threshold value, adjusting the training parameters of the initial model of the object to be identified until the loss value of the initial model of the object to be identified is not larger than the loss threshold value.
2. The method according to claim 1, wherein training the initial model for object recognition based on the second position coordinates of the predicted pattern of the object to be recognized and the first position coordinates of the pattern of the object to be recognized comprises:
And if the sample picture corresponding to the object to be identified prediction graph is a picture without the object to be identified, adjusting the training parameters of the initial model to be identified until the object to be identified prediction graph output by the initial model to be identified is empty.
3. The method according to claim 1, wherein the method further comprises:
adjusting the acquired sample picture to the input picture size required by the object identification model to be identified;
performing data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture;
selecting a random number of the enhanced pictures for splicing to obtain a spliced picture;
adjusting the spliced picture to the size of the input picture, and acquiring the position coordinates of each object to be identified in the spliced picture with the adjusted size;
and expanding the sample picture with the adjusted size according to the spliced picture with the adjusted size.
4. A method according to claim 3, wherein the data enhancement comprises: random scaling, color gamut variation, flipping.
5. The method of claim 4, wherein the data enhancement comprises random amplification, and wherein performing data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture comprises:
And adding an additional bar around the sample picture with the adjusted size to obtain an enhanced picture with the additional bar.
6. The method according to any one of claims 1 to 5, further comprising:
Acquiring a picture to be identified, and adjusting the acquired picture to be identified to the size of an input picture required by the object identification model to be identified;
and inputting the size-adjusted picture to be identified into the object identification model to be identified, so as to obtain the object picture to be identified.
7. The method of claim 6, wherein the object to be identified is a traffic sign, the method further comprising:
And inquiring a preset mapping relation library of each traffic sign template diagram and traffic sign types, and identifying the traffic sign type of the object diagram to be identified.
8. An apparatus for constructing an object recognition model to be recognized, the apparatus comprising:
The sample picture acquisition module is used for acquiring a sample picture;
The first position coordinate acquisition module is used for acquiring first position coordinates of the object graph to be identified in each sample picture with the object to be identified in the sample pictures;
the second position coordinate acquisition module, which inputs each sample picture into the to-be-identified object recognition initial model to obtain second position coordinates of the to-be-identified object prediction graph;
The object to be identified identification model determining module is used for training the initial model to be identified to obtain an object to be identified identification model based on the second position coordinates of the predicted graph of the object to be identified and the first position coordinates of the graph of the object to be identified;
the first position coordinates are contour position coordinates of the object to be identified in the sample picture, and the object to be identified identification model determining module is further configured to, when training the initial model to be identified based on the second position coordinates of the predicted graph of the object to be identified and the first position coordinates of the graph of the object to be identified:
If the sample picture corresponding to the object prediction graph to be identified is a picture with the object to be identified, acquiring a first pixel point of the object prediction graph to be identified from the object prediction graph to be identified;
acquiring a first pixel count, the number of pixels marked as the object to be identified in the sample picture, and a second pixel count, the number of pixels marked as the object to be identified in the to-be-identified object prediction graph;
Calculating a loss value based on the second position coordinates of the first pixel points, the first position coordinates corresponding to the first pixel points, and the first and second pixel counts; the loss value is obtained by first comparing the first and second position coordinates of the first pixel points and then comparing the first and second pixel counts;
And if the loss value is larger than a preset loss threshold value, adjusting the training parameters of the initial model of the object to be identified until the loss value of the initial model of the object to be identified is not larger than the loss threshold value.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating via said bus when the electronic device is running, said machine readable instructions being executable by said processor to perform the steps of the method of constructing an object recognition model to be recognized as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, performs the steps of the method of constructing an object recognition model to be recognized as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111171015.0A CN113807315B (en) | 2021-10-08 | 2021-10-08 | Method, device, equipment and medium for constructing object recognition model to be recognized |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113807315A CN113807315A (en) | 2021-12-17 |
CN113807315B true CN113807315B (en) | 2024-06-04 |
Family
ID=78897340
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111171015.0A Active CN113807315B (en) | 2021-10-08 | 2021-10-08 | Method, device, equipment and medium for constructing object recognition model to be recognized |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807315B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115564656B (en) * | 2022-11-11 | 2023-04-28 | 成都智元汇信息技术股份有限公司 | Multi-graph merging and graph identifying method and device based on scheduling |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002229727A (en) * | 2001-02-02 | 2002-08-16 | Canon Inc | Coordinate input device |
CN102156980A (en) * | 2011-01-14 | 2011-08-17 | 耿则勋 | Method for evaluating influence of data compression on positioning accuracy of remote sensing image |
CN106340062A (en) * | 2015-07-09 | 2017-01-18 | 长沙维纳斯克信息技术有限公司 | Three-dimensional texture model file generating method and device |
CN110472602A (en) * | 2019-08-20 | 2019-11-19 | 腾讯科技(深圳)有限公司 | A kind of recognition methods of card card, device, terminal and storage medium |
CN111476159A (en) * | 2020-04-07 | 2020-07-31 | 哈尔滨工业大学 | Method and device for training and detecting detection model based on double-angle regression |
CN111523465A (en) * | 2020-04-23 | 2020-08-11 | 中船重工鹏力(南京)大气海洋信息系统有限公司 | Ship identity recognition system based on camera calibration and deep learning algorithm |
CN112508109A (en) * | 2020-12-10 | 2021-03-16 | 锐捷网络股份有限公司 | Training method and device for image recognition model |
CN112560834A (en) * | 2019-09-26 | 2021-03-26 | 武汉金山办公软件有限公司 | Coordinate prediction model generation method and device and graph recognition method and device |
CN113021355A (en) * | 2021-03-31 | 2021-06-25 | 重庆正格技术创新服务有限公司 | Agricultural robot operation method for predicting sheltered crop picking point |
CN113096017A (en) * | 2021-04-14 | 2021-07-09 | 南京林业大学 | Image super-resolution reconstruction method based on depth coordinate attention network model |
CN113436251A (en) * | 2021-06-24 | 2021-09-24 | 东北大学 | Pose estimation system and method based on improved YOLO6D algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN113807315A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9519660B2 (en) | Information processing apparatus, clustering method, and recording medium storing clustering program | |
US12272162B2 (en) | Methods and systems of utilizing image processing systems to measure objects | |
CN113822116B (en) | Text recognition method, device, computer equipment and storage medium | |
CN110807485B (en) | Method for fusing two-classification semantic segmentation maps into multi-classification semantic map based on high-resolution remote sensing image | |
CN109919149B (en) | Object labeling method and related equipment based on object detection model | |
US12322189B2 (en) | Method and generator for generating disturbed input data for a neural network | |
CN111931683B (en) | Image recognition method, device and computer readable storage medium | |
CN111899515A (en) | Vehicle detection system based on wisdom road edge calculates gateway | |
CN110874170A (en) | Image area correction method, image segmentation method and device | |
CN114387600B (en) | Text feature recognition method, device, computer equipment and storage medium | |
CN114898321A (en) | Method, device, equipment, medium and system for detecting road travelable area | |
CN117521733A (en) | Training neural networks by means of knowledge-graph | |
CN113807315B (en) | Method, device, equipment and medium for constructing object recognition model to be recognized | |
CN116597317A (en) | Remote sensing image change detection data generation method, device, equipment and medium | |
CN118261779B (en) | Color blindness friendly map conversion method, system and terminal based on generative adversarial network | |
CN113808004A (en) | Image conversion device, image conversion method, and computer program for image conversion | |
CN114118127B (en) | Visual scene sign detection and recognition method and device | |
CN117237892A (en) | Point cloud map lane line extraction method, point cloud map lane line extraction device, computer equipment and storage medium | |
CN102682308B (en) | Imaging processing method and device | |
CN116453092A (en) | Signal lamp association method, device, equipment and storage medium | |
CN112036268B (en) | Component identification method and related device | |
CN110738522B (en) | User portrait construction method and device, computer equipment and storage medium | |
CN113095147A (en) | Skin area detection method, system, image processing terminal and storage medium | |
CN114022501A (en) | Arrow corner automatic detection method and system, electronic equipment and storage medium | |
JP2021182243A (en) | Image determination device, method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |