CN117671244A - Image recognition method and device combined with region division
- Publication number
- CN117671244A (application number CN202311677489.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- identified
- identification
- recognition
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image recognition method and device combined with region division. The method modifies a trained image recognition model so that it can accept input images of any size. The image to be identified is first scaled to the size required by the original image recognition model and recognized. When the recognition result indicates that no target is present in the image, the image to be identified is first adaptively scaled and then padded or cropped at the boundary, implicitly dividing it into a plurality of regions, and the processed image is input into the modified image recognition model for secondary recognition to obtain the recognition confidences of the plurality of regions. Bilinear interpolation and binarization are applied to the recognition confidences of the plurality of regions to obtain the recognition result and the region information of the target, from which the final image recognition result is obtained. The invention can recognize smaller targets in an image and targets that deviate from the image center, can give the extent of the target, and improves the image recognition rate.
Description
Technical Field
The invention discloses an image recognition method and device combined with region division, belonging to the technical field of image processing.
Background
In existing image recognition methods, the image to be recognized must be scaled so that its size meets the size required by the image recognition model, and the processed image is then recognized. This approach has certain limitations.
Limitation one: when the target occupies a relatively small area in the image, scaling the image reduces the image resolution and the number of pixels occupied by the target, so the image recognition model cannot correctly recognize the target in the image.
Limitation two: when the target deviates from the center of the image, existing image recognition models have a low recognition rate for the image.
Disclosure of Invention
The invention aims to provide an image recognition method and device combined with region division which, building on existing image recognition, implicitly divide an image into a plurality of regions and recognize the divided regions with a modified image recognition model, thereby improving the recognition performance of the image recognition model when the target region is small or the target deviates from the image center.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention provides an image recognition method combining region division, which comprises the following steps:
constructing an image recognition model and correcting;
based on the corrected image recognition model, performing preliminary recognition on the image to be recognized;
dividing the image to be identified in which no target object was identified, according to the input image size of the image recognition model;
inputting the divided image to be identified into the corrected image recognition model, and constructing a confidence map based on the recognition result;
and identifying the target object based on the confidence map.
Further, the constructing and correcting the image recognition model includes:
constructing a model formed by a feature detector and a classifier, wherein the feature detector comprises a convolution layer and a pooling layer, and the last layer is a global average pooling layer; the classifier comprises a full connection layer;
training the model on a training data set collected for the target task to obtain the image recognition model;
replacing the global average pooling layer with an average pooling layer whose kernel size is consistent with the feature input size of the global average pooling layer;
replacing the fully connected layer with a convolution kernel of size 1×1×C_1×C_2, and assigning the trained weights and biases of the fully connected layer to the convolutional layer, where C_1 is the feature input size of the fully connected layer and C_2 is its feature output size.
Further, the performing preliminary recognition on the image to be recognized based on the corrected image recognition model includes:
scaling the image to be identified to L×L, inputting it into the corrected image recognition model, and acquiring the recognition confidences; when the recognition confidence of the target object is greater than that of the non-target object, outputting the target object recognition result; and screening out the images for which the recognition confidence of the target object is smaller than that of the non-target object.
Further, the dividing the image to be identified, in which the target object is not identified, according to the size of the input image of the image identification model includes:
performing adaptive scaling and cropping or padding on the screened image to be identified, so that each side length of the image to be identified is an integer multiple of the side length of the input image of the image recognition model, thereby dividing the image to be identified into a plurality of L×L regions.
Further, the adaptively scaling the screened image to be identified includes:
the short side of the image to be identified is denoted L_min:
L_min = min(W_0, H_0)
where W_0 and H_0 are the width and height of the image to be identified, respectively;
when 0 < L_min ≤ L, the scaling factor is r = L/L_min;
when L < L_min ≤ 2L, the scaling factor is r = 2L/L_min;
when L_min > 3L, the scaling factor is r = 3L/L_min;
the width and height of the image to be identified are multiplied by the scaling factor r for scaling.
Further, the cropping or padding process includes:
for the long side L_max of the image to be identified after adaptive scaling, calculate the remainder m of L_max divided by L; when the remainder m is greater than L/2, the two sides of the long side are padded with widths
p_1 = floor((L - m)/2), p_2 = ceil((L - m)/2)
where p_1 and p_2 are the padding widths on the two sides of the long side, floor is the round-down operation and ceil is the round-up operation;
denoting the padding widths of the image to be identified in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0;
when the remainder m is smaller than L/2, the two sides of the long side are cropped with widths
p_1 = floor(m/2), p_2 = ceil(m/2)
where p_1 and p_2 are the cropping widths on the two sides of the long side;
denoting the cropping widths of the image to be identified in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0.
further, the inputting the divided image to be identified into the corrected image identification model, and constructing a confidence map based on the identification result includes:
inputting the divided image to be identified into the corrected image recognition model,
and acquiring the recognition confidences of the target object and the non-target object for each region to form a recognition confidence map.
Further, identifying the target object based on the confidence map includes:
performing bilinear interpolation on the confidence map so that its size is consistent with that of the original image to be identified after cropping or padding;
performing binarization processing on the interpolated confidence map;
obtaining the circumscribed rectangular contour in the binarized confidence map by using a contour detection algorithm; if no circumscribed rectangular contour can be obtained, there is no target object and the recognition ends; if a circumscribed rectangular contour exists, it is denoted (x_0, y_0, w_0, h_0), where (x_0, y_0) is the upper-left corner coordinate of the circumscribed rectangular contour and w_0 and h_0 are its width and height, respectively;
mapping the circumscribed rectangular contour to the original image, where max() is the maximum operation, min() is the minimum operation, (x_1, y_1) is the upper-left corner coordinate of the circumscribed rectangular contour on the original image to be identified, and w_1 and h_1 are the width and height of the circumscribed rectangular contour on the original image to be identified.
Further, the image to be identified is an image captured on a street, and the target object is a vehicle.
A second aspect of the present invention provides an image recognition apparatus combined with region division, for implementing the above image recognition method combined with region division, the apparatus comprising:
the model construction module is used for constructing an image recognition model and correcting the image recognition model;
the preliminary identification module is used for carrying out preliminary identification on the image to be identified based on the corrected image identification model;
the preprocessing module is used for dividing the image to be identified in which no target object was identified, according to the input image size of the image recognition model;
the secondary recognition module is used for inputting the divided image to be identified into the corrected image recognition model and constructing a confidence map based on the recognition result;
and the third recognition module is used for identifying the target object based on the confidence map.
Compared with existing image recognition methods, the image recognition method combined with region division provided by the invention performs further image recognition on top of the existing image recognition method: when the existing image recognition method cannot correctly recognize the target in the image, the image is implicitly divided into a plurality of regions and the divided regions are recognized with the modified image recognition model, so that smaller targets in the image and targets that deviate from the image center can be recognized, and the extent of the target can be given. Image recognition performed with the method of the invention has a higher recognition rate and better generalization.
Drawings
FIG. 1 is a flow chart of an image recognition method combining region division provided by the invention;
FIG. 2 is an example of cropping and padding an image to be identified according to an embodiment of the present invention;
FIG. 3 is an image to be identified in one embodiment of the invention;
FIG. 4 is a graph of vehicle confidence after processing of FIG. 3;
FIG. 5 is a graph of vehicle confidence after binarizing the graph of FIG. 4;
fig. 6 is a final recognition result of fig. 3.
Detailed Description
The invention is further described below. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, an embodiment of an image recognition method combined with region division according to the present invention is provided, where the application scene is a street, the image to be recognized is an image acquired on the street, and a vehicle in the image to be recognized is recognized. The method comprises the following steps:
step 1, constructing an image recognition model and correcting,
the image recognition model is used for recognizing the image input into the model and judging whether the vehicle exists in the model.
The image recognition model generally consists of a feature detector and a classifier. The feature detector consists of convolutional layers, pooling layers and the like, the last layer being a global average pooling layer, and is used to extract features from the image to be identified; the classifier consists of a fully connected layer, classifies the image to be identified using the features extracted by the feature detector, and determines the image recognition result.
It should be noted that the image recognition model needs to be obtained by pre-training. Those skilled in the art may construct the feature detector and the classifier from existing neural networks and train the constructed model on a training data set collected for the target task to obtain the image recognition model; this is straightforward for those skilled in the art and is therefore not described further here.
For the trained image recognition model, the input image size it accepts is fixed by the inherent properties of the fully connected layer (a common input size is 224×224) and is denoted L×L; accordingly, the feature input size of the global average pooling layer is also fixed. The global average pooling layer is replaced by an average pooling layer whose kernel size is consistent with that feature input size. For the classifier, the feature input size of the fully connected layer is denoted C_1 and its feature output size C_2; a convolution kernel of size 1×1×C_1×C_2 replaces the fully connected layer and is assigned the trained weights and biases of the fully connected layer. After the correction of the image recognition model is completed, its input image size is no longer limited to L×L, which facilitates the further image recognition described below.
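As an illustration only, this correction might be sketched in PyTorch as follows; PyTorch itself, the attribute names (`model.features`, `model.classifier`), the feature-map size `feat_size`, and the use of the default pooling stride are assumptions of this sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

def correct_recognition_model(model, c1, c2, feat_size):
    """Turn a trained classifier into a fully convolutional network.

    model.features : assumed convolutional feature detector, *without* the
                     global average pooling layer.
    model.classifier : assumed nn.Linear(c1, c2) fully connected layer.
    feat_size : spatial size of the feature map that the original global
                average pooling layer received for an L x L input.
    """
    # Average pooling whose kernel equals the original feature input size.
    # The stride defaults to the kernel size, so each L x L block of the
    # input contributes one output cell (an assumption; the patent only
    # fixes the kernel size).
    avg_pool = nn.AvgPool2d(kernel_size=feat_size)

    # 1 x 1 x C1 x C2 convolution that reuses the trained FC weights/bias.
    fc = model.classifier
    conv = nn.Conv2d(c1, c2, kernel_size=1)
    with torch.no_grad():
        conv.weight.copy_(fc.weight.view(c2, c1, 1, 1))
        conv.bias.copy_(fc.bias)

    return nn.Sequential(model.features, avg_pool, conv)
```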
Step 2, performing preliminary recognition on the image to be identified based on the corrected image recognition model, specifically as follows:
The image to be identified, with width W_0 and height H_0, is first scaled to L×L and then input into the image recognition model corrected in step 1 for recognition, obtaining the recognition confidences of vehicle and non-vehicle, i.e. the probabilities that the image contains a vehicle or no vehicle. When the recognition confidence of vehicle is greater than that of non-vehicle, a vehicle is considered to be present in the image. When the recognition confidence of vehicle is smaller than that of non-vehicle, step 3 is further executed.
Step 3, performing adaptive scaling on the image to be identified for which, after recognition, the recognition confidence of vehicle is smaller than that of non-vehicle.
The short side of the image to be identified is denoted L_min, i.e. L_min = min(W_0, H_0). When 0 < L_min ≤ L, the scaling factor is r = L/L_min; when L < L_min ≤ 2L, the scaling factor is r = 2L/L_min; when L_min > 3L, the scaling factor is r = 3L/L_min. The image to be identified is scaled according to the scaling factor.
Step 4, after the adaptive scaling of the image to be identified is completed, the image is further cropped or padded so that its long side and short side are integer multiples of L, specifically as follows:
After step 3 is completed, the short side of the image is an integer multiple of L and its padding width is 0, i.e. no padding or cropping is needed on the short side.
Let p_1 and p_2 denote the extent to which the two sides of the long side of the image are padded or cropped: when p_1 and p_2 are greater than 0, the two sides of the long side are padded; otherwise the long side needs to be cropped.
For the long side L_max of the image, take the remainder m of L_max divided by L. When the remainder m is greater than L/2, the two sides of the long side are padded with the pixel value (114, 114, 114), with padding widths on the two sides of
p_1 = floor((L - m)/2), p_2 = ceil((L - m)/2)
where floor is the round-down operation and ceil is the round-up operation.
After padding, the long side is exactly an integer multiple of L. Denoting the padding widths of the image in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0.
When the remainder m is smaller than L/2, the two sides of the long side are cropped, with cropping widths on the two sides of
p_1 = floor(m/2), p_2 = ceil(m/2)
where floor is the round-down operation and ceil is the round-up operation.
After cropping, the long side is exactly an integer multiple of L. Denoting the cropping widths of the image in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0.
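The padding/cropping widths can be sketched as below. Because the exact expressions appear only as formula images in the source, the floor/ceil split reconstructed here from the stated goal (the long side must become an exact multiple of L) should be treated as an assumption; the grey padding value (114, 114, 114) is taken from the text above, and OpenCV is assumed for the usage example.

```python
import math

def pad_or_crop_widths(long_side, L=224):
    """Return (p1, p2, pad) for the two ends of the long side.

    pad is True when the sides must be padded (remainder m > L/2) and False
    when they must be cropped. Reconstructed so that the long side ends up
    an exact integer multiple of L (an assumption about the exact formulas).
    """
    m = long_side % L
    if m > L / 2:
        total = L - m                      # grow to the next multiple of L
        return math.floor(total / 2), math.ceil(total / 2), True
    total = m                              # shrink to the previous multiple of L
    return math.floor(total / 2), math.ceil(total / 2), False

# Usage when the width is the long side (so dy1 = dy2 = 0), with OpenCV:
# p1, p2, pad = pad_or_crop_widths(img.shape[1])
# if pad:
#     img = cv2.copyMakeBorder(img, 0, 0, p1, p2,
#                              cv2.BORDER_CONSTANT, value=(114, 114, 114))
# else:
#     img = img[:, p1:img.shape[1] - p2]
```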
step 5, through step 3 and step 4, the short side of the image to be identified is L min * r, the long side is floor (L) max *r)+p 1 +p 2 Both of which are integer multiples of L, the image to be identified can be implicitly divided into a plurality of l×l sized regions.
Referring specifically to fig. 2, the side length of the square in the broken line is L, and the area is the input size of the conventional image recognition model. The image to be identified processed through the step 3 and the step 4 implicitly comprises 2×3l×l sized areas, i.e., the step 3 and the step 4 implicitly divide the image to be identified into 2×3l×l sized areas.
And (3) inputting the images to be identified processed in the step (3) and the step (4) into an image identification model, and acquiring the vehicle and non-vehicle identification confidence degrees of a plurality of areas. For example, for fig. 2, the image to be identified is input into the image identification model modified in the step 1, so as to obtain the confidence levels of vehicle and non-vehicle identification in the corresponding 6 areas;
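Because the corrected model is fully convolutional, the implicit division needs no explicit cropping into tiles: one forward pass over the padded or cropped image yields one vehicle/non-vehicle confidence pair per L×L region. A rough sketch, assuming the output has shape 1×2×(H/L)×(W/L) with the vehicle class in channel 0:

```python
import torch

def region_vehicle_confidences(model, image):
    """image: 1 x 3 x H x W tensor with H and W integer multiples of L.
    Returns an (H/L) x (W/L) map of vehicle confidences."""
    with torch.no_grad():
        scores = model(image)                  # assumed 1 x 2 x H/L x W/L
        probs = torch.softmax(scores, dim=1)   # per-region class confidences
    return probs[0, 0]                         # vehicle channel (assumed index 0)
```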
the vehicle recognition confidence of each region is extracted to form a vehicle recognition confidence map;
bilinear interpolation is performed on the vehicle recognition confidence map so that its size is consistent with that of the image processed in step 4;
a threshold T is set and the confidence map is binarized, i.e. pixels below the threshold T are set to 0 and pixels above the threshold T are set to 255;
the circumscribed rectangular contour in the binarized confidence map is obtained with an existing contour detection algorithm; if no such rectangular region can be obtained, there is no vehicle in the image and the recognition ends. If such a rectangular region exists, it is denoted (x_0, y_0, w_0, h_0), where (x_0, y_0) is the upper-left corner coordinate of the rectangular region and w_0 and h_0 are its width and height, respectively; the region enclosed by the contour is the region where the vehicle is located.
Step 6. The circumscribed rectangular contour obtained in step 5 is further processed and mapped back onto the original image, where max is the maximum operation and min is the minimum operation. Through this operation, a rectangular region (x_1, y_1, w_1, h_1) with upper-left corner coordinate (x_1, y_1), width w_1 and height h_1 is obtained on the original image to be identified; this region contains the vehicle, and the recognition ends.
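The patent's exact mapping expressions are given only as formula images in the source, so the sketch below is a plausible reconstruction rather than the patented formula: it removes the step-4 padding offsets (their sign flips if the image was cropped instead), undoes the step-3 scale factor r, and clamps the result to the original image with max/min.

```python
def map_box_to_original(box, dx1, dy1, r, w_orig, h_orig):
    """box: (x0, y0, w0, h0) on the padded image from step 4.
    Returns (x1, y1, w1, h1) on the original image to be identified.
    Assumes padding was applied; for cropping, pass negative dx1/dy1
    (matching the sign convention of p1/p2 in step 4)."""
    x0, y0, w0, h0 = box
    x1 = max(0.0, (x0 - dx1) / r)
    y1 = max(0.0, (y0 - dy1) / r)
    x2 = min(float(w_orig), (x0 - dx1 + w0) / r)
    y2 = min(float(h_orig), (y0 - dy1 + h0) / r)
    return x1, y1, x2 - x1, y2 - y1
```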
In one embodiment of the present invention, taking the image to be identified shown in fig. 3 as an example, after the image is adaptively scaled, padded and passed through the corrected image recognition model, the vehicle and non-vehicle recognition confidences of the plurality of regions are obtained; the vehicle recognition confidence map is shown in fig. 4, fig. 5 is obtained after binarization, and the region containing the vehicle shown in fig. 6 is obtained by contour detection and mapping to the original image.
Based on the same inventive concept, the present invention also provides an image recognition device combined with region division, for implementing the above image recognition method combined with region division, the device comprising:
the model construction module is used for constructing an image recognition model and correcting the image recognition model;
the preliminary identification module is used for carrying out preliminary identification on the image to be identified based on the corrected image identification model;
the preprocessing module is used for dividing the image to be identified in which no target object was identified, according to the input image size of the image recognition model;
the secondary recognition module is used for inputting the divided image to be identified into the corrected image recognition model and constructing a confidence map based on the recognition result;
and the third recognition module is used for identifying the target object based on the confidence map.
It should be noted that the embodiment of the apparatus corresponds to the embodiment of the method, and the implementation manner of the embodiment of the method is applicable to the embodiment of the apparatus and can achieve the same or similar technical effects, so that the description thereof is omitted herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.
Claims (10)
1. An image recognition method combining region division, comprising:
constructing an image recognition model and correcting;
based on the corrected image recognition model, performing preliminary recognition on the image to be recognized;
dividing the image to be identified in which no target object was identified, according to the input image size of the image recognition model;
inputting the divided image to be identified into the corrected image recognition model, and constructing a confidence map based on the recognition result;
and identifying the target object based on the confidence map.
2. The method for image recognition in combination with area division according to claim 1, wherein the constructing and correcting the image recognition model includes:
constructing a model formed by a feature detector and a classifier, wherein the feature detector comprises a convolution layer and a pooling layer, and the last layer is a global average pooling layer; the classifier comprises a full connection layer;
training the model on a training data set collected for the target task to obtain the image recognition model;
replacing the global average pooling layer with an average pooling layer whose kernel size is consistent with the feature input size of the global average pooling layer;
replacing the fully connected layer with a convolution kernel of size 1×1×C_1×C_2, and assigning the trained weights and biases of the fully connected layer to the convolutional layer, where C_1 is the feature input size of the fully connected layer and C_2 is its feature output size.
3. The image recognition method according to claim 2, wherein the preliminary recognition of the image to be recognized based on the corrected image recognition model includes:
scaling the image to be identified to L×L, inputting it into the corrected image recognition model, and acquiring the recognition confidences; when the recognition confidence of the target object is greater than that of the non-target object, outputting the target object recognition result; and screening out the images for which the recognition confidence of the target object is smaller than that of the non-target object.
4. A method of image recognition in combination with region division according to claim 3, wherein the dividing the image to be recognized, in which no object is recognized, according to the image size input by the image recognition model, comprises:
performing adaptive scaling and cropping or padding on the screened image to be identified, so that each side length of the image to be identified is an integer multiple of the side length of the input image of the image recognition model, thereby dividing the image to be identified into a plurality of L×L regions.
5. The method for image recognition combined with area division according to claim 4, wherein the performing adaptive scaling on the screened image to be recognized comprises:
the short side of the image to be identified is denoted L_min:
L_min = min(W_0, H_0)
where W_0 and H_0 are the width and height of the image to be identified, respectively;
when 0 < L_min ≤ L, the scaling factor is r = L/L_min;
when L < L_min ≤ 2L, the scaling factor is r = 2L/L_min;
when L_min > 3L, the scaling factor is r = 3L/L_min;
the width and height of the image to be identified are multiplied by the scaling factor r for scaling.
6. The image recognition method combined with region division according to claim 5, wherein the cropping or padding process comprises:
for the long side L_max of the image to be identified after adaptive scaling, calculating the remainder m of L_max divided by L; when the remainder m is greater than L/2, the two sides of the long side are padded with widths
p_1 = floor((L - m)/2), p_2 = ceil((L - m)/2)
where p_1 and p_2 are the padding widths on the two sides of the long side, floor is the round-down operation and ceil is the round-up operation;
denoting the padding widths of the image to be identified in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0;
when the remainder m is smaller than L/2, the two sides of the long side are cropped with widths
p_1 = floor(m/2), p_2 = ceil(m/2)
where p_1 and p_2 are the cropping widths on the two sides of the long side;
denoting the cropping widths of the image to be identified in the up, down, left and right directions as dy_1, dy_2, dx_1, dx_2, the two widths along the long side are p_1 and p_2 and the widths along the short side are 0.
7. the method for recognizing an image combined with area division according to claim 6, wherein said inputting the divided image to be recognized into the corrected image recognition model, constructing a confidence map based on the recognition result, comprises:
inputting the divided image to be identified into the corrected image recognition model,
and acquiring the recognition confidences of the target object and the non-target object for each region to form a recognition confidence map.
8. The method of claim 7, wherein identifying the target based on the confidence map comprises:
performing bilinear interpolation on the confidence map so that its size is consistent with that of the original image to be identified after cropping or padding;
performing binarization processing on the interpolated confidence map;
obtaining the circumscribed rectangular contour in the binarized confidence map by using a contour detection algorithm; if no circumscribed rectangular contour can be obtained, there is no target object and the recognition ends; if a circumscribed rectangular contour exists, it is denoted (x_0, y_0, w_0, h_0), where (x_0, y_0) is the upper-left corner coordinate of the circumscribed rectangular contour and w_0 and h_0 are its width and height, respectively;
mapping the circumscribed rectangular contour to the original image, where max() is the maximum operation, min() is the minimum operation, (x_1, y_1) is the upper-left corner coordinate of the circumscribed rectangular contour on the original image to be identified, and w_1 and h_1 are the width and height of the circumscribed rectangular contour on the original image to be identified.
9. The image recognition method combined with region division according to any one of claims 1 to 8, wherein the image to be identified is an image captured on a street, and the target object is a vehicle.
10. An image recognition apparatus combined with region division, characterized in that it is used for implementing the image recognition method combined with region division according to any one of claims 1 to 8, and comprises:
the model construction module is used for constructing an image recognition model and correcting the image recognition model;
the preliminary identification module is used for carrying out preliminary identification on the image to be identified based on the corrected image identification model;
the preprocessing module is used for dividing the image to be identified in which no target object was identified, according to the input image size of the image recognition model;
the secondary recognition module is used for inputting the divided image to be identified into the corrected image recognition model and constructing a confidence map based on the recognition result;
and the third recognition module is used for identifying the target object based on the confidence map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311677489.1A CN117671244A (en) | 2023-12-07 | 2023-12-07 | Image recognition method and device combined with region division |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117671244A true CN117671244A (en) | 2024-03-08 |
Family
ID=90086119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311677489.1A Pending CN117671244A (en) | 2023-12-07 | 2023-12-07 | Image recognition method and device combined with region division |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117671244A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118762047A (en) * | 2024-09-06 | 2024-10-11 | 通号通信信息集团有限公司 | Adaptive image segmentation method, system, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||