CN106803071B - Method and device for detecting an object in an image
Classifications
- G06V 20/10 - Image or video recognition or understanding; scenes; scene-specific elements; terrestrial scenes
- G06F 18/2414 - Pattern recognition; classification techniques based on distances to training or reference patterns; smoothing the distance, e.g. radial basis function networks [RBFN]
- G06F 18/2415 - Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06T 2207/10016 - Image analysis; image acquisition modality; video; image sequence
- G06T 2207/20021 - Image analysis; special algorithmic details; dividing image into blocks, subimages or windows
- G06T 2207/20081 - Image analysis; special algorithmic details; training; learning
- G06T 2207/20084 - Image analysis; special algorithmic details; artificial neural networks [ANN]
Abstract
The embodiment of the invention discloses a method and a device for detecting an object in an image, used to improve the real-time performance of target detection. In the method, an image to be detected is divided into a plurality of grids according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network; a feature vector corresponding to each grid of the image is acquired from the network's output; the maximum value of the category parameters in each feature vector is identified; and when the maximum value is larger than a set threshold, the position information of an object of the category corresponding to that category parameter is determined according to the center point position parameter and the outline dimension parameter in the feature vector. In the embodiment of the invention, the category and position of an object in the image are determined by one pre-trained convolutional neural network, so the detection of position and category is realized simultaneously, no candidate feature regions need to be selected, detection time is saved, the real-time performance and efficiency of detection are improved, and overall optimization is facilitated.
Description
Technical Field
The invention relates to the technical field of machine learning, in particular to a method and a device for detecting an object in an image.
Background
With the development of video surveillance technology, intelligent video surveillance is applied in more and more scenes, such as traffic, shopping malls, hospitals, communities and parks, laying the foundation for image-based target detection in these various scenes.
In the prior art, target detection in an image is generally performed with a Region-based Convolutional Neural Network (R-CNN) or its extensions Fast R-CNN and Faster R-CNN. Fig. 1 is a schematic flow chart of object detection using R-CNN. The detection process includes: receiving an input image, extracting candidate regions (region proposals) from the image, computing CNN features for each candidate region, and determining the category and position of the object by classification and regression. In this process, about 2000 candidate regions need to be extracted from the image, and the extraction alone takes 1-2 s; CNN features must then be computed for every candidate region, and since many candidate regions overlap, much of the CNN feature computation is repeated work. The detection process further includes feature learning on the proposals, correction of the determined object position, false alarm elimination and the like, so the whole detection may take 2-40 s, which greatly impairs the real-time performance of object detection.
In addition, when R-CNN is used for object detection, the candidate regions are extracted by selective search, the CNN features are then computed by a convolutional neural network, and finally a Support Vector Machine (SVM) performs classification to determine the position of the target. These three steps are mutually independent methods, so the whole detection process cannot be optimized as a whole.
Fig. 2 is a schematic diagram of object detection using Faster R-CNN, which uses a convolutional neural network in which each sliding window generates 256-dimensional data in an intermediate layer; the category of the object is then detected in a classification layer (cls layer) and the position of the object in a regression layer (reg layer). Detecting the object's category and position are thus two independent steps, each of which must process the 256-dimensional data separately, which increases detection time and affects the real-time performance of object detection.
Disclosure of Invention
The embodiment of the invention discloses a method and a device for detecting an object in an image, which are used for improving the real-time performance of object detection and facilitating the overall optimization of object detection.
In order to achieve the above object, an embodiment of the present invention discloses a method for detecting an object in an image, which is applied to an electronic device, and the method includes:
dividing an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
inputting the divided images into a convolutional neural network trained in advance, and acquiring a plurality of feature vectors of the images output by the convolutional neural network, wherein each grid corresponds to one feature vector;
and identifying, for the feature vector corresponding to each grid, the maximum value of the category parameters in the feature vector, and determining, when the maximum value is larger than a set threshold value, the position information of the object of the category corresponding to that category parameter according to the center point position parameter and the outline dimension parameter in the feature vector.
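By way of illustration only, the following Python sketch walks through these three steps under assumptions that mirror the embodiment described later: a 7 × 7 grid, 25-dimensional feature vectors laid out as (confidence, cls1…cls20, x, y, w, h), a threshold of 0.4, and a `model` callable standing in for the pre-trained convolutional neural network; none of these names or values are prescribed by the patent itself.

```python
import numpy as np

S, C = 7, 20          # assumed grid size and number of categories
THRESHOLD = 0.4       # assumed detection threshold

def detect(image, model):
    """Sketch of the claimed flow: one forward pass yields S*S feature
    vectors; thresholding the category parameters gives the category and
    the position of each detected object simultaneously."""
    vectors = np.asarray(model(image)).reshape(S, S, 5 + C)  # one vector per grid
    detections = []
    for row in range(S):
        for col in range(S):
            v = vectors[row, col]
            cls_scores = v[1:1 + C]            # category parameters
            c = int(np.argmax(cls_scores))     # maximum category parameter
            if cls_scores[c] > THRESHOLD:
                x, y, w, h = v[1 + C:]         # centre offset and outline dimensions
                cx, cy = (col + x) / S, (row + y) / S  # grid-relative -> image-relative
                detections.append((c, cx, cy, w, h))
    return detections
```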
Further, before dividing the image to be detected into a plurality of grids according to a preset dividing manner, the method further includes:
judging whether the size of the image is a target size;
and if not, adjusting the size of the image to the target size.
Further, the training process of the convolutional neural network comprises:
aiming at each sample image in the sample image set, adopting a rectangular frame to mark a target object;
dividing each sample image into a plurality of grids according to a preset dividing mode and determining a feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, setting, according to the category of the target object, the value of the category parameter corresponding to that category in the grid's feature vector to a preset maximum value, determining the value of the center point position parameter in the feature vector according to the position of the center point in the grid, and determining the value of the outline dimension parameter in the feature vector according to the size of the target object's labeled rectangular frame; and when a grid does not contain the center point of any target object, setting the value of every parameter in the grid's feature vector to zero;
the convolutional neural network is trained from each sample image for which a feature vector for each mesh is determined.
Further, before dividing each sample image into a plurality of grids according to a preset dividing manner, the method further includes:
judging whether the size of each sample image is a target size or not;
and if not, adjusting the size of the sample image to the target size.
Further, the training the convolutional neural network according to each sample image for which the feature vector of each mesh is determined includes:
selecting sub-sample images from the sample image set, wherein the number of the selected sub-sample images is smaller than the number of the sample images in the sample image set;
and training the convolutional neural network by adopting each selected subsample image.
Further, the preset dividing manner includes:
dividing the image and the sample images into a plurality of grids with the same number of rows and columns; or,
dividing the image and the sample images into a plurality of grids with different numbers of rows and columns.
Further, the method further comprises:
determining the error of the convolutional neural network according to the prediction of the convolutional neural network on the position and the type of the object in the subsample image and the information of the target object marked in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:

$$
\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}
$$

where S is the number of rows (equal to the number of columns) of the divided grid; B is the preset number of rectangular frames predicted per grid, generally 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the labeled target object's center point in grid i; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate of the predicted object's center point in grid i; $h_i$ and $w_i$ are the height and width of the target object's labeled rectangular frame; $\hat{h}_i$ and $\hat{w}_i$ are the height and width of the predicted rectangular frame of the object; $C_i$ is the labeled probability of whether the target object currently exists in grid i; $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i; $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c; $\hat{P}_i(c)$ is the predicted probability that the object in grid i belongs to category c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object is predicted to exist in grid i, and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when no center point of an object is predicted in grid i, and 0 otherwise. $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c)=P_r(\mathrm{Class}_c\mid \mathrm{Object})\cdot P_r(\mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class}_c\mid \mathrm{Object})$ is the conditional probability that the object within predicted grid i belongs to category c.
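For concreteness, the following Python sketch implements a loss of this shape with B = 1 frame per grid and the 25-dimensional layout assumed above; the weight values are illustrative, not the patent's.

```python
import numpy as np

S, C = 7, 20
LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5    # assumed weight settings

def detection_loss(pred, target):
    """Sum-squared detection loss over (S, S, 5+C) prediction and target
    tensors, with one predicted rectangular frame per grid for simplicity."""
    obj = target[..., 0]                  # 1 where a centre point is labeled
    noobj = 1.0 - obj

    # confidence term, weighted down for grids with no object
    conf_err = (target[..., 0] - pred[..., 0]) ** 2
    loss = np.sum(obj * conf_err) + LAMBDA_NOOBJ * np.sum(noobj * conf_err)

    # centre point term
    xy_err = np.sum((target[..., 1 + C:3 + C] - pred[..., 1 + C:3 + C]) ** 2, axis=-1)
    loss += LAMBDA_COORD * np.sum(obj * xy_err)

    # outline dimension term: square roots damp the influence of large frames
    wh_t = np.sqrt(np.maximum(target[..., 3 + C:], 0.0))
    wh_p = np.sqrt(np.maximum(pred[..., 3 + C:], 1e-8))
    loss += LAMBDA_COORD * np.sum(obj * np.sum((wh_t - wh_p) ** 2, axis=-1))

    # category term, only for grids that contain a labeled centre point
    cls_err = np.sum((target[..., 1:1 + C] - pred[..., 1:1 + C]) ** 2, axis=-1)
    loss += np.sum(obj * cls_err)
    return loss
```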
Further, the determining the position information of the object of the category corresponding to the category parameter according to the center point position parameter and the outline dimension parameter in the feature vector includes:
determining the position information of the central point in the grid according to the position parameter of the central point;
and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
Further, the determining the position information of the central point in the grid according to the position parameter of the central point includes:
using the set points of the grid as reference points; and determining the position information of the central point in the grid according to the reference point and the position parameters of the central point.
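A short sketch of this decoding, under the assumptions that each grid's top-left corner is the set reference point, that offsets and outline dimensions are normalized to [0, 1], and that the image side length is 448 pixels (all illustrative, not the patent's figures):

```python
def to_rectangle(row, col, x, y, w, h, S=7, img_w=448, img_h=448):
    """Turn grid-relative centre offsets (x, y) and outline dimensions (w, h),
    all in [0, 1], into pixel corner coordinates of the object's rectangular
    frame. Grid indices, reference-point convention and image size are
    assumptions made for illustration."""
    cx = (col + x) / S * img_w            # reference point + centre offset
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h
    return cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2
```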
The embodiment of the invention discloses a device for detecting an object in an image, which comprises:
the dividing module is used for dividing the image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
the detection module is used for inputting the divided images into a convolutional neural network which is trained in advance, and acquiring a plurality of feature vectors of the images output by the convolutional neural network, wherein each grid corresponds to one feature vector;
and the determining module is used for identifying, for the feature vector corresponding to each grid, the maximum value of the category parameters in the feature vector, and determining, when the maximum value is larger than a set threshold value, the position information of the object of the category corresponding to that category parameter according to the central point position parameters and the outline dimension parameters in the feature vector.
Further, the apparatus further comprises:
the judging and adjusting module is used for judging whether the size of the image is a target size; and if not, adjusting the size of the image to the target size.
Further, the apparatus further comprises:
the training module is used for labeling the target object with a rectangular frame for each sample image in the sample image set; dividing each sample image into a plurality of grids according to a preset dividing mode and determining a feature vector corresponding to each grid, wherein the size of each sample image is the target size; when a grid contains the center point of a target object, setting, according to the category of the target object, the value of the category parameter corresponding to that category in the grid's feature vector to a preset maximum value, determining the value of the center point position parameter in the feature vector according to the position of the center point in the grid, and determining the value of the outline dimension parameter in the feature vector according to the size of the target object's labeled rectangular frame; when a grid does not contain the center point of any target object, setting the value of every parameter in the grid's feature vector to zero; and training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined.
Further, the training module is further configured to determine, for each sample image, whether the size of the sample image is a target size; and if not, adjusting the size of the sample image to the target size.
Further, the training module is specifically configured to select sub-sample images from the sample image set, where the number of the selected sub-sample images is smaller than the number of the sample images in the sample image set; and training the convolutional neural network by adopting each selected subsample image.
Further, the apparatus further comprises:
the error calculation module is used for determining the error of the convolutional neural network according to the prediction of the position and the category of the object in the subsample image by the convolutional neural network and the target object marked in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:

$$
\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}
$$

where S is the number of rows (equal to the number of columns) of the divided grid; B is the preset number of rectangular frames predicted per grid, generally 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the labeled target object's center point in grid i; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate of the predicted object's center point in grid i; $h_i$ and $w_i$ are the height and width of the target object's labeled rectangular frame; $\hat{h}_i$ and $\hat{w}_i$ are the height and width of the predicted rectangular frame of the object; $C_i$ is the labeled probability of whether the target object currently exists in grid i; $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i; $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c; $\hat{P}_i(c)$ is the predicted probability that the object in grid i belongs to category c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object is predicted to exist in grid i, and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when no center point of an object is predicted in grid i, and 0 otherwise. $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c)=P_r(\mathrm{Class}_c\mid \mathrm{Object})\cdot P_r(\mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class}_c\mid \mathrm{Object})$ is the conditional probability that the object within predicted grid i belongs to category c.
Further, the determining module is specifically configured to determine, according to the position parameter of the central point, the position information of the central point in the grid;
and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
Further, the determination module is specifically configured to use a set point of the grid as a reference point; and determining the position information of the central point in the grid according to the reference point and the position parameters of the central point.
The embodiment of the invention provides a method and a device for detecting an object in an image. In the method, an image to be detected, whose size is a target size, is divided into a plurality of grids according to a preset dividing mode; the divided image is input into a pre-trained convolutional neural network, and a plurality of feature vectors of the image output by the network are acquired, each grid corresponding to one feature vector; the maximum value of the category parameters in each feature vector is identified, and when the maximum value is larger than a set threshold, the position information of the object of the corresponding category is determined according to the center point position parameter and the outline dimension parameter in the feature vector. In the embodiment of the invention, each feature vector corresponding to the image is determined by one pre-trained convolutional neural network, and the category and position of the object in the image are determined from the category parameter and the position-related parameters of the feature vector, so the detection of position and category is realized simultaneously and overall optimization is convenient. In addition, because the position and category of the object are determined from the feature vector corresponding to each grid, no candidate feature regions need to be selected, which saves detection time and improves the real-time performance and efficiency of detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic view of a process for object detection using R-CNN;
FIG. 2 is a schematic diagram of an object detection process using fast RCNN;
FIG. 3 is a schematic diagram of an object detection process in an image according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a detailed implementation process of object detection in an image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a training process of a convolutional neural network according to an embodiment of the present invention;
FIGS. 6A-6D are schematic diagrams illustrating labeling results of a target object according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating a process for constructing the cube structure of FIG. 6D;
fig. 8 is a schematic structural diagram of an object detection apparatus in an image according to an embodiment of the present invention.
Detailed Description
In order to effectively improve the efficiency of object detection, improve the real-time performance of object detection and facilitate the overall optimization of object detection, the embodiment of the invention provides a method and a device for detecting an object in an image.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 3 is a schematic diagram of an object detection process in an image according to an embodiment of the present invention, where the process includes the following steps:
step S301: dividing an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size.
The embodiment of the invention is applied to the electronic equipment, and the electronic equipment can be a desktop computer, a notebook computer, other intelligent equipment with processing capacity and the like.
After an image to be detected of the target size is obtained, it is divided into a plurality of grids according to a preset dividing mode, where the preset dividing mode is the same as the mode used to divide the images when the convolutional neural network was trained. For example, the image may be divided into a plurality of rows and columns; for convenience, the intervals between rows and between columns may be equal, though they may also differ. The image may even be divided into a plurality of irregular grids, as long as the image to be detected and the images used for convolutional neural network training adopt the same grid division mode.
When the image is divided into a plurality of rows and a plurality of columns, the image may be divided into a plurality of grids having the same number of rows and columns, or may be divided into a plurality of grids having different numbers of rows and columns, and the aspect ratio of each grid after division may be the same or different.
Step S302: and inputting the divided image into a convolutional neural network trained in advance, and acquiring a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector.
In order to detect the category and position of an object in an image, in the embodiment of the present invention a convolutional neural network is trained, and a feature vector corresponding to each grid is obtained through the trained convolutional neural network. For example, the image may be divided into 49 grids of 7 × 7; after the divided image is input into the trained convolutional neural network, 49 feature vectors are output, each corresponding to one grid.
Step S303: Identifying, for the feature vector corresponding to each grid, the maximum value of the category parameters in the feature vector, and determining, when the maximum value is larger than a set threshold value, the position information of the object of the category corresponding to that category parameter according to the center point position parameter and the outline dimension parameter in the feature vector.
Specifically, the feature vector obtained in the embodiment of the present invention is a multidimensional vector that comprises at least category parameters and position parameters, where the category parameters comprise a plurality of parameters and the position parameters comprise a center point position parameter and an outline dimension parameter. After the feature vector corresponding to each grid is obtained, whether each grid has detected an object is judged according to its feature vector. If the maximum value of the plurality of category parameters in a grid's feature vector is larger than the set threshold, the grid has detected an object, the category corresponding to that category parameter is the category of the object, and the position of the object can be determined from the grid's feature vector.
Since the position parameters in the feature vectors employed in the convolutional neural network training are determined according to a set method, the position of the object can be determined according to the set method.
In the embodiment of the invention, each feature vector corresponding to the image is determined through the pre-trained convolutional neural network, and the category and position of the object in the image are determined from the category parameter and the position-related parameters in the feature vector, so the prediction of position and category is realized simultaneously and overall optimization is convenient. In addition, because the position and category of the object are determined from the feature vector corresponding to each grid, no candidate feature regions need to be selected, which saves detection time and improves the real-time performance and efficiency of detection.
Object detection in the embodiment of the present invention is performed on an image of a target size, where the target size is the uniform size of the images used to train the convolutional neural network. The target size may be any size, as long as the image used in object detection has the same size as the images used in training; it may be, for example, 1024 x 1024, or 256 x 512, etc.
Therefore, in an embodiment of the present invention, in order to ensure that the images input into the convolutional neural network are all images of a target size, before dividing the image to be detected into a plurality of grids according to a preset dividing manner, the method further includes:
judging whether the size of the image is a target size;
and if not, adjusting the size of the image to the target size.
When the image to be detected is of the target size, subsequent processing is carried out on it directly; when it is not, it is first adjusted to the target size. Adjusting image size belongs to the prior art, and the process is not described in detail in the embodiment of the present invention.
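As a simple illustration of this adjustment (OpenCV's `resize` is one common choice; the 448 × 448 target size is an assumption, not the patent's):

```python
import cv2

TARGET_W, TARGET_H = 448, 448          # assumed target size

def ensure_target_size(image):
    """Resize the image to the target size only when necessary."""
    h, w = image.shape[:2]
    if (w, h) != (TARGET_W, TARGET_H):
        image = cv2.resize(image, (TARGET_W, TARGET_H))
    return image
```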
Specifically, in the embodiment of the present invention, determining the position information of the object of the category corresponding to the category parameter according to the center point position parameter and the outline dimension parameter in the feature vector includes:
determining the position information of the central point in the grid according to the position parameter of the central point;
and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
Wherein the determining the position information of the central point in the grid according to the position parameter of the central point comprises:
using the set points of the grid as reference points; and determining the position information of the central point in the grid according to the reference point and the position parameters of the central point.
Fig. 4 is a schematic diagram of a detailed implementation process of object detection in an image according to an embodiment of the present invention, where the process includes the following steps:
step S401: an image to be detected is received.
Step S402: and judging whether the size of the image is the target size, if so, performing step S404, and otherwise, performing step S403.
Step S403: and adjusting the size of the image to a target size.
Step S404: dividing an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size.
Step S405: and inputting the divided image into a convolutional neural network trained in advance, and acquiring a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector.
Step S406: Identifying, for the feature vector corresponding to each grid, the maximum value of the category parameters in the feature vector.
Step S407: and when the maximum value is larger than a set threshold value, taking the set point of the grid as a reference point, and determining the position information of the central point in the grid according to the reference point and the position parameters of the central point.
Step S408: and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
The target detection is performed based on the trained convolutional neural network, and in order to detect the object, the convolutional neural network needs to be trained. In the embodiment of the invention, when the convolutional neural network is trained, a sample image with a target size is divided into a plurality of grids, and if the central point of a certain target object is located in a certain grid, the grid is responsible for detecting the target object, including detecting the type and the corresponding position (bounding box) of the target object.
Fig. 5 is a schematic diagram of a training process of a convolutional neural network according to an embodiment of the present invention, where the training process includes the following steps:
step S501: and marking the target object by adopting a rectangular frame aiming at each sample image in the sample image set.
In the embodiment of the invention, a large number of sample images are adopted to train the convolutional neural network, and then the large number of sample images form a sample image set. A rectangular frame is used to mark the target object in each sample image.
Specifically, fig. 6A to 6D schematically illustrate the labeling result of target objects; in the sample image of fig. 6A there are 3 target objects, namely a dog, a bicycle and a car. When labeling each target object, its vertices in the four directions up, down, left and right (relative to the orientation shown in fig. 6A) are identified in the sample image; the two lines through the upper and lower vertices parallel to the top and bottom sides of the sample image form two sides of the rectangular frame, and the two lines through the left and right vertices parallel to the left and right sides of the sample image form the other two sides, such as the dashed rectangular frames around the dog, bicycle and car in fig. 6A. A sketch of this rule is given below.
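A small hedged sketch of this labeling rule, assuming the target object is given as a set of (x, y) pixel points; the extremal vertices directly give the axis-aligned frame:

```python
def label_box(points):
    """Axis-aligned rectangular frame through the object's extremal vertices:
    horizontal lines through the top/bottom vertices and vertical lines
    through the left/right vertices form the four sides."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)   # left, top, right, bottom
```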
Step S502: dividing each sample image into a plurality of grids according to a preset dividing mode, determining a characteristic vector corresponding to each grid, wherein the size of each sample image is a target size, when the grid contains a central point of a target object, setting a value of a category parameter corresponding to the category in the characteristic vector corresponding to the grid to be a preset maximum value according to the category of the target object, determining a value of a central point position parameter in the characteristic vector according to the position of the central point in the grid, determining a value of an outline dimension parameter in the characteristic vector according to the size of a marked rectangular frame of the target object, and when the grid does not contain the central point of the target object, setting the value of each parameter in the characteristic vector corresponding to the grid to be zero.
In the embodiment of the present invention, the sample image may be divided into a plurality of grids according to a preset division manner, where the division manner of the sample image is the same as the division manner of the image to be detected in the detection process.
For example, the image may be divided into a plurality of rows and columns; for convenience, the intervals between rows and between columns may be equal, though they may also differ. The image may even be divided into a plurality of irregular grids, as long as the image to be detected and the images used for convolutional neural network training adopt the same grid division mode.
When the image is divided into a plurality of rows and columns, it may be divided into grids with the same number of rows and columns or with different numbers of rows and columns, and the aspect ratios of the divided grids may be the same or different; for example, the sample image may be divided into 12 × 10, 15 × 15 or 6 × 6 grids, etc. When the grids are of equal size, the grid size may be normalized. As shown in fig. 6B, in the embodiment of the present invention the sample image is divided into 7 rows and 7 columns of square grids, so that the size of each grid after normalization can be regarded as 1 × 1.
Each grid in the sample image corresponds to a feature vector. The feature vector is multidimensional and comprises at least category parameters and position parameters, where the category parameters comprise a plurality of parameters and the position parameters comprise a center point position parameter and an outline dimension parameter.
Step S503: The convolutional neural network is trained according to the sample images for which the feature vector of each grid has been determined.
Specifically, in the embodiment of the present invention, the convolutional neural network may be trained using all sample images in the sample image set. However, because the sample image set contains a large number of sample images, in order to improve training efficiency, in the embodiment of the present invention training the convolutional neural network according to the sample images for which the feature vector of each grid has been determined includes:
selecting sub-sample images from the sample image set, wherein the number of the selected sub-sample images is smaller than the number of the sample images in the sample image set;
and training the convolutional neural network by adopting each selected subsample image.
The convolutional neural network is trained by randomly selecting subsample images far fewer than the total number of sample images, and its parameters are continuously updated until the error between the information of the object predicted by each grid and the information of the labeled target object converges, as sketched below.
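A sketch of this mini-batch selection; the batch size is illustrative, and `network.train_step` is a hypothetical routine standing in for whatever forward pass, loss computation and parameter update the network uses:

```python
import random

BATCH_SIZE = 64        # assumed; far smaller than the sample image set

def train(network, sample_set, max_iters=10000, tol=1e-4):
    """Repeatedly train on small random subsets of the sample image set,
    updating the network parameters until the error converges."""
    last_error = float("inf")
    for _ in range(max_iters):
        batch = random.sample(sample_set, BATCH_SIZE)
        error = network.train_step(batch)   # hypothetical: loss + parameter update
        if abs(last_error - error) < tol:   # simple convergence test
            break
        last_error = error
```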
Similarly, in the embodiment of the present invention, when training the convolutional neural network, a sample image of a target size is used, and therefore, in the embodiment of the present invention, in order to ensure that the sample images input into the convolutional neural network are all of the target size, before dividing each sample image into a plurality of grids according to a preset dividing manner, the method further includes:
judging whether the size of each sample image is a target size or not;
and if not, adjusting the size of the sample image to the target size.
When a sample image is of the target size, subsequent processing is carried out on it directly; when it is not, it is first adjusted to the target size. Adjusting image size belongs to the prior art, and the process is not described in detail in the embodiment of the present invention.
In the above process, either the sample image may first be adjusted to the target size, or the rectangular frame may first be labeled in the sample image. Labeling the rectangular frame first ensures that the target object can be labeled accurately when the sample image is large; adjusting to the target size first ensures that the target object can be labeled accurately when the sample image is small.
In the above labeling process, a feature vector corresponding to each grid in the sample image is determined. In an embodiment of the present invention, the feature vector corresponding to each grid may be represented as (confidence, cls1, cls2, cls3, …, cls20, x, y, w, h), where confidence is a probability parameter, cls1, cls2, cls3, …, cls20 are category parameters, and x, y, w and h are position parameters, of which x and y are center point position parameters and w and h are outline dimension parameters. When a grid contains the center point of a target object, the value of each parameter in the grid's feature vector is determined as described above; when it does not, the value of each parameter in the grid's feature vector is 0.
Specifically, since each target object is labeled by using a rectangular frame in the sample image, the center point of the rectangular frame may be considered as the center point of the target object, such as the center points of the three rectangular frames shown in fig. 6C. When the grid includes the center point of the target object, then during labeling, the probability parameter in the feature vector corresponding to the grid may be considered to be 1, that is, the probability that the target object exists in the grid is 1 at present.
Since the sample images contain target objects of a plurality of categories, the category parameter cls is used in the embodiment of the present invention to represent target objects of different categories, namely cls1, cls2, …, clsn. For example, n may be 20, i.e. there are 20 categories of objects in total; the category represented by cls1 is car, by cls2 is dog, and by cls3 is bicycle. When a grid contains the center point of a target object, the value of the category parameter corresponding to that object's category is set to a maximum value, where the maximum value is greater than the set threshold; for example, the maximum value may be 1 and the threshold 0.4, etc.
For example, as shown in fig. 6C, from bottom to top (top and bottom shown in fig. 6C), in the feature vector corresponding to the grid where each central point is located, cls2 in the class parameter in the feature vector corresponding to the first central point is 1, the other class parameters are 0, cls3 in the class parameter in the feature vector corresponding to the second central point is 1, the other class parameters are 0, cls1 in the class parameter in the feature vector corresponding to the third central point is 1, and the other class parameters are 0.
The feature vector further includes the position parameters x, y, w and h of the target object, where x and y are the center point position parameters, whose values are the horizontal and vertical coordinates of the target object's center point relative to a set point. The set points corresponding to the grids may be the same or different; for example, the upper left corner of the sample image may be taken as the set point, i.e. the origin of coordinates, and since each grid is normalized, the coordinates of every position in every grid are then uniquely determined. Alternatively, to simplify the process and reduce the amount of calculation, each grid may use a different set point: each grid is treated as an independent unit whose upper left corner is the set point, i.e. the origin of coordinates. In that case, when labeling, the values of x and y in the grid's feature vector are determined from the offset of the center point relative to the upper left corner of the grid containing it; determining x and y from a relative-position offset belongs to the prior art and is not described in detail in the embodiment of the present invention. Among the position parameters, w and h are the outline dimension parameters, whose values are the width and height of the rectangular frame in which the target object lies.
Because the feature vector is multidimensional, in order to represent the feature vector corresponding to each grid accurately, in the embodiment of the present invention the cubic structure shown in fig. 6D is constructed according to the method shown in fig. 7: the grid is processed correspondingly in the convolutional layers, the max-pooling layers, the fully-connected layers and the output layer to generate a cubic grid structure whose depth along the Z axis is determined by the dimension of the feature vector; in the present embodiment that depth is 25. Performing the corresponding processing in each layer of the convolutional neural network to generate the cubic grid structure belongs to the prior art and is not described in detail in the embodiment of the present invention; a minimal shape illustration follows.
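As a minimal illustration of the shape involved (the 7 × 7 × 25 figures come from this embodiment; the function name is hypothetical):

```python
import numpy as np

def output_cube(fc_output):
    """Whatever the convolutional, max-pooling and fully-connected layers
    compute, the final activations are reshaped into the cubic grid
    structure of fig. 6D: a 7 x 7 spatial grid whose depth of 25 along
    the Z axis is the dimension of each grid's feature vector."""
    return np.asarray(fc_output).reshape(7, 7, 25)   # expects 1225 values
```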
After a large number of sample images are labeled in this way, the convolutional neural network is trained with the labeled sample images; specifically, a plurality of subsample images are used to train it. In the training process, a convolution feature map is obtained for each subsample image through the convolutional neural network. The feature map comprises the feature vector (confidence, cls1, cls2, cls3, …, cls20, x, y, w, h) corresponding to each grid, which contains the position parameters and category parameters of the object predicted in the grid as well as a probability parameter confidence, where confidence represents the degree of overlap between the rectangular frame in which the grid predicts the object and the labeled rectangular frame of the target object.
In the training process, the network parameters of the convolutional neural network are adjusted for each subsample image by calculating the error between the prediction information and the labeling information; each time, a batch of subsample images far fewer than the total number of sample images is randomly selected, the convolutional neural network is trained on it, and the network parameters are updated, until the error between the prediction information and the labeling information of each grid converges. Training the convolutional neural network from subsample images and adjusting its network parameters until training is complete belongs to the prior art and is not described in detail in the embodiment of the invention.
In the training process of the convolutional neural network, in order to accurately predict the position and category information of the object, in the embodiment of the present invention the last fully-connected layer of the convolutional neural network uses a logistic activation function, while the convolutional layers and the other fully-connected layers use a Leaky ReLU function, i.e. a rectified linear function of the form f(x) = x for x > 0 and f(x) = αx otherwise, where α is a small positive slope (0.1 being the typical choice).
in order to complete the training of the convolutional neural network and make it converge in the embodiment of the present invention, when training the convolutional neural network, the method further includes:
determining the error of the convolutional neural network according to the prediction of the position and the type of the target object in the subsample image by the convolutional neural network and the information of the target object marked in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:

$$
\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
+\lambda_{noobj}\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c}\left(P_i(c)-\hat{P}_i(c)\right)^2
\end{aligned}
$$

where S is the number of rows (equal to the number of columns) of the divided grid; B is the preset number of rectangular frames predicted per grid, generally 1 or 2; $x_i$ and $y_i$ are the abscissa and ordinate of the labeled target object's center point in grid i; $\hat{x}_i$ and $\hat{y}_i$ are the abscissa and ordinate of the predicted object's center point in grid i; $h_i$ and $w_i$ are the height and width of the target object's labeled rectangular frame; $\hat{h}_i$ and $\hat{w}_i$ are the height and width of the predicted rectangular frame of the object; $C_i$ is the labeled probability of whether the target object currently exists in grid i; $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i; $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c; $\hat{P}_i(c)$ is the predicted probability that the object in grid i belongs to category c; $\lambda_{coord}$ and $\lambda_{noobj}$ are set weights; $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i, and 0 otherwise; $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object is predicted to exist in grid i, and 0 otherwise; and $\mathbb{1}_{i}^{noobj}$ takes 1 when no center point of an object is predicted in grid i, and 0 otherwise. $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c)=P_r(\mathrm{Class}_c\mid \mathrm{Object})\cdot P_r(\mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class}_c\mid \mathrm{Object})$ is the conditional probability that the object within predicted grid i belongs to category c.
The above loss function is adopted in the embodiment of the present invention in order to make the contribution to the position prediction smaller when the error between the prediction result and the labeled result is larger.
As shown in fig. 6B, in an embodiment of the present invention each sample image is divided into 49 grids of 7 × 7, and each grid can detect 20 categories, so one sample image generates 980 detection probabilities, most of which are 0. This would make training hard to converge, so a variable is introduced to solve the problem: the probability of whether an object is present in a given grid. Thus, in addition to the 20 category parameters, there is a predicted probability Pr(Object) of whether an object is currently present in the grid, and the probability that a target object within a grid belongs to category c is the product of Pr(Object) and the conditional probability Pr(Class|Object) that an object in the predicted grid belongs to category c. In each grid, Pr(Object) is updated; Pr(Class|Object) is updated only when an object exists in the grid.
Fig. 8 is a schematic structural diagram of an apparatus for detecting an object in an image according to an embodiment of the present invention, where the apparatus is located in an electronic device, and the apparatus includes:
the dividing module 81 is configured to divide an image to be detected into a plurality of grids according to a preset dividing manner, where the size of the image to be detected is a target size;
the detection module 82 is configured to input the divided image into a convolutional neural network trained in advance, and obtain a plurality of feature vectors of the image output by the convolutional neural network, where each grid corresponds to one feature vector;
the determining module 83 is configured to identify, for a feature vector corresponding to each grid, a maximum value of a category parameter in the feature vector, and determine, when the maximum value is greater than a set threshold, position information of an object of a category corresponding to the category parameter according to a center point position parameter and an outer dimension parameter in the feature vector.
The device further comprises:
a judgment adjustment module 84, configured to judge whether the size of the image is a target size; and if not, adjusting the size of the image to the target size.
The device further comprises:
a training module 85, configured to label a target object with a rectangular frame for each sample image in the sample image set; dividing each sample image into a plurality of grids according to a preset dividing mode, determining a characteristic vector corresponding to each grid, wherein the size of each sample image is a target size, when the grid contains a central point of a target object, setting a value of a category parameter corresponding to the category in the characteristic vector corresponding to the grid to be a preset maximum value according to the category of the target object, determining a value of a central point position parameter in the characteristic vector according to the position of the central point in the grid, determining a value of an outline dimension parameter in the characteristic vector according to the size of a marked rectangular frame of the target object, and when the grid does not contain the central point of the target object, setting the value of each parameter in the characteristic vector corresponding to the grid to be zero; the convolutional neural network is trained from each sample image for which a feature vector for each grid is determined.
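A minimal sketch, under assumed conventions, of the label construction the training module 85 performs (center offsets relative to the containing grid, image-relative outline sizes, a maximum value of 1.0 for the category parameter; all function and variable names are illustrative):

```python
import numpy as np

def build_targets(boxes, classes, img_w, img_h, S=7, C=20):
    """boxes: list of (cx, cy, bw, bh) labeled rectangles in pixels, center-based;
    classes: list of class indices. Returns an S x S x (5 + C) target tensor
    laid out as [x, y, w, h, objectness, class scores] per grid."""
    target = np.zeros((S, S, 5 + C), dtype=np.float32)  # all-zero if no center in grid
    for (cx, cy, bw, bh), cls in zip(boxes, classes):
        col = min(int(cx / img_w * S), S - 1)   # grid containing the center point
        row = min(int(cy / img_h * S), S - 1)
        target[row, col, 0] = cx / img_w * S - col  # center offset within the grid
        target[row, col, 1] = cy / img_h * S - row
        target[row, col, 2] = bw / img_w            # outline size, relative to image
        target[row, col, 3] = bh / img_h
        target[row, col, 4] = 1.0                   # an object center lies in this grid
        target[row, col, 5 + cls] = 1.0             # category parameter at its maximum
    return target
```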
The training module 85 is further configured to determine, for each sample image, whether the size of the sample image is a target size; and if not, adjusting the size of the sample image to the target size.
The training module 85 is specifically configured to select sub-sample images from the sample image set, where the number of the selected sub-sample images is smaller than the number of the sample images in the sample image set; and training the convolutional neural network by adopting each selected subsample image.
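A sketch of this subsample selection (illustrative only; the patent does not specify the sampling strategy, so uniform random sampling is assumed):

```python
import random

def select_subsamples(sample_images, k):
    """Select k sub-sample images for one training pass; k must be smaller
    than the number of images in the sample set."""
    if k >= len(sample_images):
        raise ValueError("subsample must be smaller than the sample set")
    return random.sample(sample_images, k)
```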
The device further comprises:
an error calculation module 86, configured to determine an error of the convolutional neural network according to the prediction of the position and the category of the object in the subsample image by the convolutional neural network and information of the target object labeled in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:
wherein S is the number of rows (equal to the number of columns) of the divided grids, B is the preset number of rectangular frames predicted by each grid, generally taken as 1 or 2, $x_i$ is the abscissa of the labeled target object center point in grid i, $\hat{x}_i$ is the abscissa of the predicted object center point in grid i, $y_i$ is the ordinate of the labeled target object center point in grid i, $\hat{y}_i$ is the ordinate of the predicted object center point in grid i, $h_i$ is the height of the rectangular frame labeling the target object, $w_i$ is the width of that rectangular frame, $\hat{h}_i$ is the height of the rectangular frame in which the predicted object is located, $\hat{w}_i$ is the width of that rectangular frame, $C_i$ is the labeled probability of whether the target object currently exists in grid i, $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i, $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c, $\hat{P}_i(c)$ is the predicted probability that an object in grid i belongs to category c, $\lambda_{coord}$ and $\lambda_{noobj}$ are set weight values, $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i and 0 otherwise, $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object exists in predicted grid i and 0 otherwise, and $\mathbb{1}_{i}^{noobj}$ takes 1 when no object center point exists in predicted grid i and 0 otherwise, where $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(\mathrm{Object}) \times P_r(\mathrm{Class} \mid \mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class} \mid \mathrm{Object})$ is the conditional probability that an object within predicted grid i belongs to class c.
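A numpy sketch of this error computation for B = 1 (the array layout [x, y, w, h, objectness, class scores] and the weight values 5.0 and 0.5 for $\lambda_{coord}$ and $\lambda_{noobj}$ are assumptions, not stated by the patent):

```python
import numpy as np

def detection_loss(pred, truth, lam_coord=5.0, lam_noobj=0.5):
    """pred, truth: S x S x (5 + C) arrays laid out as
    [x, y, w, h, objectness, class scores]; B = 1 box per grid for brevity,
    so the per-box indicator reduces to the per-grid indicator."""
    obj = truth[..., 4] == 1.0        # grids that contain an object center
    noobj = ~obj

    # Position terms: center point and outline size, object grids only.
    xy_err = np.sum((truth[obj][:, 0:2] - pred[obj][:, 0:2]) ** 2)
    wh_err = np.sum((np.sqrt(truth[obj][:, 2:4])
                     - np.sqrt(np.abs(pred[obj][:, 2:4]))) ** 2)

    # Confidence terms: labeled C_i vs predicted objectness.
    conf_obj = np.sum((truth[obj][:, 4] - pred[obj][:, 4]) ** 2)
    conf_noobj = np.sum((truth[noobj][:, 4] - pred[noobj][:, 4]) ** 2)

    # Class terms: P_i(c), object grids only.
    cls_err = np.sum((truth[obj][:, 5:] - pred[obj][:, 5:]) ** 2)

    return (lam_coord * (xy_err + wh_err)
            + conf_obj + lam_noobj * conf_noobj + cls_err)
```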
The determining module 83 is specifically configured to determine the position information of the central point in the grid according to the position parameter of the central point; and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
The determining module 83 is specifically configured to take a set point of the grid as a reference point, and to determine the position information of the central point in the grid according to the reference point and the position parameter of the central point.
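An illustrative decode step matching this description, assuming the grid's top-left corner as the set reference point and a hypothetical threshold value (the patent fixes neither):

```python
import numpy as np

def decode_box(vec, row, col, S=7, img_w=448, img_h=448, threshold=0.2):
    """vec: the (5 + C) feature vector of grid (row, col), laid out as
    [x, y, w, h, objectness, class scores]. Returns (class_id, box) or None."""
    scores = vec[5:]
    cls = int(np.argmax(scores))              # maximum value among class parameters
    if scores[cls] <= threshold:
        return None                           # below the set threshold: no object
    cell_w, cell_h = img_w / S, img_h / S
    cx = (col + vec[0]) * cell_w              # reference point + center offset
    cy = (row + vec[1]) * cell_h
    bw, bh = vec[2] * img_w, vec[3] * img_h   # outline size, relative to image
    return cls, (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
```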
The embodiment of the invention provides a method and a device for detecting an object in an image. The method comprises: dividing an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image is a target size; inputting the divided image into a convolutional neural network trained in advance, and obtaining a plurality of feature vectors of the image output by the convolutional neural network, wherein each grid corresponds to one feature vector; and, for each feature vector, identifying the maximum value of the category parameters in the feature vector and, when the maximum value is larger than a set threshold, determining the position information of the object of the category corresponding to that category parameter according to the center point position parameter and the outline size parameter in the feature vector. In the embodiment of the invention, the feature vectors corresponding to the image are determined by the convolutional neural network trained in advance, and the category and the position of the object are determined from the category parameter and the position-related parameters in each feature vector, so detection of the position and the category of the object is realized simultaneously, which facilitates overall optimization. In addition, because the position and the category are determined from the feature vector corresponding to each grid, no candidate feature regions need to be selected, which saves detection time and improves detection real-time performance and efficiency.
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (14)
1. A method for detecting an object in an image, which is applied to an electronic device, includes:
dividing an image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
inputting the divided images into a convolutional neural network trained in advance, and acquiring a plurality of feature vectors of the images output by the convolutional neural network, wherein each grid corresponds to one feature vector;
identifying the maximum value of the category parameters in the feature vector aiming at the feature vector corresponding to each grid, and determining the position information of the object of the category corresponding to the category parameters according to the central point position parameters and the overall dimension parameters in the feature vector when the maximum value is larger than a set threshold value;
wherein, the preset dividing mode comprises:
dividing the image and the sample image into a plurality of grids with the same row number and column number;
the method further comprises the following steps:
determining the error of the convolutional neural network according to the prediction of the position and the type of the object in the subsample image by the convolutional neural network and the information of the target object marked in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:
wherein S is the number of rows (equal to the number of columns) of the divided grids, B is the preset number of rectangular frames predicted by each grid, generally taken as 1 or 2, $x_i$ is the abscissa of the labeled target object center point in grid i, $\hat{x}_i$ is the abscissa of the predicted object center point in grid i, $y_i$ is the ordinate of the labeled target object center point in grid i, $\hat{y}_i$ is the ordinate of the predicted object center point in grid i, $h_i$ is the height of the rectangular frame labeling the target object, $w_i$ is the width of that rectangular frame, $\hat{h}_i$ is the height of the rectangular frame in which the predicted object is located, $\hat{w}_i$ is the width of that rectangular frame, $C_i$ is the labeled probability of whether the target object currently exists in grid i, $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i, $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c, $\hat{P}_i(c)$ is the predicted probability that an object in grid i belongs to category c, $\lambda_{coord}$ and $\lambda_{noobj}$ are set weight values, $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i and 0 otherwise, $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object exists in predicted grid i and 0 otherwise, and $\mathbb{1}_{i}^{noobj}$ takes 1 when no object center point exists in predicted grid i and 0 otherwise, where $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(\mathrm{Object}) \times P_r(\mathrm{Class} \mid \mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class} \mid \mathrm{Object})$ is the conditional probability that an object within predicted grid i belongs to class c.
2. The method according to claim 1, wherein before the dividing the image to be detected into a plurality of grids according to the preset dividing manner, the method further comprises:
judging whether the size of the image is a target size;
and if not, adjusting the size of the image to the target size.
3. The method of claim 1, wherein the training process of the convolutional neural network comprises:
aiming at each sample image in the sample image set, adopting a rectangular frame to mark a target object;
dividing each sample image into a plurality of grids according to a preset dividing mode, determining a characteristic vector corresponding to each grid, wherein the size of each sample image is a target size, when the grid contains a central point of a target object, setting a value of a category parameter corresponding to the category in the characteristic vector corresponding to the grid to be a preset maximum value according to the category of the target object, determining a value of a central point position parameter in the characteristic vector according to the position of the central point in the grid, determining a value of an outline dimension parameter in the characteristic vector according to the size of a marked rectangular frame of the target object, and when the grid does not contain the central point of the target object, setting the value of each parameter in the characteristic vector corresponding to the grid to be zero;
the convolutional neural network is trained from each sample image for which a feature vector for each grid is determined.
4. The method of claim 3, wherein before the dividing each sample image into a plurality of grids according to the preset dividing manner, the method further comprises:
judging whether the size of each sample image is a target size or not;
and if not, adjusting the size of the sample image to the target size.
5. The method of claim 3, wherein training the convolutional neural network based on each sample image for which a feature vector for each grid is determined comprises:
selecting sub-sample images from the sample image set, wherein the number of the selected sub-sample images is smaller than the number of the sample images in the sample image set;
and training the convolutional neural network by adopting each selected subsample image.
6. The method according to claim 1, wherein the determining, according to the center point position parameter and the outline size parameter in the feature vector, the position information of the object of the category corresponding to the category parameter comprises:
determining the position information of the central point in the grid according to the position parameter of the central point;
and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
7. The method of claim 6, wherein the determining the position information of the center point in the grid according to the position parameter of the center point comprises:
using the set points of the grid as reference points; and determining the position information of the central point in the grid according to the reference point and the position parameters of the central point.
8. An apparatus for detecting an object in an image, the apparatus comprising:
the dividing module is used for dividing the image to be detected into a plurality of grids according to a preset dividing mode, wherein the size of the image to be detected is a target size;
the detection module is used for inputting the divided images into a convolutional neural network which is trained in advance, and acquiring a plurality of feature vectors of the images output by the convolutional neural network, wherein each grid corresponds to one feature vector;
the determining module is used for identifying the maximum value of the category parameters in the feature vector aiming at the feature vector corresponding to each grid, and determining the position information of the object of the category corresponding to the category parameters according to the central point position parameters and the outline dimension parameters in the feature vector when the maximum value is larger than a set threshold;
wherein the apparatus further comprises:
the error calculation module is used for determining the error of the convolutional neural network according to the prediction of the position and the type of the object in the subsample image by the convolutional neural network and the target object marked in the subsample image;
determining that the convolutional neural network training is complete when the error converges, wherein the error is determined using the following loss function:
wherein S is the number of rows (equal to the number of columns) of the divided grids, B is the preset number of rectangular frames predicted by each grid, generally taken as 1 or 2, $x_i$ is the abscissa of the labeled target object center point in grid i, $\hat{x}_i$ is the abscissa of the predicted object center point in grid i, $y_i$ is the ordinate of the labeled target object center point in grid i, $\hat{y}_i$ is the ordinate of the predicted object center point in grid i, $h_i$ is the height of the rectangular frame labeling the target object, $w_i$ is the width of that rectangular frame, $\hat{h}_i$ is the height of the rectangular frame in which the predicted object is located, $\hat{w}_i$ is the width of that rectangular frame, $C_i$ is the labeled probability of whether the target object currently exists in grid i, $\hat{C}_i$ is the predicted probability of whether an object currently exists in grid i, $P_i(c)$ is the labeled probability that the target object in grid i belongs to category c, $\hat{P}_i(c)$ is the predicted probability that an object in grid i belongs to category c, $\lambda_{coord}$ and $\lambda_{noobj}$ are set weight values, $\mathbb{1}_{ij}^{obj}$ takes 1 when the center point of the object in the j-th predicted rectangular frame lies in grid i and 0 otherwise, $\mathbb{1}_{i}^{obj}$ takes 1 when the center point of an object exists in predicted grid i and 0 otherwise, and $\mathbb{1}_{i}^{noobj}$ takes 1 when no object center point exists in predicted grid i and 0 otherwise, where $\hat{P}_i(c)$ is determined according to the following formula:

$$\hat{P}_i(c) = P_r(\mathrm{Object}) \times P_r(\mathrm{Class} \mid \mathrm{Object})$$

where $P_r(\mathrm{Object})$ is the predicted probability of whether an object currently exists in grid i, and $P_r(\mathrm{Class} \mid \mathrm{Object})$ is the conditional probability that an object within predicted grid i belongs to class c.
9. The apparatus of claim 8, further comprising:
the judging and adjusting module is used for judging whether the size of the image is a target size; and if not, adjusting the size of the image to the target size.
10. The apparatus of claim 8, further comprising:
the training module is used for adopting a rectangular frame to mark a target object aiming at each sample image in the sample image set; dividing each sample image into a plurality of grids according to a preset dividing mode, determining a characteristic vector corresponding to each grid, wherein the size of each sample image is a target size, when the grid contains a central point of a target object, setting a value of a category parameter corresponding to the category in the characteristic vector corresponding to the grid to be a preset maximum value according to the category of the target object, determining a value of a central point position parameter in the characteristic vector according to the position of the central point in the grid, determining a value of an outline dimension parameter in the characteristic vector according to the size of a marked rectangular frame of the target object, and when the grid does not contain the central point of the target object, setting the value of each parameter in the characteristic vector corresponding to the grid to be zero; the convolutional neural network is trained from each sample image for which a feature vector for each grid is determined.
11. The apparatus of claim 10, wherein the training module is further configured to determine, for each sample image, whether the size of the sample image is a target size; and if not, adjusting the size of the sample image to the target size.
12. The apparatus according to claim 11, wherein the training module is specifically configured to select sub-sample images from the sample image set, wherein the number of the selected sub-sample images is smaller than the number of sample images in the sample image set; and training the convolutional neural network by adopting each selected subsample image.
13. The apparatus according to claim 8, wherein the determining module is specifically configured to determine the position information of the central point in the grid according to the position parameter of the central point;
and determining the central point according to the position information, taking the central point as the center of a rectangular frame, determining the position information of the rectangular frame according to the outline dimension parameter, taking the position information of the rectangular frame as the position information of the object, and taking the object type corresponding to the type parameter as the type of the object.
14. The apparatus according to claim 13, wherein the determining module is specifically configured to take a set point of the grid as a reference point; and determine the position information of the central point in the grid according to the reference point and the position parameter of the central point.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249792.1A CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
PCT/CN2017/107043 WO2018121013A1 (en) | 2016-12-29 | 2017-10-20 | Systems and methods for detecting objects in images |
EP17886017.7A EP3545466A4 (en) | 2016-12-29 | 2017-10-20 | Systems and methods for detecting objects in images |
US16/457,861 US11113840B2 (en) | 2016-12-29 | 2019-06-28 | Systems and methods for detecting objects in images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611249792.1A CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106803071A CN106803071A (en) | 2017-06-06 |
CN106803071B true CN106803071B (en) | 2020-02-14 |
Family
ID=58985345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611249792.1A Active CN106803071B (en) | 2016-12-29 | 2016-12-29 | Method and device for detecting object in image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106803071B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3545466A4 (en) | 2016-12-29 | 2019-11-27 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for detecting objects in images |
CN107392158A (en) * | 2017-07-27 | 2017-11-24 | 济南浪潮高新科技投资发展有限公司 | A kind of method and device of image recognition |
CN108229307B (en) | 2017-11-22 | 2022-01-04 | 北京市商汤科技开发有限公司 | Method, device and equipment for object detection |
CN108062547B (en) * | 2017-12-13 | 2021-03-09 | 北京小米移动软件有限公司 | Character detection method and device |
CN110110189A (en) * | 2018-02-01 | 2019-08-09 | 北京京东尚科信息技术有限公司 | Method and apparatus for generating information |
CN108460761A (en) * | 2018-03-12 | 2018-08-28 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN108960232A (en) * | 2018-06-08 | 2018-12-07 | Oppo广东移动通信有限公司 | Model training method and device, electronic equipment and computer readable storage medium |
US11373411B1 (en) | 2018-06-13 | 2022-06-28 | Apple Inc. | Three-dimensional object estimation using two-dimensional annotations |
CN110610184B (en) * | 2018-06-15 | 2023-05-12 | 阿里巴巴集团控股有限公司 | Method, device and equipment for detecting salient targets of images |
CN108968811A (en) * | 2018-06-20 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of object identification method and system of sweeping robot |
CN108921840A (en) * | 2018-07-02 | 2018-11-30 | 北京百度网讯科技有限公司 | Display screen peripheral circuit detection method, device, electronic equipment and storage medium |
CN109272050B (en) * | 2018-09-30 | 2019-11-22 | 北京字节跳动网络技术有限公司 | Image processing method and device |
CN109558791B (en) * | 2018-10-11 | 2020-12-01 | 浙江大学宁波理工学院 | Bamboo shoot searching device and method based on image recognition |
CN109726741B (en) * | 2018-12-06 | 2023-05-30 | 江苏科技大学 | Method and device for detecting multiple target objects |
CN109685069B (en) * | 2018-12-27 | 2020-03-13 | 乐山师范学院 | Image detection method, device and computer readable storage medium |
US10460210B1 (en) * | 2019-01-22 | 2019-10-29 | StradVision, Inc. | Method and device of neural network operations using a grid generator for converting modes according to classes of areas to satisfy level 4 of autonomous vehicles |
CN111597845A (en) * | 2019-02-20 | 2020-08-28 | 中科院微电子研究所昆山分所 | Two-dimensional code detection method, device and equipment and readable storage medium |
CN111639660B (en) * | 2019-03-01 | 2024-01-12 | 中科微至科技股份有限公司 | Image training method, device, equipment and medium based on convolution network |
CN109961107B (en) * | 2019-04-18 | 2022-07-19 | 北京迈格威科技有限公司 | Training method and device for target detection model, electronic equipment and storage medium |
CN111914850B (en) * | 2019-05-07 | 2023-09-19 | 百度在线网络技术(北京)有限公司 | Picture feature extraction method, device, server and medium |
CN110338835B (en) * | 2019-07-02 | 2023-04-18 | 深圳安科高技术股份有限公司 | Intelligent scanning three-dimensional monitoring method and system |
CN110930386B (en) * | 2019-11-20 | 2024-02-20 | 重庆金山医疗技术研究院有限公司 | Image processing method, device, equipment and storage medium |
CN111353555A (en) * | 2020-05-25 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Label detection method and device and computer readable storage medium |
CN112084874B (en) * | 2020-08-11 | 2023-12-29 | 深圳市优必选科技股份有限公司 | Object detection method and device and terminal equipment |
CN112446867B (en) * | 2020-11-25 | 2023-05-30 | 上海联影医疗科技股份有限公司 | Method, device, equipment and storage medium for determining blood flow parameters |
CN112785564B (en) * | 2021-01-15 | 2023-06-06 | 武汉纺织大学 | Pedestrian detection tracking system and method based on mechanical arm |
CN113935425B (en) * | 2021-10-21 | 2024-08-16 | 中国船舶集团有限公司第七一一研究所 | Object identification method, device, terminal and storage medium |
CN114739388B (en) * | 2022-04-20 | 2023-07-14 | 中国移动通信集团广东有限公司 | Indoor positioning and navigation method and system based on UWB and laser radar |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104517113A (en) * | 2013-09-29 | 2015-04-15 | 浙江大华技术股份有限公司 | Image feature extraction method and device and image sorting method and device |
CN105975931A (en) * | 2016-05-04 | 2016-09-28 | 浙江大学 | Convolutional neural network face recognition method based on multi-scale pooling |
Non-Patent Citations (1)
Title |
---|
End-to-end people detection in crowded scenes; Russell Stewart et al.; Online publication: https://arxiv.org/abs/1506.04878; 2015-07-08; Sections 2 and 3, Figures 2 and 4 *
Also Published As
Publication number | Publication date |
---|---|
CN106803071A (en) | 2017-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106803071B (en) | Method and device for detecting object in image | |
US11878433B2 (en) | Method for detecting grasping position of robot in grasping object | |
CN108805016B (en) | Head and shoulder area detection method and device | |
Zhou et al. | Exploring faster RCNN for fabric defect detection | |
WO2017059576A1 (en) | Apparatus and method for pedestrian detection | |
CN110059558A (en) | A kind of orchard barrier real-time detection method based on improvement SSD network | |
CN104424634A (en) | Object tracking method and device | |
CN112766170B (en) | Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image | |
CN111292377B (en) | Target detection method, device, computer equipment and storage medium | |
CN111382638B (en) | Image detection method, device, equipment and storage medium | |
CN111738164B (en) | Pedestrian detection method based on deep learning | |
CN114882423A (en) | Truck warehousing goods identification method based on improved Yolov5m model and Deepsort | |
CN112784494B (en) | Training method of false positive recognition model, target recognition method and device | |
CN111080697B (en) | Method, apparatus, computer device and storage medium for detecting direction of target object | |
CN112580435B (en) | Face positioning method, face model training and detecting method and device | |
CN116758631B (en) | Big data driven behavior intelligent analysis method and system | |
CN117763350A (en) | Labeling data cleaning method and device | |
CN112733741B (en) | Traffic sign board identification method and device and electronic equipment | |
CN108475339B (en) | Method and system for classifying objects in an image | |
CN112861689A (en) | Searching method and device of coordinate recognition model based on NAS technology | |
CN117036966B (en) | Learning method, device, equipment and storage medium for point feature in map | |
CN113936255B (en) | Intersection multi-steering vehicle counting method and system | |
CN118097785B (en) | Human body posture analysis method and system | |
US20230410477A1 (en) | Method and device for segmenting objects in images using artificial intelligence | |
Zhang et al. | Recognition of italian gesture language based on augmented yolov5 algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||