US20180039853A1 - Object Detection System and Object Detection Method
- Publication number
- US20180039853A1
- Authority
- US
- United States
- Prior art keywords: region, subnetwork, image, feature vector, box
- Legal status: Abandoned (the status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Classifications
- G06V 10/82 — image or video recognition using neural networks
- G06V 10/454 — integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V 10/764 — recognition using classification, e.g. of video objects
- G06V 10/768 — recognition using context analysis, e.g. recognition aided by known co-occurring patterns
- G06V 10/806 — fusion of extracted features
- G06F 18/24143 — classification based on distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
- G06F 18/253 — fusion techniques of extracted features
- G06N 3/04 — neural network architecture, e.g. interconnection topology
- G06N 3/045 — combinations of networks
- G06N 3/0464 — convolutional networks [CNN, ConvNet]
- G06N 3/09 — supervised learning
- G06T 3/40 — scaling of whole images or parts thereof
- G06T 2207/10004 — still image; photographic image
- G06T 2207/20084 — artificial neural networks [ANN]
- G06K 9/4671; G06T 7/0081 (legacy codes)
Description
- This invention relates to neural networks, and more specifically to object detection systems and methods using a neural network.
- Object detection is one of the most fundamental problems in computer vision. The goal of object detection is to detect and localize all instances of predefined object classes, in the form of bounding boxes with confidence values, in a given input image. An object detection problem can be converted to an object classification problem by a scanning window technique. However, the scanning window technique is inefficient because classification steps are performed for all potential image regions over various locations, scales, and aspect ratios.
- The region-based convolutional neural network (R-CNN) takes a two-stage approach, in which a set of object proposals is generated as regions of interest (ROI) using a proposal generator, and the existence of an object and its class in each ROI are determined using a deep neural network. However, the detection accuracy of the R-CNN is insufficient for some cases. Accordingly, another approach is required to further improve object detection performance.
- Some embodiments of the invention are based on the recognition that the R-CNN can be used to detect objects of different sizes. However, detecting small objects in an image and/or predicting the class labels of those small objects is a challenging problem for scene understanding, because a small object is represented by only a small number of pixels in the image.
- Some embodiments are based on the realization that specific small objects usually appear in specific contexts. For example, a mouse is usually placed near a keyboard and a monitor. That context can be made part of training and recognition to compensate for the small resolution of the object. To that end, some embodiments extract feature vectors from different regions, each including the object. Those regions are of different sizes and provide different contextual information about the object. In some embodiments, the object is detected and/or classified based on a combination of the feature vectors.
- Various embodiments can be used to detect objects of different sizes. In one embodiment, the size of the object is governed by the number of pixels of the image forming the object; for example, a small object is represented by fewer pixels. To that end, one embodiment resizes the region surrounding the object by at least seven times to collect enough contextual information.
- Accordingly, one embodiment discloses a non-transitory computer-readable recording medium storing a program that causes a computer to execute an object detection process. The object detection process includes extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region, wherein a size of the first region differs from a size of the second region; extracting a second feature vector from the second region of the image using the first subnetwork; and detecting the object using a third subnetwork on the basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
- Another embodiment discloses a method for detecting an object in an image. The method includes steps of extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region; extracting a second feature vector from the second region of the image using a second subnetwork; classifying a class of the object using a third subnetwork on the basis of the first feature vector and the second feature vector; and determining the class of the object in the first region according to a result of the classifying, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, and wherein the steps of the method are performed by a processor.
- Another embodiment discloses an object detection system. The system includes a human machine interface; a storage device including neural networks; a memory; a network interface controller connectable with a network outside the system; an imaging interface connectable with an imaging device; and a processor configured to connect to the human machine interface, the storage device, the memory, the network interface controller, and the imaging interface, wherein the processor executes instructions for detecting an object in an image using the neural networks stored in the storage device, and wherein the neural networks perform steps of: extracting a first feature vector from a first region of the image using a first subnetwork; determining a second region of the image by processing the first feature vector with a second subnetwork, wherein a size of the first region differs from a size of the second region; extracting a second feature vector from the second region of the image using the first subnetwork; and detecting the object using a third subnetwork on the basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
- FIG. 1 is a block diagram of an object detection system for detecting small objects in an image according to some embodiments of the invention;
- FIG. 2 shows a flowchart of processes for detecting a small object in an image;
- FIG. 3 is a block diagram of a neural network used in a computer-implemented object detection method for detecting small objects in an image according to some embodiments;
- FIG. 4A shows a procedure of resizing a target region image and a context region image in an image;
- FIG. 4B shows an example of a procedure applying a proposal box and a context box to a clock image in an image;
- FIG. 4C shows a block diagram of a process for detecting a mouse image in an image;
- FIG. 5 shows an example of statistics of small object categories;
- FIG. 6 shows median bounding box sizes of objects per category and the corresponding up-sampling ratios; and
- FIG. 7 shows an example of average precision results obtained by different networks.
- FIG. 1 shows a block diagram of an object detection system 100 according to some embodiments of the invention. The object detection system 100 includes a human machine interface (HMI) 110 connectable with a keyboard 111 and a pointing device/medium 112, a processor 120, a storage device 130, a memory 140, a network interface controller (NIC) 150 connectable with a network 190 including local area networks and the internet, a display interface 160, an imaging interface 170 connectable with an imaging device 175, and a printer interface 180 connectable with a printing device 185. The object detection system 100 can receive electric text/imaging documents 595 via the network 190 connected to the NIC 150. The storage device 130 includes original images 131, a filter system module 132, and neural networks 200. The pointing device/medium 112 may include modules that read programs stored on a computer-readable recording medium.
- For detecting an object in an image, instructions may be transmitted to the object detection system 100 using the keyboard 111, the pointing device/medium 112, or via the network 190 connected to other computers (not shown in the figure). The object detection system 100 receives the instructions via the HMI 110 and executes them for detecting an object in an image with the processor 120 using the neural networks 200 stored in the storage device 130. The processor 120 may be a plurality of processors including one or more graphics processing units (GPUs). The filter system module 132 is operable to perform image processing to obtain a predetermined formatted image from given images relevant to the instructions. The images processed by the filter system module 132 can be used by the neural networks 200 for detecting objects. An object detection process using the neural networks 200 is described below. In the following description, a glimpse region is referred to as a glimpse box, a bounding box, a glimpse bounding box, or a bounding box region, which is placed on a target in an image to detect the features of the target object in the image.
- Some embodiments are based on the recognition that a method for detecting an object in an image includes extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region by a fixed ratio, wherein a size of the first region is smaller than a size of the second region; extracting a second feature vector from the second region of the image using a second subnetwork; classifying a class of the object using a third subnetwork on the basis of the first feature vector and the second feature vector; and determining the class of the object in the first region according to a result of the classifying, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, and wherein the steps of the method are performed by a processor.
- Some embodiments of the invention are based on the recognition that detecting small objects in an image and/or predicting the class labels of the small objects is a challenging problem for scene understanding, because few pixels in the image represent the small object. However, specific small objects usually appear in specific contexts; for example, a mouse is usually placed near a keyboard and a monitor. That context can be made part of training and recognition to compensate for the small resolution of the object. To that end, some embodiments extract feature vectors from different regions that include the object. Those regions are of different sizes and provide different contextual information about the object. In some embodiments, the object is detected and/or classified based on a combination of the feature vectors.
- FIG. 2 shows a flowchart of processes for detecting a small object in an image, sketched in code below. In step S1, a first feature vector is extracted from a first region in the image by using a first subnetwork. In step S2, a second region in the image is determined by resizing the first region with a predetermined ratio by use of a resize module. In step S3, a second feature vector is extracted from the second region by using a second subnetwork. In step S4, a third subnetwork classifies the object based on the first feature vector and the second feature vector, and the classification result of the object in the image is output by the third subnetwork in step S5. In this case, the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, and the steps are performed by a processor. Further, the step of resizing the first region is performed such that each of the first region and the second region includes the object and the size of the first region is smaller than the size of the second region.
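- A minimal sketch of this flow, assuming the subnetworks are available as callables, regions are axis-aligned (x, y, w, h) boxes, and the helper names (resize_region, enlarge_box, both defined in later sketches) are illustrative rather than taken from the patent:

```python
def detect_small_object(image, first_region, first_net, second_net, third_net, c=7.0):
    """Steps S1-S5 of FIG. 2 (illustrative sketch, not the patented code)."""
    # S1: first feature vector from the resized first (target) region.
    v1 = first_net(resize_region(image, first_region))
    # S2: second (context) region = first region enlarged by a predetermined
    # ratio; both regions include the object, the first being the smaller one.
    second_region = enlarge_box(first_region, c)
    # S3: second feature vector from the resized second region.
    v2 = second_net(resize_region(image, second_region))
    # S4-S5: the third subnetwork classifies from both vectors and outputs the result.
    return third_net(v1, v2)
```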
- FIG. 3 shows a block diagram of an object detection method using the neural networks 200 according to some embodiments of the invention. The neural networks 200 include a region proposal network (RPN) 400 and a neural network 250, which may be referred to as the ContextNet 250. The ContextNet 250 includes a context region module 12, a resize module 13, a resize module 14, a first deep convolutional neural network (DCNN) 210, a second deep convolutional neural network (DCNN) 220, and a third neural network 300. The third neural network 300 includes a concatenation module 310, a fully connected neural network 311, and a softmax function module 312. The first DCNN 210 may be referred to as a first subnetwork, the second DCNN 220 as a second subnetwork, and the third neural network 300 as a third subnetwork. The first subnetwork and the second subnetwork may have identical structure.
- Upon instructions, when an image 10 is provided to the object detection system 100, the region proposal network (RPN) 400 is applied to the image 10 to generate a proposal box 15 placed on a region of a target object image in the image. The part of the image 10 encompassed by the proposal box 15 is referred to as a target region image. The target region image is resized to a resized target image 16 with a predetermined identical size and a predetermined resolution using the resize module 13, and the resized target image 16 is transmitted to the neural networks 200. Regarding the definition of small objects, a threshold size of small objects is predetermined to classify objects in the image into a small object category. The threshold size may be chosen according to the system design of object detection and is used in the RPN 400 to generate the proposal box 15. The proposal box 15 also provides the location information 340 of the target object image in the image 10. For example, the threshold size may be determined based on predetermined physical sizes of objects in the image, pixel sizes of objects in the image, or the ratio of the area of an object image to the whole area of the image. Successively, a context box 20 is obtained by enlarging the proposal box 15 by seven times in the x and y directions (the width and height dimensions) using the context region module 12. The context box 20 is placed over the proposal box 15 on the image 10 to surround the target region image, and the part of the image determined by placing the context box 20 is referred to as a context region image. The context region image corresponding to the context box 20 is resized, using the resize module 14, to a resized context image 21 having the predetermined size and is transmitted to the ContextNet 250. The context region image may be obtained by magnifying the target region image by seven times or by other factors according to the data configurations used in the ContextNet 250. Accordingly, the target region image corresponding to the proposal box 15 and the context region image corresponding to the context box 20 are converted into the resized target image 16 and the resized context image 21 by the resize module 13 and the resize module 14 before being transmitted to the ContextNet 250. In this case, the resized target image 16 and the resized context image 21 have the predetermined identical size. For example, the predetermined identical size may be 227×227 pixels (224×224 for VGG16). The predetermined identical size may be changed according to the data format used in the neural networks. Further, the predetermined identical size may be defined based on a predetermined pixel size or a predetermined physical dimension, and the aspect ratios of the target region image and the context region image may be maintained after being resized.
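- As a rough sketch of what the resize modules 13 and 14 might do: the text specifies the output size and optional aspect-ratio preservation but not the interpolation or padding scheme, so the letterboxing below is an assumption:

```python
from PIL import Image

def resize_region(image, box, out_size=227):
    """Crop a region (x, y, w, h) and letterbox it to out_size x out_size,
    preserving the region's aspect ratio (padding scheme is an assumption)."""
    x, y, w, h = box
    crop = image.crop((x, y, x + w, y + h))
    scale = out_size / max(w, h)
    resized = crop.resize((max(1, round(w * scale)), max(1, round(h * scale))),
                          Image.BILINEAR)
    canvas = Image.new("RGB", (out_size, out_size))   # zero-padded background
    canvas.paste(resized, ((out_size - resized.width) // 2,
                           (out_size - resized.height) // 2))
    return canvas
```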
- The ContextNet 250 receives the resized target image 16 and the resized context image 21 at the first DCNN 210 and the second DCNN 220, respectively. The first DCNN 210 in the ContextNet 250 extracts a first feature vector 230 from the resized target image 16 and transmits the first feature vector 230 to the concatenation module 310 of the third neural network 300. Further, the second DCNN 220 in the ContextNet 250 extracts a second feature vector 240 from the resized context image 21 and transmits the second feature vector 240 to the concatenation module 310 of the third neural network 300. The concatenation module 310 concatenates the first feature vector 230 and the second feature vector 240 to generate a concatenated feature. The concatenated feature is transmitted to the fully connected neural network (NN) 311, and the fully connected NN 311 generates a feature vector from the concatenated feature and transmits this feature vector to the softmax function module 312. The softmax function module 312 performs a classification of the target object image based on the feature vector from the fully connected NN 311 and outputs the classification result as a category output 330. As a result, the object detection of the target object image corresponding to the proposal box 15 is obtained based on the category output 330 and the location information 340.
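- In PyTorch terms, the two-branch architecture could look like the sketch below. The 4096-dimensional branch outputs and the two fully connected layers follow the details given later in this text; everything else (names, ReLU placement) is an illustrative assumption:

```python
import torch
import torch.nn as nn

class ContextNet(nn.Module):
    """Sketch of FIG. 3: two DCNN branches whose feature vectors are
    concatenated and classified by a small fully connected head."""
    def __init__(self, branch_a, branch_b, num_classes, feat_dim=4096):
        super().__init__()
        self.dcnn_target = branch_a      # first subnetwork (DCNN 210)
        self.dcnn_context = branch_b     # second subnetwork (DCNN 220)
        self.fc = nn.Sequential(         # fully connected NN 311 (two layers)
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, target_img, context_img):
        v1 = self.dcnn_target(target_img)    # first feature vector 230
        v2 = self.dcnn_context(context_img)  # second feature vector 240
        fused = torch.cat([v1, v2], dim=1)   # concatenation module 310
        return torch.softmax(self.fc(fused), dim=1)  # softmax module 312
```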
- Proposal Box and Context Box
- FIG. 4A shows a procedure of resizing a target region image and a context region image in an image. When the proposal box 15 is applied to the image 10, the neural networks 200 crop the target region image corresponding to the proposal box 15 and resize it to a resized target image 16, which is transmitted to the first DCNN 210. Further, the context region module 12 enlarges the proposal box 15 by seven times in both the x and y directions to obtain the context box 20. The context region module 12 places the context box 20 on the image 10 so that the context box 20 covers the target region image corresponding to the proposal box 15, thereby defining a context region image. The neural networks 200 then crop the context region image corresponding to the context box 20 and resize it to a resized context image 21 having the predetermined size identical to that of the resized target image 16. The resized context image 21 is transmitted to the second DCNN 220, where the second DCNN 220 and the first DCNN 210 have identical structure. This procedure improves the detection of small objects because extracting features from greater areas of the image helps incorporate context information, resulting in better discrimination. In another embodiment, the center of the context box 20 may be shifted from the center of the proposal box 15 by a predetermined distance according to a predetermined ratio between the areas of the context box 20 and the proposal box 15.
- In some embodiments, the context box 20 is set to be greater than the proposal box 15 so that the context box 20 encloses the proposal box 15. For example, each side of the context box 20 may be greater than or equal to seven times the corresponding side of the proposal box 15. In this case, the center of the proposal box 15 is arranged to be identical to that of the context box 20.
- FIG. 4A also shows the process of generating the context box 20 from the proposal box 15. A vector of the context box 20 is obtained by converting a vector of the proposal box 15. The vector of the proposal box 15 is expressed by a position (x, y), a width w, and a height h, where the position (x, y) indicates one of the corners of the proposal box 15 in the x-y coordinates of the image 10. The vector of the proposal box 15 is thus (x, y, w, h), in which the lower-left corner is given by (x, y) and the corner diagonal to it by (x+w, y+h). The center (x_c, y_c) of the proposal box 15 is the point (x+w/2, y+h/2). When the width w and height h of the proposal box 15 are enlarged by a factor c to provide the context box 20, the vector (x′, y′, w′, h′) of the context box 20 is (x_c − c·w/2, y_c − c·h/2, c·w, c·h). In FIG. 4A, the proposal box 15 and the context box 20 have the identical center (x_c, y_c). In another embodiment, the center of the context box 20 may be shifted from the center of the proposal box 15 by predetermined amounts Δx and Δy. For example, the predetermined amounts Δx and Δy may be defined to satisfy the conditions |Δx| ≤ (c−1)·w/2 and |Δy| ≤ (c−1)·h/2 with c > 1, so that the proposal box 15 is included in the context box 20 without protruding beyond it.
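- A direct transcription of this box conversion (clipping the context box to the image border, which the text does not discuss, is omitted):

```python
def enlarge_box(box, c=7.0, dx=0.0, dy=0.0):
    """FIG. 4A: map a proposal box (x, y, w, h) to its context box
    (x', y', w', h') = (xc - c*w/2, yc - c*h/2, c*w, c*h), where
    (xc, yc) = (x + w/2, y + h/2) is the proposal box center."""
    x, y, w, h = box
    # An optional center shift (dx, dy) must keep the proposal box inside
    # the context box: |dx| <= (c-1)*w/2 and |dy| <= (c-1)*h/2, with c > 1.
    assert c > 1 and abs(dx) <= (c - 1) * w / 2 and abs(dy) <= (c - 1) * h / 2
    xc, yc = x + w / 2, y + h / 2
    return (xc + dx - c * w / 2, yc + dy - c * h / 2, c * w, c * h)
```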
- FIG. 4B shows an example of a procedure applying a proposal box and a context box to a clock image in an image 13, in which an enlarged clock image is indicated at the upper-right corner of the image 13. It should be noted that the clock image is much smaller than the other objects, such as the furniture, windows, fireplace, etc. In FIG. 4B, a proposal box 17 is applied to part of the clock image as a target image in the image 13. Subsequently, the target image corresponding to the proposal box 17 is enlarged into a resized target image 16 and transmitted to the first DCNN 210 via the resize module 13. Further, the neural network 200 provides a context box 22 based on the proposal box 17 and applies the context box 22 to the clock image, where the context box 22 is arranged to fully surround the proposal box 17 with a predetermined area, as shown in the figure. The image region corresponding to the context box 22 is cropped from the image 13 as a context image, and the resize module 14 resizes the context image into a resized context image 21, which is transmitted to the second DCNN 220. In this case, the context image encloses the target image, as seen in the figure. This procedure makes it possible for the neural network 200 to obtain crucial information about a small object in the image, resulting in higher accuracy for small object classification.
- FIG. 4C shows a block diagram of a process for detecting a mouse image in an image. When an image 30 is provided, the region proposal network 400 provides a proposal box 31 corresponding to a target object image showing the back side of a mouse on a desk, and provides a context box 32 surrounding the proposal box 31. After being resized by the resize module 13 (not shown), a resized target image of the target object image is transmitted to the first DCNN 210 (indicated as convolutional layers). The first DCNN 210 extracts a first feature vector of the target object image from the resized target image and transmits it to the concatenation module 310. Further, the context box 32 is applied to the image 30 to determine a context region image that encloses the target object image. After being resized by the resize module 14 (not shown), a resized context image of the context region image is transmitted to the second DCNN 220 (indicated as convolutional layers). The second DCNN 220 extracts a second feature vector of the context region image from the resized context image and transmits it to the concatenation module 310. After obtaining the first and second feature vectors, the concatenation module 310 concatenates them and generates a concatenated feature, which is transmitted to the fully connected NN 311 (indicated as fully connected layers). The fully connected NN 311 generates and transmits a feature vector to the softmax function module 312, which performs a classification of the target object image based on the feature vector from the fully connected NN 311 and outputs a classification result. The classification result indicates that the category of the target object image is a "mouse," as shown in the figure.
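- Tying the earlier sketches together, a hypothetical end-to-end call for this mouse example might read as follows (context_net, class_names, and the helpers are the illustrative names introduced above, not identifiers from the patent):

```python
from torchvision.transforms.functional import to_tensor

target = resize_region(image, proposal_box)                     # proposal box 31, resize module 13
context = resize_region(image, enlarge_box(proposal_box, 7.0))  # context box 32, resize module 14
probs = context_net(to_tensor(target).unsqueeze(0),
                    to_tensor(context).unsqueeze(0))
label = class_names[int(probs.argmax(dim=1))]                   # e.g. "mouse"
```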
- Small Object Dataset
- Because a small proposal box corresponding to a small object yields a low-dimensional feature vector, the size of a proposal box is chosen to obtain appropriately sized vectors that accommodate the context information of the proposal box in the object detection system 100.
- In some embodiments, a dataset for detecting small objects may be constructed by selecting predetermined small objects from conventional datasets, such as the SUN and Microsoft COCO datasets. For example, a subset of images of small objects is selected from the conventional datasets, and the ground-truth bounding box locations in those datasets are used to prune out big object instances and compose a small object dataset that contains only small objects with small bounding boxes. The small object dataset may be constructed by computing the statistics of small objects.
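- A sketch of such pruning over COCO/SUN-style annotations; the 1% relative-area cutoff is an assumption chosen to sit above the per-category medians reported below:

```python
def build_small_object_dataset(annotations, max_rel_area=0.01):
    """Keep only instances whose ground-truth box occupies at most
    max_rel_area of its image (threshold is an assumption)."""
    small = []
    for ann in annotations:  # each: {'bbox': (x, y, w, h), 'img_w': ..., 'img_h': ...}
        x, y, w, h = ann["bbox"]
        if w * h <= max_rel_area * ann["img_w"] * ann["img_h"]:
            small.append(ann)  # prune out big object instances
    return small
```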
- FIG. 5 shows an example of statistics of small object categories. Ten example categories are listed in the figure; for instance, there are 2137 instances in 1739 images for the "mouse" category. Other categories, such as "telephone," "switch," "outlet," "clock," "toilet paper," "tissue box," "faucet," "plate," and "jar," are also listed. FIG. 5 also shows the median relative area for each category, where the relative area is the ratio of the bounding box area to the entire image area for object instances in the same category. The median relative area ranges between 0.08% and 0.58%, which corresponds to pixel areas between 16×16 and 42×42 pixels in a VGA image.
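- The pixel figures follow directly from the relative areas: in a 640×480 (VGA) image, a square box of relative area r has a side of √(r·640·480) pixels, which a quick check confirms:

```python
import math

for r in (0.0008, 0.0058):              # 0.08% and 0.58% relative area
    side = math.sqrt(r * 640 * 480)     # square-box side in a VGA image
    print(f"{r:.2%} -> {side:.1f} px")  # ~15.7 px and ~42.2 px
```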
- The small object dataset constructed according to this embodiment is customized for small objects, and the sizes of small bounding boxes may be determined based on it. By contrast, the median relative area of object categories in a conventional dataset, such as the PASCAL VOC dataset, ranges between 1.38% and 46.40%. Accordingly, the bounding boxes provided by the small object dataset according to some embodiments of the invention are more accurate for small objects than those provided by the conventional dataset, because the conventional dataset provides much wider bounding box areas for object categories that are not customized for small objects. The predetermined small objects may be determined by categorizing instances having physical dimensions smaller than a predetermined size; for example, the predetermined size may be 30 centimeters, or 50 centimeters, according to the object detection system design.
- FIG. 6 shows median bounding box sizes of objects per category and the corresponding up-sampling ratios. The up-sampling ratio is chosen to be 6 to 7 to match the input size (227×227 in this case) of the deep convolutional neural network.
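- This is consistent with dividing the network input size by the median box side; the 35-pixel median below is a hypothetical value for illustration only:

```python
def upsampling_ratio(median_box_side, net_input=227):
    """Ratio that scales a median-sized box up to the DCNN input size
    (227 for AlexNet, 224 for VGG16)."""
    return net_input / median_box_side

print(round(upsampling_ratio(35), 1))  # 6.5 -> within the stated 6-7 range
```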
- The first DCNN 210 and the second DCNN 220 are designed to have identical structure, and each includes several convolutional layers. The first DCNN 210 and the second DCNN 220 are initialized using the ImageNet pre-trained model. As training proceeds, the first DCNN 210 and the second DCNN 220 evolve their weights separately and do not share weights.
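- One plausible way to realize identically structured but independently trained branches, assuming torchvision's AlexNet as the pre-trained model (truncating the classifier after its penultimate layer is one reading of keeping the first layers up to the 4096-dimensional output):

```python
import copy
import torch.nn as nn
from torchvision import models

# Both branches start from the same ImageNet weights; deep-copying gives each
# branch its own parameters, so the weights evolve separately during training.
backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
extractor = nn.Sequential(
    backbone.features, backbone.avgpool, nn.Flatten(),
    *list(backbone.classifier.children())[:-1],  # stop at the 4096-d fc output
)
dcnn_target = extractor                  # first subnetwork (DCNN 210)
dcnn_context = copy.deepcopy(extractor)  # second subnetwork (DCNN 220), no weight sharing
```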
- The first feature vector 230 and the second feature vector 240 are derived from the first six layers of AlexNet or from the first six layers of VGG16. The target region image corresponding to the proposal box 15 and the context region image corresponding to the context box 20 are resized to 227×227 image patches for AlexNet and 224×224 for VGG16. The first DCNN 210 and the second DCNN 220 each output a 4096-dimensional feature vector, and the two 4096-dimensional feature vectors are transmitted to the third neural network 300, which includes the concatenation module 310, the fully connected NN 311 having two fully connected layers, and the softmax function module 312. After receiving the concatenated feature derived from the first DCNN 210 and the second DCNN 220, the third neural network 300 outputs a predicted object category label for the target object image via the softmax function module 312, based on the concatenated feature vector generated by the concatenation module 310. The pre-trained weights are not used for a predetermined number of the last layers in the fully connected NN 311; instead, the convolutional layers are used.
- In some embodiments, the proposal box 15 can be generated by a Deformable Part Model (DPM) module based on Histogram of Oriented Gradients (HOG) features and a latent support vector module. The DPM module is designed to detect category-specific objects; the sizes of the root and part templates of the DPM module are adjusted to accommodate small object sizes, and the DPM module is then trained for the predetermined classes. In other embodiments, the proposal box 15 can be generated by the region proposal network (RPN) 400. The proposal box 15 generated by the RPN 400 is designed to have a predetermined number of pixels. For instance, the number of pixels may be 16², 40², or 100² according to the configuration design of the object detection system 100; it may be greater than 100² when the category of small objects in the datasets of an object detection system is defined to be greater than 100² pixels. The conv4_3 layer of the VGG network is used for the feature maps associated with small anchor boxes, where the receptive field of the conv4_3 layer is 92×92 pixels.
- FIG. 7 shows an example of average precision results obtained by different networks, in which the ContextNet is referred to as AlexNet. The second row (DPM prop.+AlexNet) is obtained by using DPM proposals, with training and testing performed with 500 proposals per image per category. The third row (RPN prop.+AlexNet) is obtained by using the RPN according to some embodiments, with training performed with 2000 proposals per image and testing with 500 per image. The results show that RPN proposals with AlexNet training provide better performance than the others. In the evaluation, a detection is counted as correct if the overlap ratio between the detected object box and the ground-truth bounding box is greater than 0.5, where the overlap ratio is measured by the Intersection over Union (IoU) measuring module. The overlap ratio may be changed according to a predetermined detection accuracy designed into the object detection system 100.
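- The IoU criterion is the standard one; a minimal reference implementation for (x, y, w, h) boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes; a detection is
    counted correct when IoU with the ground truth exceeds 0.5."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```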
Abstract
A method for detecting an object in an image includes extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region by a fixed ratio using a second subnetwork, wherein a size of the first region is smaller than a size of the second region; extracting a second feature vector from the second region of the image using the second subnetwork; classifying a class of the object using a third subnetwork on the basis of the first feature vector and the second feature vector; and determining the class of the object in the first region according to a result of the classification, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, and wherein the steps of the method are performed by a processor.
Description
- This invention relates to neural networks, and more specifically to object detection systems and methods using a neural network.
- Object detection is one of the most fundamental problems in computer vision. The goal of an object detection is to detect and localize all instances of pre-defined object classes in the form of bounding boxes with confidence values for given input images. An object detection problem can be converted to an object classification problem by a scanning window technique. However, the scanning window technique is inefficient because classification steps are performed for all potential image regions of various locations, scales, and aspect ratios.
- The region-based convolution neural network (R-CNN) is used to perform a two-stage approach, in which a set of object proposals are generated as regions of interest (ROI) using a proposal generator and the existence of an object and the classes in the ROI are determined using a deep neural network. However, the detection accuracy of the R-CNN is insufficient for some case. Accordingly, another approach is required to further improve the object detection performance.
- Some embodiments of the invention are based on recognition that region-based convolution neural network (R-CNN) can use detect objects of different sizes. However, detecting small objects in an image and/or predicting the class label the small objects in the image is a challenging problem for scene understanding due to small number of pixels in the image representing the small object.
- Some embodiments are based on realization that specific small objects are usually appearing in the specific contexts. For example, a mouse is usually place near a keyboard and a monitor. That context can be part of training and recognition to compensate for the small resolution of the small object. To that end, some embodiments extract feature vectors from different regions including the object. Those regions are of different size and provide different contextual information about the object. In some embodiments, the object is detected and/or classified based on combination of the feature vectors.
- Various embodiments can be used to detect the object of different sizes. In one embodiment, the size of the object is governed by the number of pixels of the image forming the object. For example, a small object is represented by less number of pixels. To that end, one embodiment resizes the region surrounding the object by at least seven times to collect enough contextual information.
- Accordingly, one embodiment discloses a non-transitory computer readable recoding medium storing thereon a program causing a computer to execute an object detection process. The object detection process includes extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region, wherein a size of the first region differs from a size of the second region; extracting a second feature vector from the second region of the image using the first subnetwork; and detecting the object using a third subnetwork on a basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
- Another embodiment discloses a method for detecting an object in an image. The method includes steps of extracting a first feature vector from a first region of an image using a first subnetwork; determining a second region of the image by resizing the first region; extracting a second feature vector from a second region of the image using a second subnetwork; classifying a class of the object using a third subnetwork on a basis of the first feature vector and the second feature vector; and determining the class of object in the first region according to a result of the classifying, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, wherein steps of the method are performed by a processor.
- Another embodiment discloses an objection detection system. The system includes a human machine interface; a storage device including neural networks; a memory; a network interface controller connectable with a network being outside the system; an imaging interface connectable with an imaging device; and a processor configured to connect to the human machine interface, the storage device, the memory, the network interface controller and the imaging interface, wherein the processor executes instructions for detecting an object in an image using the neural networks stored in the storage device, wherein the neural networks perform steps of: extracting a first feature vector from a first region of the image using a first subnetwork; determining a second region of the image by processing the first feature vector with a second subnetwork, wherein a size of the first region differs from a size of the second region; extracting a second feature vector from the second region of the image using the first subnetwork; and detecting the object using a third subnetwork on a basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
-
FIG. 1 is a block diagram of an object detection system for detecting small objects in an image according to some embodiments of the invention; -
FIG. 2 shows a flowchart of processes for detecting a small object in an image; -
FIG. 3 is a block diagram of a neural network used in a computer-implemented object detection method for detecting small objects in an image according to some embodiments; -
FIG. 4A shows a procedure of resizing a target region image and a contest region image in an image; -
FIG. 4B shows an example of a procedure applying a proposal box and a context box to a clock image in an image; -
FIG. 4C shows a block diagram of a process for detecting a mouse image in an image; -
FIG. 5 shows an example of statistics of small object categories; -
FIG. 6 shows median bounding box sizes of objects per a category and the corresponding up-sampling ratios; and -
FIG. 7 shows an example of average precision results performed by different networks. -
FIG. 1 shows a block diagram of anobject detection system 100 according to some embodiments of the invention. Theobject detection system 100 includes a human machine interface (HMI) 110 connectable with akeyboard 111 and a pointing device/medium 112, aprocessor 120, astorage device 130, amemory 140, a network interface controller 150 (NIC) connectable with anetwork 190 including local area networks and internet network, adisplay interface 160, animaging interface 170 connectable with animaging device 175, aprinter interface 180 connectable with a printing device 185. Theobject detection system 100 can receive electric text/imaging documents 595 via thenetwork 190 connected to the NIC 150. Thestorage device 130 includesoriginal images 131, afilter system module 132, andneural networks 200. The pointing device/medium 112 may include modules that read programs stored on a computer readable recording medium. - For detecting an object in an image, instructions may be transmitted to the
object detection system 100 using thekeyboard 111, the pointing device/medium 112 or via thenetwork 190 connected to other computers (not shown in the figure). Theobject detection system 100 receives the instructions using theHMI 110 and executes the instructions for detecting an object in an image using theprocessor 120 using theneural networks 200 stored in thestorage device 130. Theprocessor 120 may be a plurality of processors including one or more than graphics processing units (GPUs). Thefilter system module 132 is operable to perform image processing to obtain predetermined formatted image from given images relevant to the instructions. The images processed by thefilter system module 132 can be used by theneural networks 200 for detecting objects. An object detection process using theneural networks 200 is described below. In the following description, a glimpse region is referred to as a glimpse box, a bounding box, a glimpse bounding box or a bounding box region, which is placed on a target in an image to detect the feature of the target object in the image. - Some embodiments are based on recognition that a method for detecting an object in an image includes extracting a first feature vector from a first region of an image using a first subnetwork, determining a second region of the image by resizing the first region into a fixed ratio, wherein a size of the first region is smaller than a size of the second region, extracting a second feature vector from the second region of the image using a second subnetwork, and classifying a class of the object using a third subnetwork on a basis of the first feature vector and the second feature vector, and determining the class of object in the first region according to a result of the classifying, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, wherein steps of the method are performed by a processor.
- Some embodiments of the invention are based on recognition that detecting small objects in an image and/or predicting the class label the small objects in the image is a challenging problem for scene understanding due to small number of pixels in the image representing the small object. However, some specific small objects are usually appearing in the specific contexts. For example, a mouse is usually place near a keyboard and a monitor. That context can be part of training and recognition to compensate for the small resolution of the small object. To that end, some embodiments extract feature vectors from different regions including the object. Those regions are of different size and provide different contextual information about the object. In some embodiments, the object is detected and/or classified based on combination of the feature vectors.
-
FIG. 2 shows a flowchart of processes for detecting a small object in an image. In step S1, a first feature vector is extracted from a first region in the image by using a first subnetwork. In step S2, a second region in the image is determined by resizing the first region with a predetermined ratio by used of a resize module. In step S3, a second feature vector is extracted from the second region by using a second subnetwork. In step S4, a third subnetwork classifies the object based on the first feature vector and second feature vector. The classification result of the object in the image is output by the third subnetwork in step S5. In this case, the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, and the steps are performed by a processor. Further, the step of resizing the first region is performed such that each of the first region and the second region includes the object and a size of the first region is smaller than a size of the second region. -
FIG. 3 shows a block diagram of an object detection method using theneural networks 200 according to some embodiments of the invention. Theneural networks 200 includes a region proposal network (RPN) 400 and aneural network 250. Theneural network 250 may be referred to as aContexNet 250. TheContextNet 250 includes acontext region module 12, aresize module 13, aresize module 14, a first deep convolutional neural network (DCNN) 210, a second deep convolutional neural network (DCNN) 220 and a thirdneural network 300. The thirdneural network 300 includes aconcatenation module 310, a fully connectedneural network 311 and asoftmax function module 312. Thefirst DCNN 210 may be referred to as a first subnetwork, thesecond DCNN 220 may be referred to as a second subnetwork and the thirdneural network 300 may be referred to as a third subnetwork. The first subnetwork and second subnetwork may have identical structure. - Upon instructions, when an
image 10 is provided to theobjet detection system 100, the region proposal network (RPN) 400 is applied to theimage 10 to generate aproposal box 15 being placed on a region of a target object image in the image. The part of theimage 10 encompassed by theproposal box 15 is referred to as a target region image. The target region image is resized to a resizedobject image 16 with a predetermined identical size and a predetermined resolution using aresize module 13, and the resizedobject image 16 is transmitted to theneural networks 200. Regarding the definition of small objects, a threshold size of small objects is predetermined to classify objects in the image into a small object category. The threshold size may be chosen according to the system design of object detection and used in theRPN 400 to generate theproposal box 15. Theproposal box 15 also provides thelocation information 340 of the target object image in theimage 10. For example, the threshold size may be determined based on predetermined physical sizes of objects in the image, pixel sizes of objects in the image or a ratio of an area of an object image to the whole area of the image. Successively, acontext box 20 is obtained by enlarging theproposal box 15 by seven times in x and y directions (height and width dimensions) using thecontext region module 12. Thecontext box 20 is placed on theproposal box 15 of theimage 10 to surround the target region image, in which part of the image determined by placing thecontext box 20 is referred to as a context region image. In this case, the context region image corresponding to thecontext box 20 is resized, using theresize module 13, to a resizedcontext image 21 having the predetermined size and transmitted to theContexNet 250. The context region image may be obtained by magnifying the target region image by seven times or other values according to the data configurations used in theContexNet 250. Accordingly, the target region image corresponding to theproposal box 15 and the context region image corresponding to thecontext box 20 are converted into the resizedtarget image 16 and the resizedcontext image 21 by using theresize module 13 and theresize module 14 before being transmitted to theContexNet 250. In this case, the resizedtarget image 16 and the resizedcontext image 21 have the predetermined identical size. For example, the predetermined identical size may be 227×227 (224×224 for VGG16) patches (pixels). The predetermined identical size may be changed according to the data format used in the neural networks. Further, the predetermined identical size may be defined based on a predetermined pixel size or a predetermined physical dimension, and the aspect ratios of the target region image and the context region image may be maintained after being resized. - The
ContexNet 250 receives the resizedtarget image 16 and the resizedcontext image 21 from thefirst DCNN 210 and thesecond DCNN 220, respectively. Thefirst DCNN 210 in theContexNet 250 extracts afirst feature vector 230 from the resizedtarget image 16, and transmits thefirst feature vector 230 to theconcatenation module 310 of the thirdneural network 300. Further, thesecond DCNN 220 in theContexNet 250 extracts asecond feature vector 240 from the resizedcontext image 21 and transmits thesecond feature vector 240 to theconcatenation module 310 of the thirdneural network 300. Theconcatenation module 310 concatenates thefirst feature vector 230 and thesecond feature vector 240 and generates a concatenated feature. The concatenated feature is transmitted to the fully connected neural network (NN) 311, and the fully connectedNN 311 generates a feature vector from the concatenated feature and transmits the concatenated feature vector to thesoftmax function module 312. Thesoftmax function module 312 performs a classification of the target object image based on the concatenated feature vector from the fully connectedNN 312 and outputs a classification result as acategory output 330. As a result, the object detection of the target object image corresponding to theproposal box 15 is obtained based on thecategory output 330 and thelocation information 340. - Proposal Box and Context Box
-
FIG. 4A shows a procedure of resizing a target region image and a contest region image in an image. When theproposal box 15 is applied to theimage 10, theneural networks 200 crops the target region image corresponding to theproposal box 15 and resized the target region image to a resizedtarget image 16, and the resizedtarget image 16 is transmitted to thefirst DCNN 210. Further, thecontext region module 12 enlarges theproposal box 15 by seven times in both x and y directions to obtain thecontext box 20. Thecontext region module 12 also places thecontext box 20 on theimage 10 so that thecontext box 20 covers the target region image corresponding to theproposal box 15. Thecontext region module 12 applies thecontext box 20 on theimage 10 to define a context region image. Then theneural networks 200 crops the context region image corresponding to thecontext box 20 and resizes the context region image to a resizedcontext image 21 having the predetermined size that is identical to that of the resizedtarget image 16. The resizedcontext image 21 is transmitted to thesecond DCNN 220, in which thesecond DCNN 220 and thefirst DCNN 210 have identical structure. This procedure improves detecting small objects because extracting features from greater areas in the image helps to incorporate context information resulting better discriminative operation. In another embodiment, the center of thecontext box 20 may be shifted from the center of theproposal box 15 by a predetermined distance according to a predetermined ratio between areas of thecontext box 20 and theproposal box 15. - In some embodiments, the
context box 20 is set to be greater than theproposal box 15 so that thecontext box 20 encloses theproposal box 15. For example, each of side lines of thecontext box 20 may be seven times greater than or equal to that of theproposal box 15. In this case, the center of theproposal box 15 is arranged to be identical to that of thecontext box 20. -
FIG. 4A also shows a generating process of thecontext box 20 from theproposal box 15. A vector of thecontext box 20 is obtained by converting a vector of theproposal box 15. The vector of theproposal box 15 is expressed by a position (x, y), a width w, and h a height of theproposal box 15. The position (x, y) indicates the position of one of corners of theproposal box 15 defined by x-y coordinate in theimage 10. The vector of theproposal box 15 is expressed by (x, y, w, h), in which a left side lower corner is given by the position (x, y) and a diagonal position to the position (x, y) of the left side lower corner is obtained by (x+w, y+h). The center (xc, yc) of theproposal box 15 is expressed by a point (x+w/2, y+h/2). When the width w and height h of theproposal box 15 are enlarged by a factor c to provide thecontext box 20, the vector (x′, y′, w′, h′) of thecontext box 20 is expressed by (xc−c·w/2, yc−c·h/2, c·w, c·h). InFIG. 4A , theproposal box 15 and thecontext box 20 have the identical center (xc, yc). In another embodiment, the center of thecontext box 20 may be shifted from the center of theproposal box 15 according to predetermined amounts Δx and Δy. For example, the predetermined amounts 4 x and Ay may be defined to satisfy the conditions of |Δx|≦(c−1)w/2 and |Δ|≦<(c−1)h/2 wherein c>1 so that theproposal box 15 is included in thecontext box 20 without protruding beyond thecontext box 20. -
- FIG. 4B shows an example of a procedure applying a proposal box and a context box to a clock image in an image 13, in which an enlarged clock image is indicated at the upper right corner of the image 13. It should be noted that the clock image is much smaller than the other objects, such as the furniture, the windows, and the fireplace. In FIG. 4B, a proposal box 17 is applied to part of the clock image as a target image in the image 13. Subsequently, the target image corresponding to the proposal box 17 is enlarged into a resized target image 16 and transmitted to the first DCNN 210 via the resize module 13. Further, the neural network 200 provides a context box 22 based on the proposal box 17 and applies the context box 22 to the clock image, in which the context box 22 is arranged to fully surround the proposal box 17 with a predetermined area, as shown in the figure. An image region corresponding to the context box 22 is cropped from the image 13 as a context image, and the resize module 14 resizes the context image into a resized context image 21. The resized context image 21 is transmitted to the second DCNN 220. In this case, the context image encloses the target image, as seen in the figure. This procedure makes it possible for the neural network 200 to obtain the crucial information of a small object in the image, resulting in higher accuracy for small object classification.
- FIG. 4C shows a block diagram of a process for detecting a mouse image in an image. When an image 30 is provided, the region proposal network 400 provides a proposal box 31 corresponding to a target object image showing the back side of a mouse on a desk and provides a context box 32 surrounding the proposal box 31. After being resized by the resize module 13 (not shown), a resized target image of the target object image is transmitted to the first DCNN 210 (indicated as convolutional layers). The first DCNN 210 extracts a first feature vector of the target object image from the resized target image and transmits the first feature vector to the concatenation module 310. Further, the context box 32 is applied to the image 30 to determine a context region image that encloses the target object image. After being resized by the resize module 14 (not shown), a resized context image of the context region image is transmitted to the second DCNN 220 (indicated as convolutional layers). The second DCNN 220 extracts a second feature vector of the context region image from the resized context image and transmits the second feature vector to the concatenation module 310. After obtaining the first feature vector and the second feature vector, the concatenation module 310 concatenates the first and second feature vectors and generates a concatenated feature. The concatenated feature is transmitted to the fully connected NN 311 (indicated as fully connected layers). The fully connected NN 311 generates and transmits a feature vector to the softmax function module 312. The softmax function module 312 performs a classification of the target object image based on the feature vector from the fully connected NN 311 and outputs a classification result. The classification result indicates that the category of the target object image is "mouse", as shown in the figure.
- Small Object Dataset
- As a small proposal box corresponding to a small object in an image yields a low-dimensional feature vector, the size of a proposal box is chosen to obtain appropriately sized vectors that accommodate the context information of the proposal box in the object detection system 100.
- In some embodiments, a dataset for detecting small objects may be constructed by selecting predetermined small objects from conventional datasets, such as the SUN and Microsoft COCO datasets. For example, a subset of images of small objects is selected from the conventional datasets, and the ground-truth bounding box locations in the conventional datasets are used to prune out big object instances and compose a small object dataset that contains only small objects with small bounding boxes. The small object dataset may be constructed by computing the statistics of small objects, as in the sketch below.
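- A minimal sketch of the pruning step follows. The annotation fields and the 0.58% threshold (the largest median relative area reported in FIG. 5) are assumptions for illustration; the patent does not fix a data format.

```python
def build_small_object_dataset(annotations, max_relative_area=0.0058):
    """Keep only instances whose ground-truth box occupies a small fraction
    of the image; each annotation is assumed to carry 'box' = (x, y, w, h)
    and 'image_size' = (W, H)."""
    small = []
    for ann in annotations:
        _, _, w, h = ann["box"]
        W, H = ann["image_size"]
        if (w * h) / float(W * H) <= max_relative_area:
            small.append(ann)  # prune out big object instances
    return small
```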
- FIG. 5 shows an example of statistics of small object categories. Ten example categories are listed in the figure. For example, there are 2137 instances in 1739 images for the “mouse” category. Other categories, such as “telephone”, “switch”, “outlet”, “clock”, “toilet paper”, “tissue box”, “faucet”, “plate”, and “jar”, are also listed in the figure. FIG. 5 also shows the median relative area for each category, in which the relative area is the ratio of the bounding box area over the entire image area of object instances in the same category. The median relative area ranges between 0.08% and 0.58%. These relative areas correspond to pixel areas between 16×16 and 42×42 pixels in a VGA image. Thus, the small object dataset constructed according to the embodiment is customized for small objects. The sizes of small bounding boxes may be determined based on the small object dataset described above. In contrast, the median relative area of object categories in a conventional dataset, such as the PASCAL VOC dataset, ranges between 1.38% and 46.40%. Accordingly, the bounding boxes provided by the small object dataset according to some embodiments of the invention can be more accurate for small objects than the bounding boxes provided by a conventional dataset, because the conventional dataset provides much wider bounding box areas for object categories that are not customized for small objects.
- In constructing the small object dataset, the predetermined small objects may be determined by categorizing instances having physical dimensions smaller than a predetermined size. For example, the predetermined size may be 30 centimeters. In another example, the predetermined size may be 50 centimeters, according to the object detection system design.
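- The median relative area tabulated in FIG. 5 can be computed per category as sketched below, reusing the assumed annotation format from the previous example.

```python
import statistics
from collections import defaultdict

def median_relative_areas(annotations):
    """Per-category median of box area over image area."""
    areas = defaultdict(list)
    for ann in annotations:
        _, _, w, h = ann["box"]
        W, H = ann["image_size"]
        areas[ann["category"]].append((w * h) / float(W * H))
    return {cat: statistics.median(vals) for cat, vals in areas.items()}
```

An up-sampling ratio in the 6-to-7 range reported with FIG. 6 is then consistent with dividing the network input size (227) by a median box side of roughly 32 to 38 pixels.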
- FIG. 6 shows the median bounding box size of objects per category and the corresponding up-sampling ratios. In the embodiment, the up-sampling ratio is chosen to be 6 to 7 to match the input size (227×227 in this case) of the deep convolutional neural network.
- Configuration of Networks
- In some embodiments, the first DCNN 210 and the second DCNN 220 are designed to have identical structures, and each of the first DCNN 210 and the second DCNN 220 includes a few convolutional layers. In the training process, the first DCNN 210 and the second DCNN 220 are initialized using the ImageNet pre-trained model. While the training process continues, the first DCNN 210 and the second DCNN 220 evolve the weights of the networks separately and do not share the weights.
- The first feature vector 230 and the second feature vector 240 are derived from the first six layers of AlexNet or from the first six layers of VGG16. The target object image corresponding to the proposal box 15 and the context region image corresponding to the context box 20 are resized into 227×227 image patches for AlexNet and 224×224 image patches for VGG16. The first DCNN 210 and the second DCNN 220 each output a 4096-dimensional feature vector, and the 4096-dimensional feature vectors are transmitted to the third neural network 300, which includes the concatenation module 310, the fully connected NN 311 having two fully connected layers, and the softmax function module 312. After receiving the feature vectors from the first DCNN 210 and the second DCNN 220, the third neural network 300 outputs a predicted object category label using the softmax function module 312 with respect to the target object image, based on the concatenated feature vector generated by the concatenation module 310. In this case, the pre-trained weights are not used for a predetermined number of last layers in the fully connected NN 311; instead, the pre-trained weights of the convolutional layers are used.
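- As a concrete but non-authoritative sketch, the head of the network can be written in a few lines of PyTorch; the hidden width and the use of ReLU are assumptions consistent with the 4096-dimensional tower outputs and the two fully connected layers described above.

```python
import torch
import torch.nn as nn

class ContextNetHead(nn.Module):
    """Third neural network 300: concatenation, two FC layers, softmax."""
    def __init__(self, num_classes, feat_dim=4096, hidden=4096):
        super().__init__()
        self.fc = nn.Sequential(                 # fully connected NN 311
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, target_feat, context_feat):
        # target_feat / context_feat: (batch, 4096) from the two DCNN towers.
        fused = torch.cat([target_feat, context_feat], dim=1)  # concatenation module 310
        return torch.softmax(self.fc(fused), dim=1)            # softmax function module 312
```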
- The proposal box 15 can be generated by a Deformable Part Model (DPM) module based on Histogram of Oriented Gradients (HOG) features and a latent support vector machine module. In this case, the DPM module is designed to detect category-specific objects, the sizes of the root and part templates of the DPM module are adjusted to accommodate small object sizes, and the DPM module is then trained for the predetermined different classes.
- The proposal box 15 can also be generated by a region proposal network (RPN) 400. The proposal box 15 generated by the RPN 400 is designed to have a predetermined number of pixels. The number of pixels may be 16², 40², or 100² pixels according to the configuration design of the object detection system 100. In another example, the number of pixels may be greater than 100² pixels when the category of small objects in the datasets of an object detection system is defined to be greater than 100² pixels. For example, the conv4_3 layer of the VGG network is used for the feature maps associated with small anchor boxes, in which the receptive field of the conv4_3 layer is 92×92 pixels.
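- The 92×92 receptive field quoted for conv4_3 can be checked with the standard forward recurrence rf ← rf + (k−1)·jump, jump ← jump·s, accumulated over the VGG16 layers up to conv4_3; the snippet below reproduces the figure.

```python
# (kernel, stride) pairs for VGG16 up to conv4_3.
VGG16_TO_CONV4_3 = (
    [(3, 1)] * 2 + [(2, 2)]    # conv1_1-conv1_2, pool1
    + [(3, 1)] * 2 + [(2, 2)]  # conv2_1-conv2_2, pool2
    + [(3, 1)] * 3 + [(2, 2)]  # conv3_1-conv3_3, pool3
    + [(3, 1)] * 3             # conv4_1-conv4_3
)

def receptive_field(layers):
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s             # strides compound the input-pixel spacing
    return rf

print(receptive_field(VGG16_TO_CONV4_3))  # -> 92
```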
- FIG. 7 shows an example of average precision results obtained by different networks. In this example, the ContextNet is referred to as AlexNet. The second row (DPM prop.+AlexNet) is obtained by using DPM proposals, in which training and testing are performed with 500 proposals per image per category. The third row (RPN prop.+AlexNet) is obtained by using the RPN according to some embodiments, in which training is performed with 2000 proposals per image and testing with 500 proposals per image. The results show that RPN proposals with AlexNet training provide better performance than the others.
- In classifying an object, a correct determination is made if the overlap ratio between the object box and the ground-truth bounding box is greater than 0.5, in which the overlap ratio is measured by the Intersection over Union (IoU) measuring module.
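- The IoU criterion is straightforward to implement; the sketch below assumes boxes in (x, y, w, h) form, matching the proposal-box vector used earlier.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix0, iy0 = max(ax, bx), max(ay, by)
    ix1, iy1 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def is_correct(detection, ground_truth, threshold=0.5):
    # Correct when the overlap ratio exceeds the 0.5 threshold.
    return iou(detection, ground_truth) > threshold
```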
- In another embodiment, the overlap ratio may be changed according to a predetermined detection accuracy designed into the object detection system 100.
- Although several preferred embodiments have been shown and described, it would be apparent to those skilled in the art that many changes and modifications may be made thereunto without departing from the scope of the invention, which is defined by the following claims and their equivalents.
Claims (18)
1. A method for detecting an object in an image, comprising:
extracting a first feature vector from a first region of an image using a first subnetwork;
determining a second region of the image by resizing the first region;
extracting a second feature vector from the second region of the image using a second subnetwork;
classifying a class of the object using a third subnetwork on a basis of the first feature vector and the second feature vector; and
determining the class of the object in the first region according to a result of the classifying,
wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network, wherein steps of the method are performed by a processor.
2. The method of claim 1 , wherein the resizing the first region is performed such that each of the first region and the second region includes the object, and wherein a size of the first region is smaller than a size of the second region.
3. The method of claim 1 , wherein the resizing is performed according to a fixed ratio, and the second subnetwork is a deep convolutional neural network.
4. The method of claim 1 , wherein at least one of the first subnetwork and second subnetwork is a deep convolutional neural network, and wherein the third subnetwork is a fully-connected neural network.
5. The method of claim 4 , wherein the third subnetwork performs a feature vector concatenation operation of the first feature vector and the second feature vector.
6. The method of claim 1 , further comprising:
rendering the detected object and the class of the object on a display device or transmitting the detected object and the class of the object.
7. The method of claim 1 , wherein the first region is obtained by a region proposal network.
8. The method of claim 7 , wherein the region proposal network is a convolutional neural network.
9. The method of claim 1 , wherein a width of the second region is seven times larger than a width of the first region.
10. The method of claim 1 , wherein a height of the second region is seven times larger than a height of the first region.
11. The method of claim 1 , wherein a width of the second region is three times larger than a width of the first region.
12. The method of claim 1 , wherein a height of the second region is three times larger than a height of the first region.
13. The method of claim 1 , wherein a center of the second region corresponds to a center of the first region.
14. The method of claim 1 , wherein the first region is resized to a first pre-determined size before the first region is input to the first subnetwork.
15. The method of claim 1 , wherein the second region is resized to a second pre-determined size before the second region is input to the second subnetwork.
16. The method of claim 1 , wherein the first region is obtained by using a deformable part model object detector.
17. A non-transitory computer readable recording medium storing thereon a program causing a computer to execute an object detection process, the object detection process comprising:
extracting a first feature vector from a first region of an image using a first subnetwork;
determining a second region of the image by resizing the first region, wherein a size of the first region differs from a size of the second region;
extracting a second feature vector from the second region of the image using the second subnetwork; and
detecting the object using a third subnetwork on a basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
18. An object detection system comprising:
a human machine interface;
a storage device including neural networks;
a memory;
a network interface controller connectable with a network being outside the system;
an imaging interface connectable with an imaging device; and
a processor configured to connect to the human machine interface, the storage device, the memory, the network interface controller and the imaging interface,
wherein the processor executes instructions for detecting an object in an image using the neural networks stored in the storage device, wherein the neural networks perform steps of:
extracting a first feature vector from a first region of the image using a first subnetwork;
determining a second region of the image by processing the first feature vector with a second subnetwork, wherein a size of the first region differs from a size of the second region;
extracting a second feature vector from the second region of the image using the first subnetwork; and
detecting the object using a third subnetwork on a basis of the first feature vector and the second feature vector to produce a bounding box surrounding the object and a class of the object, wherein the first subnetwork, the second subnetwork, and the third subnetwork form a neural network.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/226,088 US20180039853A1 (en) | 2016-08-02 | 2016-08-02 | Object Detection System and Object Detection Method |
| JP2017144325A JP6956555B2 (en) | 2016-08-02 | 2017-07-26 | How to detect objects in an image and object detection system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/226,088 US20180039853A1 (en) | 2016-08-02 | 2016-08-02 | Object Detection System and Object Detection Method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20180039853A1 true US20180039853A1 (en) | 2018-02-08 |
Family
ID=61069325
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/226,088 Abandoned US20180039853A1 (en) | 2016-08-02 | 2016-08-02 | Object Detection System and Object Detection Method |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US20180039853A1 (en) |
| JP (1) | JP6956555B2 (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3756058A4 (en) | 2018-02-20 | 2021-11-24 | Uplift Labs, Inc. | Identifying movements and generating prescriptive analytics using movement intelligence |
| US10748033B2 (en) * | 2018-12-11 | 2020-08-18 | Industrial Technology Research Institute | Object detection method using CNN model and object detection apparatus using the same |
| US10395140B1 (en) * | 2019-01-23 | 2019-08-27 | StradVision, Inc. | Learning method and learning device for object detector based on CNN using 1×1 convolution to be used for hardware optimization, and testing method and testing device using the same |
| US10387753B1 (en) * | 2019-01-23 | 2019-08-20 | StradVision, Inc. | Learning method and learning device for convolutional neural network using 1×1 convolution for image recognition to be used for hardware optimization, and testing method and testing device using the same |
| EP4272186B1 (en) * | 2020-12-30 | 2026-01-28 | Zoox, Inc. | Intermediate input for machine learned model |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5660273B2 (en) * | 2010-01-04 | 2015-01-28 | 日本電気株式会社 | Image diagnosis method, image diagnosis apparatus, and image diagnosis program |
-
2016
- 2016-08-02 US US15/226,088 patent/US20180039853A1/en not_active Abandoned
-
2017
- 2017-07-26 JP JP2017144325A patent/JP6956555B2/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050147292A1 (en) * | 2000-03-27 | 2005-07-07 | Microsoft Corporation | Pose-invariant face recognition system and process |
| US20130286240A1 (en) * | 2012-04-30 | 2013-10-31 | Samsung Electronics Co., Ltd. | Image capturing device and operating method of image capturing device |
| US9098741B1 (en) * | 2013-03-15 | 2015-08-04 | Google Inc. | Discriminitive learning for object detection |
| US20150363634A1 (en) * | 2014-06-17 | 2015-12-17 | Beijing Kuangshi Technology Co.,Ltd. | Face Hallucination Using Convolutional Neural Networks |
| US9852492B2 (en) * | 2015-09-18 | 2017-12-26 | Yahoo Holdings, Inc. | Face detection |
Non-Patent Citations (2)
| Title |
|---|
| Bell, Sean, et al., "Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks", IEEE Conference on Computer Vision and Pattern Recognition, 27-30 June 2016, pages 2874-2883. * |
| Mnih, Volodymyr, et al., "Recurrent Models of Visual Attention", 2014-06-24, retrieved from the Internet: https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf * |
Cited By (97)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11341398B2 (en) * | 2016-10-03 | 2022-05-24 | Hitachi, Ltd. | Recognition apparatus and learning system using neural networks |
| US10691971B2 (en) * | 2016-11-28 | 2020-06-23 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing object |
| US11113840B2 (en) * | 2016-12-29 | 2021-09-07 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for detecting objects in images |
| US12020476B2 (en) | 2017-03-23 | 2024-06-25 | Tesla, Inc. | Data synthesis for autonomous control systems |
| US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
| US11836674B2 (en) | 2017-05-23 | 2023-12-05 | Walmart Apollo, Llc | Automated inspection system |
| US12450564B2 (en) | 2017-05-23 | 2025-10-21 | Walmart Apollo, Llc | Automated inspection system |
| US10942519B2 (en) * | 2017-07-07 | 2021-03-09 | Autox, Inc. | System and method for navigating an autonomous driving vehicle |
| US12216610B2 (en) | 2017-07-24 | 2025-02-04 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
| US12536131B2 (en) | 2017-07-24 | 2026-01-27 | Tesla, Inc. | Vector computational unit |
| US12086097B2 (en) | 2017-07-24 | 2024-09-10 | Tesla, Inc. | Vector computational unit |
| US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
| US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
| US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
| US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
| US11216694B2 (en) * | 2017-08-08 | 2022-01-04 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing object |
| US10867384B2 (en) * | 2017-08-09 | 2020-12-15 | Shenzhen Keya Medical Technology Corporation | System and method for automatically detecting a target object from a 3D image |
| US12307350B2 (en) | 2018-01-04 | 2025-05-20 | Tesla, Inc. | Systems and methods for hardware-based pooling |
| US11797304B2 (en) | 2018-02-01 | 2023-10-24 | Tesla, Inc. | Instruction set architecture for a vector computational unit |
| US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
| US11250292B2 (en) * | 2018-02-01 | 2022-02-15 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and apparatus for generating information |
| US12455739B2 (en) | 2018-02-01 | 2025-10-28 | Tesla, Inc. | Instruction set architecture for a vector computational unit |
| US11507800B2 (en) * | 2018-03-06 | 2022-11-22 | Adobe Inc. | Semantic class localization digital environment |
| US11448632B2 (en) | 2018-03-19 | 2022-09-20 | Walmart Apollo, Llc | System and method for the determination of produce shelf life |
| CN108491795A (en) * | 2018-03-22 | 2018-09-04 | 北京航空航天大学 | Pedestrian detection method and device for rail transit scene |
| US11144763B2 (en) * | 2018-04-02 | 2021-10-12 | Canon Kabushiki Kaisha | Information processing apparatus, image display method, and non-transitory computer-readable storage medium for display control |
| US11227182B2 (en) * | 2018-04-16 | 2022-01-18 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for recognizing image object |
| US11093800B2 (en) * | 2018-04-26 | 2021-08-17 | Boe Technology Group Co., Ltd. | Method and device for identifying object and computer readable storage medium |
| CN108898145A (en) * | 2018-06-15 | 2018-11-27 | 西南交通大学 | A kind of image well-marked target detection method of combination deep learning |
| US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
| US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
| US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
| US11734813B2 (en) * | 2018-07-26 | 2023-08-22 | Walmart Apollo, Llc | System and method for produce detection and classification |
| US20220351364A1 (en) * | 2018-07-26 | 2022-11-03 | Walmart Apollo, Llc | System and method for produce detection and classification |
| US11393082B2 (en) * | 2018-07-26 | 2022-07-19 | Walmart Apollo, Llc | System and method for produce detection and classification |
| US12079723B2 (en) | 2018-07-26 | 2024-09-03 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
| US20200356802A1 (en) * | 2018-08-07 | 2020-11-12 | Shenzhen Sensetime Technology Co., Ltd. | Image processing method and apparatus, electronic device, storage medium, and program product |
| US20210312179A1 (en) * | 2018-08-08 | 2021-10-07 | Samsung Electronics Co., Ltd. | Electronic device for providing recognition result of external object by using recognition information about image, similar recognition information related to recognition information, and hierarchy information, and operating method therefor |
| US11995122B2 (en) * | 2018-08-08 | 2024-05-28 | Samsung Electronics Co., Ltd. | Electronic device for providing recognition result of external object by using recognition information about image, similar recognition information related to recognition information, and hierarchy information, and operating method therefor |
| US10990857B2 (en) | 2018-08-23 | 2021-04-27 | Samsung Electronics Co., Ltd. | Object detection and learning method and apparatus |
| US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
| US12346816B2 (en) | 2018-09-03 | 2025-07-01 | Tesla, Inc. | Neural networks for embedded devices |
| US11983630B2 (en) | 2018-09-03 | 2024-05-14 | Tesla, Inc. | Neural networks for embedded devices |
| US11373287B2 (en) * | 2018-09-06 | 2022-06-28 | Accenture Global Solutions Limited | Digital quality control using computer visioning with deep learning |
| US12450710B2 (en) | 2018-09-06 | 2025-10-21 | Accenture Global Solutions Limited | Digital quality control using computer visioning with deep learning |
| US12079712B2 (en) * | 2018-09-21 | 2024-09-03 | Sony Semiconductor Solutions Corporation | Solid state image capturing system, solid state image capturing device, information processing device, image processing method, information processing method |
| US20220058411A1 (en) * | 2018-09-21 | 2022-02-24 | Sony Semiconductor Solutions Corporation | Solid state image capturing system, solid state image capturing device, information processing device, image processing method, information processing method, and program |
| US12499356B2 (en) | 2018-09-21 | 2025-12-16 | Sony Semiconductor Solutions Corporation | Solid-state image capturing system, solid-state image capturing device, information processing device, image processing method, and information processing method |
| CN109242801A (en) * | 2018-09-26 | 2019-01-18 | 北京字节跳动网络技术有限公司 | Image processing method and device |
| US10304009B1 (en) * | 2018-10-08 | 2019-05-28 | StradVision, Inc. | Learning method and testing method for object detector based on R-CNN, and learning device and testing device using the same |
| US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
| US11715059B2 (en) | 2018-10-12 | 2023-08-01 | Walmart Apollo, Llc | Systems and methods for condition compliance |
| CN113168705A (en) * | 2018-10-12 | 2021-07-23 | 诺基亚技术有限公司 | Method and apparatus for context-embedded and region-based object detection |
| US12106261B2 (en) | 2018-10-12 | 2024-10-01 | Walmart Apollo, Llc | Systems and methods for condition compliance |
| US20210383166A1 (en) * | 2018-10-12 | 2021-12-09 | Nokia Technologies Oy | Method and apparatus for context-embedding and region-based object detection |
| US11908160B2 (en) * | 2018-10-12 | 2024-02-20 | Nokia Technologies Oy | Method and apparatus for context-embedding and region-based object detection |
| US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
| US11450003B2 (en) * | 2018-10-29 | 2022-09-20 | Fujifilm Healthcare Corporation | Medical imaging apparatus, image processing apparatus, and image processing method |
| TWI717655B (en) * | 2018-11-09 | 2021-02-01 | 財團法人資訊工業策進會 | Feature determination apparatus and method adapted to multiple object sizes |
| US11733229B2 (en) | 2018-11-20 | 2023-08-22 | Walmart Apollo, Llc | Systems and methods for assessing products |
| US11388325B2 (en) | 2018-11-20 | 2022-07-12 | Walmart Apollo, Llc | Systems and methods for assessing products |
| US12367405B2 (en) | 2018-12-03 | 2025-07-22 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
| US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
| US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US12198396B2 (en) | 2018-12-04 | 2025-01-14 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
| US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
| US12136030B2 (en) | 2018-12-27 | 2024-11-05 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
| US10423860B1 (en) * | 2019-01-22 | 2019-09-24 | StradVision, Inc. | Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same |
| US10430691B1 (en) * | 2019-01-22 | 2019-10-01 | StradVision, Inc. | Learning method and learning device for object detector based on CNN, adaptable to customers' requirements such as key performance index, using target object merging network and target region estimating network, and testing method and testing device using the same to be used for multi-camera or surround view monitoring |
| CN111461161A (en) * | 2019-01-22 | 2020-07-28 | 斯特拉德视觉公司 | Object detection method and device based on CNN and strong fluctuation resistance |
| US10387752B1 (en) * | 2019-01-22 | 2019-08-20 | StradVision, Inc. | Learning method and learning device for object detector with hardware optimization based on CNN for detection at distance or military purpose using image concatenation, and testing method and testing device using the same |
| US10387754B1 (en) * | 2019-01-23 | 2019-08-20 | StradVision, Inc. | Learning method and learning device for object detector based on CNN using 1×H convolution to be used for hardware optimization, and testing method and testing device using the same |
| US10402695B1 (en) * | 2019-01-23 | 2019-09-03 | StradVision, Inc. | Learning method and learning device for convolutional neural network using 1×H convolution for image recognition to be used for hardware optimization, and testing method and testing device using the same |
| US12014553B2 (en) | 2019-02-01 | 2024-06-18 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
| US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
| US12223428B2 (en) | 2019-02-01 | 2025-02-11 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
| US12164310B2 (en) | 2019-02-11 | 2024-12-10 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
| US12236689B2 (en) | 2019-02-19 | 2025-02-25 | Tesla, Inc. | Estimating object properties using visual image data |
| US11030774B2 (en) * | 2019-03-19 | 2021-06-08 | Ford Global Technologies, Llc | Vehicle object tracking |
| CN110147753A (en) * | 2019-05-17 | 2019-08-20 | 电子科技大学 | Method and device for detecting small objects in image |
| US11113822B2 (en) * | 2019-08-14 | 2021-09-07 | International Business Machines Corporation | Moving object identification from a video stream |
| US12210563B1 (en) * | 2020-03-04 | 2025-01-28 | CSC Holdings, LLC | Flexible image repository for customer premises equipment |
| US11769322B2 (en) | 2020-03-31 | 2023-09-26 | Mitsubishi Heavy Industries, Ltd. | Program creation device, object detection system, anchor setting method, and anchor setting program |
| US12505595B2 (en) * | 2020-05-15 | 2025-12-23 | Nvidia Corporation | Content-aware style encoding using neural networks |
| US12026225B2 (en) * | 2020-09-14 | 2024-07-02 | i-PRO Co., Ltd. | Monitoring camera, part association method and program |
| US20220083811A1 (en) * | 2020-09-14 | 2022-03-17 | Panasonic I-Pro Sensing Solutions Co., Ltd. | Monitoring camera, part association method and program |
| US20240096119A1 (en) * | 2020-12-30 | 2024-03-21 | Synchronoss Technologies, Inc. | Depth Based Image Tagging |
| CN112766244A (en) * | 2021-04-07 | 2021-05-07 | 腾讯科技(深圳)有限公司 | Target object detection method and device, computer equipment and storage medium |
| US12462575B2 (en) | 2021-08-19 | 2025-11-04 | Tesla, Inc. | Vision-based machine learning model for autonomous driving with adjustable virtual camera |
| US12522243B2 (en) | 2021-08-19 | 2026-01-13 | Tesla, Inc. | Vision-based system training with simulated content |
| US12175476B2 (en) | 2022-01-31 | 2024-12-24 | Walmart Apollo, Llc | Systems and methods for assessing quality of retail products |
| US20230316694A1 (en) * | 2022-04-04 | 2023-10-05 | Arm Limited | Data processing systems |
| CN115546790A (en) * | 2022-11-29 | 2022-12-30 | 深圳智能思创科技有限公司 | Document layout segmentation method, device, equipment and storage medium |
| US12148207B1 (en) * | 2023-06-14 | 2024-11-19 | Zhejiang Lab | Method and system for intelligent identification of rice growth potential based on UAV monitoring |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2018022484A (en) | 2018-02-08 |
| JP6956555B2 (en) | 2021-11-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20180039853A1 (en) | Object Detection System and Object Detection Method | |
| US10210418B2 (en) | Object detection system and object detection method | |
| CN111353512B (en) | Obstacle classification method, obstacle classification device, storage medium and computer equipment | |
| US10127199B2 (en) | Automatic measure of visual similarity between fonts | |
| CN110443258B (en) | Character detection method and device, electronic equipment and storage medium | |
| US20190130232A1 (en) | Font identification from imagery | |
| WO2018184195A1 (en) | Joint training of neural networks using multi-scale hard example mining | |
| CN110674804A (en) | Text image detection method and device, computer equipment and storage medium | |
| US20190244028A1 (en) | System and Method for Detecting Objects in Video Sequences | |
| US20160364633A1 (en) | Font recognition and font similarity learning using a deep neural network | |
| US8170303B2 (en) | Automatic cardiac view classification of echocardiography | |
| JP2008171417A (en) | Method for detecting a substantially rectangular object in an image, method for estimating a background color in an image, computer readable medium, apparatus for detecting a substantially rectangular object in an image, and apparatus for estimating a background color in an image | |
| CN119580116B (en) | A method for dynamic target detection in remote sensing images based on poly-kernels | |
| CN103295021A (en) | Method and system for detecting and recognizing feature of vehicle in static image | |
| US12293578B2 (en) | Object detection method, object detection apparatus, and non-transitory computer-readable storage medium storing computer program | |
| CN114708462A (en) | Method, system, device and storage medium for generating detection model for multi-data training | |
| KR20190059083A (en) | Apparatus and method for recognition marine situation based image division | |
| CN115240240A (en) | Infrared face recognition method and system based on YOLO network | |
| US20220092790A1 (en) | Digital Image Boundary Detection | |
| CN113724237B (en) | Tooth trace identification method, device, computer equipment and storage medium | |
| Azaza et al. | Context proposals for saliency detection | |
| US9607398B2 (en) | Image processing apparatus and method of controlling the same | |
| CN116128883A (en) | Photovoltaic panel quantity counting method and device, electronic equipment and storage medium | |
| CN115937863A (en) | Image recognition method, device, computer equipment, storage medium | |
| EP4184432A1 (en) | Information processing device, information processing method, and information processing program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |