
US9349189B2 - Occlusion resistant image template matching using distance transform - Google Patents

Occlusion resistant image template matching using distance transform

Info

Publication number
US9349189B2
US9349189B2 · Application US14/319,814 · US201414319814A
Authority
US
United States
Prior art keywords
pixels
gradient
image data
distance metric
edge
Prior art date
Legal status (assumed; not a legal conclusion)
Expired - Fee Related, expires
Application number
US14/319,814
Other versions
US20150003741A1
Inventor
Xi Zhang
Xin Chen
Current Assignee (listed assignee may be inaccurate)
Here Global BV
Original Assignee
Here Global BV
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Here Global BV filed Critical Here Global BV
Priority to US14/319,814
Publication of US20150003741A1
Application granted
Publication of US9349189B2
Status: Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • Also classified: G06T 7/0085; G06T 7/0028; G06K 9/00389; G06K 9/00637; G06K 9/03; G06K 9/6203
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 10/7515 Shifting the patterns to accommodate for positional errors
    • G06V 10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/176 Urban or other man-made structures
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/113 Recognition of static hand signs
Definitions

  • the following disclosure relates to image template matching, or more particularly, to a distance transform that provides accurate image template matching even when significant occlusions are present in the image data.
  • Shape matching or object recognition is a fundamental problem in computer vision and has been used in various applications ranging from information retrieval to object tracking.
  • The measurements of similarities between templates and target objects have been studied extensively. Human beings naturally recognize thousands if not millions of shapes instantly with no effort. The lighting, scale, orientation, or viewing direction in which an object is viewed is easily reconciled by the human mind. Computers, on the other hand, do not easily interpret shapes when appearances have been modified. Computer vision focuses on the identification of these shapes.
  • An occlusion is a partially obstructed object in the image. Because a portion of the object is not viewable in the image, the object shape and other properties have changed. Occlusions occur when one object is positioned in front of another. Occlusions may also occur when lighting for an object changes. For example, shadows may change the appearance of an object. An image of an object taken at midday may be matched easily to a template, but images of the same object taken later in the day may be too occluded by shadows for accurate matching.
  • One particular area of concern is object recognition in aerial photographs of buildings, which tend to cast far reaching shadows.
  • a computing device performs matching between a target image and one or more template images.
  • the computing device receives image data and performs an edge detection algorithm on the image data.
  • the edge detection algorithm includes a distance metric based on angles between gradient vectors in the image data and gradient vectors in one or more templates.
  • the computing device matches a building model to the image data based on results of the edge detection algorithm, wherein the building model is associated with the one or more templates.
  • FIG. 1 illustrates an example system for template matching.
  • FIG. 2 illustrates example distance transforms for chamfer matching.
  • FIG. 3 illustrates an example of a distance metric with edge orientation.
  • FIG. 4 illustrates an example set of pixel values for a gradient vector.
  • FIG. 5 illustrates example unit vectors for the gradient vector.
  • FIG. 6 illustrates an example of template matching omitting occluded pixels.
  • FIG. 7 illustrates an exemplary computing device of the system of FIG. 1 .
  • FIG. 8 illustrates an example flowchart for template matching using the computing device of FIG. 7 .
  • Sketch and silhouette are primary visual cues that have been shown to be a dominant element in human perception and cognition. Matching objects through sketch and silhouette discards other valuable visual information such as intensity and texture. On the other hand, shape provides a compact form representation of the information which has been shown to be very useful for matching.
  • Techniques for matching objects using sketch edges may adopt a metric based on the distance between corresponding pixels to evaluate the similarity among edge images.
  • matching methods can be categorized into two groups: feature-dependent and feature-independent.
  • the feature-dependent group solves the correspondence problem by minimizing the feature distance between corresponding pixels.
  • the feature-independent group determines the distance in a more general way over all pixels.
  • feature-independent methods are preferred when speed is a significant consideration; however, the performance of such methods may be affected when the scene becomes more complicated and possesses more noise.
  • feature-dependent methods perform better and are more resistant to cluttered and occluded scenes. This increased accuracy comes at an increased computation cost.
  • Chamfer matching methods, which are based on integrating edge orientation, are a preferred choice when both speed and accuracy are concerns. However, these methods do not support partial matching and may suffer from large error when an object is occluded.
  • a chamfer matching algorithm includes a distance transform for quantifying the differences between two different images.
  • the differences may be distances between edge points in the images.
  • the following embodiments include a distance metric that facilitates template matching even with occlusions and a noisy environment.
  • FIG. 1 illustrates an example system 120 for template matching.
  • the system 120 includes a developer system 121 , a mobile device 122 , a workstation 128 , and a network 127 . Additional, different, or fewer components may be provided. For example, many mobile devices 122 and/or workstations 128 connect with the network 127 .
  • the developer system 121 includes a server 125 and a database 123 .
  • the server 125 includes at least a processor, a communication interface, and a memory. Either or both of the mobile device 122 and the server 125 are configured to perform the methods described herein.
  • the mobile device 122 and the server 125 are configured to perform template matching to register or align image data with a building model.
  • the image data may be collected by a satellite or another aerial vehicle (e.g., airplane, helicopter, low-altitude radio controlled device such as a quad-copter, or an unmanned aerial vehicle).
  • the mobile device 122 and the server 125 are configured to receive and analyze the image data.
  • the analysis may include an edge detection algorithm with a distance transform (distance metric).
  • the distance transform is calculated from a gradient area of the neighborhood of each pixel.
  • the distance transform describes each pixel as a function of neighboring pixels.
  • the edge detection algorithm may include a window that slides across the image data. The size of the window may be configured by a user.
  • the mobile device 122 and the server 125 are configured to analyze the image data to identify occluded areas based on the distance transform.
  • the occluded areas may include structures in shadows (partially or completely) of other structures, shadows from trees, or shadows from clouds. Potentially occluded areas are distinguished from normal areas in the image data.
  • the mobile device 122 and the server 125 are configured to perform a first edge detection algorithm on the occluded areas and a second edge detection algorithm on the normal areas.
  • the database 123 may store the building footprints and the image data.
  • the database 123 may associate building footprints with geographic locations and/or locations in the image data for registering the image data with the building footprints.
  • the mobile device 122 is a smart phone, a mobile phone, a personal digital assistant (“PDA”), a tablet computer, a notebook computer, a personal navigation device (“PND”), a portable navigation device, in-car navigation system, and/or any other known or later developed portable or mobile device.
  • the mobile device 122 includes one or more detectors or sensors as a positioning system built or embedded into or within the interior of the mobile device or vehicle 122 .
  • the mobile device 122 receives location data from the positioning system.
  • the developer system 121 , the workstation 128 , and the mobile device 122 are coupled with the network 127 .
  • the phrase “coupled with” is defined to mean directly connected to or indirectly connected through one or more intermediate components. Such intermediate components may include hardware and/or software-based components.
  • FIG. 2 illustrates example distance transforms for chamfer matching.
  • one image is a target image and the other image is a template.
  • the target image and the template image may be binary images in which each pixel may have only two values (e.g., 1 or 0).
  • the target image may be an image observed in the real world (e.g., an aerial image or other photograph), and the template may be one of multiple templates that may be potentially matched with the target image.
  • the templates include building models.
  • the building model may be a three-dimensional building model or a two-dimensional building model.
  • the two-dimensional building model may be referred to as building footprints.
  • the building model may be measured using a range finding device (e.g., a light detection and ranging (LIDAR) sensor) mounted on a ground vehicle or an aerial vehicle.
  • the building model may be created through measuring the locations of buildings manually.
  • the building model may include outlines of buildings derived from a point cloud collected by the range finding device.
  • the building model may include locations of the corners and/or edges of the buildings.
  • the building model may be overlaid on a city map and stored in a map database.
  • the transformation (W) between the template image and the target image is Euclidean.
  • the transformation (W) includes one or more of a rotation (R), and a translation (T).
  • the translation may be performed by a translation vector including a translation distance in one or more directions.
  • the rotation may be performed by a rotation matrix including an angle of rotation in one or more directions.
  • the position of any pixel v_i after the transformation including translation (T) and rotation (R) may be given by Equation 1.
  • W(v_i; [R|T]) = R · v_i + T ≡ v′_i  Eq. 1
  • the computation of the distance metric d′(v′_i, u_i) may involve complex calculations and significant computational resources.
  • the distance from any pixel in the target image to the closest edge pixel is computed ahead of time, stored in memory, and reused whenever V is placed to a new location under transformation.
  • example objects are shown with corresponding distance transforms.
  • the distance transforms are represented with grayscale intensity to show the relative distances from pixels to the nearest obstacle pixel (e.g., a boundary pixel).
  • in section (a), a simple square 130 is the object
  • the corresponding distance transform in section (b) includes regularly shaped regions 140 around the object.
  • the distance transform may be drastically affected by noise.
  • in section (c), a few random noise pixels 131 are added in the vicinity of the square 130.
  • because the distance transform treats the noise pixels that are separated from the rest of the object as boundary pixels, the regularly shaped regions 140 are disrupted, as shown by the irregular regions 141 in section (d).
  • the distance transform may be drastically affected by missing portions or occlusion of the object, as shown by the incomplete square 134 in section (e). Because the incomplete square 134 is missing boundary pixels, the distance transform follows a similar irregular shape 144 shown in section (f). Matching algorithms have difficulty matching the distance transforms in sections (d) or (f) with the original square 130, even though casual observation could easily determine that the differences were simple noise or occlusions.
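The sensitivity illustrated in FIG. 2 is easy to reproduce numerically. The sketch below is my own illustration (not code from the patent) and assumes numpy and scipy are available; it builds the distance field of a clean square and of the same square with a few stray noise pixels, mirroring sections (a) through (d).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_distance_field(edge_img):
    """Distance from every pixel to the nearest edge (non-zero) pixel."""
    return distance_transform_edt(edge_img == 0)

canvas = np.zeros((64, 64), dtype=np.uint8)
canvas[20, 20:44] = canvas[43, 20:44] = 1   # top and bottom sides of a square
canvas[20:44, 20] = canvas[20:44, 43] = 1   # left and right sides

clean_field = edge_distance_field(canvas)

noisy = canvas.copy()
noisy[10, 10] = noisy[52, 55] = 1           # a few stray noise pixels

noisy_field = edge_distance_field(noisy)

# The distance field changes well away from the noise pixels themselves.
print("largest change in the distance field:", np.abs(clean_field - noisy_field).max())
```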
  • the matching problem is a fundamental problem in image processing and computer vision.
  • a robust method should be resistant to noise and be able to deal with cases where matching targets are incomplete.
  • the ordinary chamfer matching is very sensitive to noise and the result of matching will become unreliable if the target object is occluded.
  • three improvements are described. First, the use of edge orientation in the distance metric is extended. Second, an edge distance variance is used to introduce connected components into the distance metric. Third, a method for matching incomplete targets is used.
  • the computation of the distance transform image is very sensitive to noise in that even a few pixels can significantly change the contents of the chamfer distance field, as shown in FIG. 2 .
  • the inverse of the image convolved with a Gaussian may be used as the distance transform image, which to some extent reduces the effect of noise.
  • a computing device analyzes a target image.
  • the target image includes at least one shape comprising edge pixels.
  • the computing device identifies the edge pixels.
  • the edge pixels may be at specific locations within the target image.
  • the edge pixels may be identified by comparing pixel values of adjacent pixels.
  • the pixel values may be color, brightness, hue, intensity, or another image property.
  • the pixel values may be a numeric value for grayscale intensity.
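As a rough illustration of the adjacent-pixel comparison described above, the sketch below (an assumed approach, not the patent's specific detector; the threshold of 30 is arbitrary) flags a pixel as an edge pixel when it differs from a horizontal or vertical neighbor by more than a threshold.

```python
import numpy as np

def detect_edge_pixels(gray, threshold=30):
    """Boolean mask of edge pixels in a grayscale image (values 0-255)."""
    g = gray.astype(np.int32)
    edges = np.zeros(g.shape, dtype=bool)
    edges[:, :-1] |= np.abs(np.diff(g, axis=1)) > threshold  # differs from right neighbor
    edges[1:, :] |= np.abs(np.diff(g, axis=0)) > threshold   # differs from pixel above
    return edges
```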
  • the computing device applies a Gaussian function to the edge pixels.
  • the computing device may convolve the edge pixels of the target image with a Gaussian function.
  • the Gaussian function may be a Gaussian blurring function, a normal distribution, or a probability density function of a normally distributed random variable.
  • the Gaussian function creates a gradient region around the edge pixels.
  • the gradient region includes multiple pixels of varying pixel intensities having a gradient direction from higher pixel values to lower pixel values, or vice versa, based on the arrangement of edge pixels and the Gaussian function.
  • Edge orientation is effective when solving the matching problem using chamfer matching.
  • the computation of edge direction does not need to generate a linear representation of the edges. Instead, the edge direction is set to be perpendicular to the gradient vector of each pixel.
  • the edge direction of the distance transform image can be computed by taking a vector that is perpendicular to the gradient vector.
  • template image V which is a binary edge image
  • a Gaussian is first applied to the original image to create a gradient area close to edge pixels. This enables a gradient vector to be computed based on the resulting image.
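A minimal sketch of this smoothing step, under my own assumptions about the representation (a binary edge image held in a numpy array, scipy available): blur the edge image with a Gaussian so a gradient region forms around each edge pixel, then take per-pixel gradient vectors and rotate them by 90 degrees to obtain candidate edge directions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_gradients(edge_img, sigma=1.5):
    """Gradient components (gy, gx) of the Gaussian-smoothed binary edge image."""
    smoothed = gaussian_filter(edge_img.astype(float), sigma=sigma)
    gy, gx = np.gradient(smoothed)          # derivative along rows, then columns
    return gy, gx

def candidate_edge_directions(gy, gx, eps=1e-9):
    """Unit vectors perpendicular to the gradient (one of the two possibilities)."""
    norm = np.sqrt(gx * gx + gy * gy) + eps
    # rotating (gy, gx) by 90 degrees gives (-gx, gy)
    return -gx / norm, gy / norm
```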
  • FIG. 3 illustrates an example of a distance metric with edge orientation.
  • An original edge image 150 depicts a human hand.
  • the computing device applies (e.g., convolution) the Gaussian function to the original edge image 150 to generate the smoothed edge image 151 .
  • the computing device may analyze the edge pixels of the edge image 150 one at a time in succession.
  • FIG. 3 illustrates the analysis of edge pixel 152 .
  • the computing device determines a gradient direction around the edge pixel 152 .
  • the computing device may consider a predetermined window around the edge pixel.
  • the predetermined window may be a square or rectangle of a set size.
  • the predetermined window may be symmetric around the edge pixel 152, which is possible with windows sized at 3 pixels by 3 pixels, 5 pixels by 5 pixels, 7 pixels by 7 pixels, and so on.
  • the gradient vector 153 may point from the edge pixel 152 to a pixel having the lowest value, or the highest value, within the predetermined window. Note that larger predetermined windows may lead to different directions for the gradient vector 153 .
  • the edge direction vector 154 is perpendicular to the gradient vector 153 .
  • the edge direction vector 154 may be calculated by multiplying the gradient vector 153 by an inverting vector or by multiplying one component of the gradient vector 153 by −1. There may be two possible results for the edge direction vector 154.
  • the computing device may select one of the two possible results based on a dihedral angle. The dihedral angle measures how much the directions of the edges are correlated with each other. Depending on the direction of the edge direction vector, there are two possible angles between 0 and 180 degrees. The computing device may select the smaller of the two angles.
  • FIG. 4 illustrates an example set of pixel values for a gradient vector.
  • the set of pixels may be the pixels surrounding the edge pixel 152 of FIG. 3 as defined by window 156 .
  • the pixel values may range from 15 for black to 0 for white.
  • the computing device may compare the pixel values in the window 156 to identify a minimum value or a maximum value. In the example of FIG. 4 , the minimum value of “8” is associated with minimum pixel 157 .
  • the computing device calculates a gradient vector from the edge pixel 152 to the minimum pixel 157 .
  • FIG. 5 illustrates example unit vectors to each of the cells in the predetermined window.
  • the unit vector to the minimum pixel 157 is [1, 1].
  • the computing device may calculate the edge direction vector (edge orientation) by changing the sign of one of the components of the unit vector (e.g., [ ⁇ 1, 1]).
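The window-based estimate of FIGS. 4 and 5 can be sketched as follows. The function names and the choice to step toward the minimum-valued pixel are mine; for a 3 by 3 window the returned per-axis step matches the unit vectors of FIG. 5 (e.g., [1, 1]), and its perpendicular matches the example edge direction [−1, 1].

```python
import numpy as np

def window_gradient_step(img, row, col, half=1):
    """Per-axis step (e.g., [1, 1]) from (row, col) toward the smallest value
    in the surrounding (2*half + 1) x (2*half + 1) window."""
    win = img[row - half:row + half + 1, col - half:col + half + 1]
    dr, dc = np.unravel_index(np.argmin(win), win.shape)
    return np.array([np.sign(dr - half), np.sign(dc - half)], dtype=float)

def perpendicular(vec):
    """One of the two vectors perpendicular to vec; flip the sign for the other."""
    return np.array([-vec[1], vec[0]])
```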
  • the computing device may perform Equation 4 to calculate a distance function from a pixel v_i in the target image to the nearest corresponding pixel u_i in the template image.
  • the edge direction vectors for the target image are φ(v′_i), and the edge direction vectors of the nearest corresponding pixels in the template image are φ(u_i).
  • the inner product or dot product of the respective edge direction vectors provides an indication of the relative orientations of the edge direction vectors or the similarity of the edge direction vectors.
  • the term λ is a constant weight, which determines the importance of edge orientations when normalizing the terms in Equation 4.
  • the value of λ determines the weight given to a pixel as a function of the dihedral angle of its edge direction vector.
  • λ may range between 0 and 1 (e.g., 0.7).
  • the distance metric for any given edge pixel location in the target image depends on an angle between the edge direction vector of the pixel location, as determined by the gradient, and the edge direction vector of the nearest corresponding edge pixel location in the template image.
  • the computing device may sum the distance metric for multiple pixel locations.
  • the multiple pixel locations may include all of the edge pixels in the target image or all pixels in the template image as identified by the computing device.
  • the multiple pixel locations may be all pixels of the target image.
  • the sum of the distance metric for the multiple pixel locations may be a match score that describes a degree of similarity between the target image and the template.
  • the computing device may calculate match scores for multiple templates and select the template with the highest match score. Alternatively, the computing device may select a template based on variance of the distance metric within multiple template images.
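Equation 4 itself is not reproduced in the text, so the sketch below only assumes a plausible form of the orientation-aware distance: a spatial term blended with a penalty that grows as the edge directions of the matched pixels disagree, weighted by λ. The function and variable names are illustrative, not the patent's.

```python
import numpy as np

def oriented_match_score(target_pts, target_dirs, templ_pts, templ_dirs, lam=0.7):
    """Sum an assumed orientation-aware distance over the template edge pixels.
    Points are (N, 2) arrays of pixel coordinates; directions are unit vectors."""
    score = 0.0
    for p, d in zip(templ_pts, templ_dirs):
        gaps = np.linalg.norm(target_pts - p, axis=1)
        j = int(np.argmin(gaps))                          # nearest target edge pixel
        cos_angle = abs(float(np.dot(d, target_dirs[j]))) # uses the smaller dihedral angle
        score += (1.0 - lam) * gaps[j] + lam * (1.0 - cos_angle)
    return score
```

A lower score here means a closer match; the "highest match score" phrasing above corresponds to treating the negated or inverted value as a similarity.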
  • a good matching result at a location in the target image is where a majority of template pixels get a small distance error.
  • matched pixels should be linked as much as possible and have a small variance within an edge piece, so that the matched pixels better describe the shape in the target image. Therefore, the chamfer matching includes a mechanism of checking the variance of distance measures among template edge pixels.
  • the computing device identifies the edge pixels in the target image. For a given edge pixel, the computing device identifies adjacent pixels that are also edge pixels. The series of edge pixels forms the chain.
  • a number n of pixels, which have the smallest distance measure, are identified among C_{v_i}, the chain having m pixels as a window size.
  • the value of m is defined by the number of pixels in the chain.
  • n distances are computed and the distance variance of v i is computed.
  • the computing device selects the n pixels with the smallest distance measure out of the m possible pixels in the window.
  • the distance variance of the n pixels of v i may be calculated from the distance metrics of multiple pixels in the template image.
  • the distance variance may be computed as the sum of the squares of the differences between the distance metrics of the multiple pixels in the template image and a median distance metric or a mean distance metric.
  • the parameters m and n affect the computation of the distance variance. A larger m will cause a higher variance, whereas a lower n will cause a smaller variance.
  • the role of the parameter n is to exclude outliers from the distance variance computation.
  • the values of m and n used in experiments are 15 and 7 respectively. For computational efficiency reasons, each pixel maintains a list of m linked neighbors.
  • the distance variance measure of v_i is denoted as σ(v_i), and the distance measure is updated as shown by Equation 5.
  • d̂(v′_i, u_i) = d(v′_i, u_i) · (1 + σ(v_i))  Eq. 5
  • The significance of multiplying by the distance variance measure in Equation 5 lies in several aspects. First, the distance measure is no longer solely dependent on each pixel's individual distance error. The distance now considers the relationship between a pixel and its best matched neighbors. Consequently, each pixel and its connected neighbors may have a small distance. Furthermore, introducing variance in Equation 5 makes it much easier to separate well matched pixels from mismatched ones, since it is more likely that pixels with large error will get a large variance as well.
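Under my own assumptions about the data layout (a per-pixel array of distance errors, and for each pixel a list of its m linked chain neighbors), the Equation 5 update can be sketched as:

```python
import numpy as np

def chain_variance_update(dist, neighbors, n=7):
    """d_hat[i] = dist[i] * (1 + variance of the n smallest errors among
    pixel i's chain neighbors); n = 7 and m = 15 follow the values quoted above."""
    dist = np.asarray(dist, dtype=float)
    d_hat = np.empty_like(dist)
    for i, nbrs in enumerate(neighbors):
        best = np.sort(dist[list(nbrs)])[:n]   # the n best-matched neighbors
        sigma = float(best.var()) if best.size else 0.0
        d_hat[i] = dist[i] * (1.0 + sigma)
    return d_hat
```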
  • the traditional chamfer matching algorithm does not handle occlusions in the target. This is because the distance transform value changes and becomes unpredictable in parts which are occluded. Including the distance error in these occluded parts when optimizing the distance metric results in a shift from the true optimal location.
  • the challenge is to separate well matched pixels from pixels in occluded areas and retain only pixels with small distance for error computation.
  • edge pixels within occluded areas produce a large error whereas well matched edge pixels produce a small error.
  • a target image may be reliably matched with a template based on the occlusion free pixels while omitting the occluded pixels.
  • FIG. 6 illustrates an example of template matching omitting occluded pixels.
  • Section (a) illustrates a target image including multiple edge pixels. The target image includes high error pixels 158 and low error pixels 159 .
  • Section (b) illustrates the target image after the high error pixels 158 are removed from the analysis, which is shown by missing portions 160 in the outline of the target image. The remaining portions 161 correspond to the low error pixels and are used to match the target image to the template image.
  • Section (c) illustrates the results of matching using the error computation that omits high error pixels 158, reducing the effects of occlusions, and section (d) illustrates the results without distinguishing pixels based on the variance of the distance metric. Section (c) illustrates a more accurate result than section (d).
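A minimal sketch of the partial-matching idea of FIG. 6: score a placement only over its best-matched template pixels. The fixed kept fraction is an assumption made for brevity; the text's own cutoff comes from the sorted error curve discussed later with the flowchart of FIG. 8.

```python
import numpy as np

def occlusion_tolerant_score(errors, keep_fraction=0.7):
    """Average error over the best-matched fraction of template pixels."""
    errors = np.sort(np.asarray(errors, dtype=float))
    kept = errors[:max(1, int(len(errors) * keep_fraction))]
    return float(kept.mean())
```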
  • the extended chamfer matching approach was applied to perform registration between oblique aerial images and a building footprint vector model.
  • the footprint vector model was produced using a separate process and then simplified to create a coarse silhouette of ground truth building footprints.
  • Two data sets were used, where each contained 1000 building footprints selected from San Francisco and Chicago urban areas.
  • the building footprints were used as template images.
  • the corresponding area of the target aerial images was cropped for each data set from a map base.
  • the cropped size of buildings in aerial images was based on the template size which was increased to guarantee that the entire building was included in the cropped window.
  • the resolution of the aerial images is about 0.5 meters per pixel.
  • the images were preprocessed to remove noise. Further, small blobs were removed from the extracted edge images.
  • the ground truth matching locations were manually labeled.
  • the accuracy of the result is measured by computing the proportion of area between the algorithm's resulting location and the known ground truth location. Two kinds of accuracy measurement are used in the results. In the first measurement, the average coverage accuracy of the proposed algorithm is computed. In the second measurement, the number of buildings in the test set that are above a given coverage accuracy is computed. In both measurements, the weight constant λ is varied between 0 and 1. The result of the algorithm is compared with that of other chamfer matching techniques and provides a higher matching accuracy for both data sets. The performance of the proposed algorithm was further evaluated for matching incomplete target buildings in San Francisco and 74 buildings in Chicago. The best results of running this test are presented in Table 1. As can be observed, on average, the proposed algorithm produces 80% accuracy on both data sets compared with about 30-50% accuracy when using other chamfer matching techniques.
  • FIG. 7 illustrates an exemplary computing device 101 .
  • the computing device 101 may be the server 125 or the mobile device 122 of the system of FIG. 1 .
  • the computing device includes a controller 100 , a memory 104 , an input device 103 , a communication interface 105 , and a display 111 . Additional, different, or fewer components may be included in the computing device 101 .
  • FIG. 8 illustrates an example flowchart for template matching using the computing device of FIG. 7 . One or more acts may be added, removed, or substituted in the flowchart. Acts may be repeated.
  • the computing device 101 may receive image data including one or more objects.
  • the image data may be a photograph such as an aerial photograph including buildings.
  • the image data may include objects such as vehicles, roadways, road signs, lane markers, pedestrians, or road indicia.
  • the image data may include images in other fields such as hand signals used in sign language, animals, or types of trees.
  • the image data may include pixels having at least one pixel value.
  • the pixel value may be in a numerical range (e.g., 0 to 255) that describes the intensity, color, or brightness of the pixel.
  • the computing device 101 applies an operation to at least a portion of the image data to create a gradient.
  • the portion of the image may include edge pixels.
  • the computing device 101 may identify changes in contrast within the image data as the location of edge pixels.
  • the computing device 101 detects edges by applying a highpass filter to the image data.
  • the computing device 101 may convolve the image data with a kernel in the spatial domain.
  • the kernel may be a set of values applied in a neighborhood of the object pixel in the image data.
  • the kernel may tend to blur the edge pixels of objects in the target image, resulting in a gradient.
  • the computing device 101 may apply the operation to all of the image data.
  • the computing device 101 calculates a vector for the gradient.
  • the vector for the gradient may refer to a direction of the gradient or a direction perpendicular to the gradient.
  • the vector for the gradient may be perpendicular to the direction with the largest magnitude difference in the gradient, extending from the object pixel in the direction of the gradient.
  • the direction of the gradient may be defined as the lowest pixel value or highest pixel value in a gradient window surrounding the object pixel.
  • the gradient window may be a square window having a side length of an odd number of pixels greater than one (e.g., a 3×3 gradient window, a 5×5 gradient window, or a 9×9 gradient window).
  • the computing device 101 calculates a distance metric from the vector for the gradient and a template.
  • the distance metric may be based on edge direction vectors of the target image defined as being orthogonal to the direction of the gradient and similar edge direction vectors of a template image.
  • the distance metric may be a function of an angle between the edge direction vectors of the target image and the edge direction vectors of the template image.
  • the computing device 101 matches a building model to the image data based on results of the distance metric.
  • the computing device 101 may calculate a variance of error values from the distance metric.
  • the computing device 101 identifies a chain of pixels extending from or including the object pixels. The distance metric for each of the pixels in the chain is analyzed statistically.
  • error values for the distance metric for the chain of pixels are placed in an ordered list.
  • An elbow, cliff, or other abrupt change along the error values is identified by calculating differences between sequential pairs of the ordered list and comparing the differences between sequential pairs to a threshold value.
  • the error values before the abrupt change are considered low error values and the error values after the abrupt change are considered high error values.
  • the computing device 101 may identify a portion of the chain of pixels in the ordered list before the threshold value is reached as accurate distance metric values, and accordingly, match the building model to the image data as a function of that portion of the chain of pixels.
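One way to implement the cutoff just described, assuming a simple jump test on consecutive sorted error values (the threshold value is illustrative):

```python
import numpy as np

def low_error_pixel_indices(errors, jump_threshold=2.0):
    """Indices of pixels whose errors fall before the first abrupt jump
    (the 'elbow') in the sorted error list."""
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(errors)
    jumps = np.diff(errors[order])
    big = np.nonzero(jumps > jump_threshold)[0]
    cut = int(big[0]) + 1 if big.size else len(errors)  # keep everything if no jump found
    return order[:cut]
```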
  • the computing device 101 may compare multiple templates to the image data.
  • the computing device 101 generates template scores by summing distance metrics for a plurality of edge pixels in multiple templates.
  • the computing device 101 may select the highest template score for the template that matches the image data.
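The template comparison can be sketched as a loop over candidate templates; the names are illustrative, and score_template stands for any scoring routine such as the oriented_match_score sketch above, where a smaller value means a better match.

```python
def select_best_template(target, templates, score_template):
    """Return the name and score of the best-scoring (smallest-distance) template."""
    best_name, best_score = None, float("inf")
    for name, template in templates.items():
        s = score_template(target, template)
        if s < best_score:
            best_name, best_score = name, s
    return best_name, best_score
```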
  • the input device 103 may be one or more buttons, keypad, keyboard, mouse, stylus pen, trackball, rocker switch, touch pad, voice recognition circuit, or other device or component for inputting data to the mobile device 122.
  • the input device 103 and the display 111 may be combined as a touch screen, which may be capacitive or resistive.
  • the display 111 may be a liquid crystal display (LCD) panel, light emitting diode (LED) screen, thin film transistor screen, or another type of display.
  • the input device 103 may be configured to receive a user input to define a window size for the image processing, a neighborhood size for the distance metric, or another parameter.
  • the mobile device 122 may also include range finding device, range finding circuitry, position circuitry and/or a camera.
  • the positioning circuitry is optional and may be excluded for the map-related functions.
  • the positioning circuitry may include a Global Positioning System (GPS), Global Navigation Satellite System (GLONASS), or a cellular or similar position sensor for providing location data.
  • the positioning system may utilize GPS-type technology, a dead reckoning-type system, cellular location, or combinations of these or other systems.
  • the positioning circuitry may include suitable sensing devices that measure the traveling distance, speed, direction, and so on, of the mobile device 122 .
  • the positioning system may also include a receiver and correlation chip to obtain a GPS signal.
  • the one or more detectors or sensors may include an accelerometer built or embedded into or within the interior of the mobile device 122 .
  • the accelerometer is operable to detect, recognize, or measure the rate of change of translational and/or rotational movement of the mobile device 122 .
  • the mobile device 122 receives location data from the positioning system.
  • the location data indicates the location of the mobile device 122 .
  • the building map and/or the image data may be accessed based on the location data from the positioning system.
  • the database 123 of the system 120 may be a geographic database.
  • the geographic database 123 includes information about one or more geographic regions. Each road in the geographic region is composed of one or more road segments. A road segment represents a portion of the road.
  • the navigation-related features may include a route calculation application.
  • End users may access a route from an origin to a destination.
  • the route calculation application determines the route for the end user to travel along the road segments to reach the desired destination.
  • the route calculation application is provided with data identifying a starting location (origin) and a desired destination location.
  • the starting location may be the end user's current position and the destination may be entered by the end user.
  • the route calculation application determines one or more solution routes between the starting location and the destination location.
  • a solution route is formed of a series of connected road segments over which the end user can travel from the starting location to the destination location.
  • the route calculation application calculates a route
  • the application accesses the geographic database 123 and obtains data that represent road segments around and between the starting location and the destination location.
  • the road calculation application uses the data to determine at least one valid solution route from the starting location to the destination location.
  • the map-related features may be any of the navigation-related features provided to the user without reference to the current location of the user or the device.
  • map-related features may include display and manipulation of a map of a geographic region.
  • the map-related features may be provided without navigation-related features.
  • the controller 100 may include a general processor, digital signal processor, an application specific integrated circuit (ASIC), field programmable gate array (FPGA), analog circuit, digital circuit, combinations thereof, or other now known or later developed processor.
  • the controller 100 may be a single device or combinations of devices, such as associated with a network, distributed processing, or cloud computing.
  • the memory 104 may be a volatile memory or a non-volatile memory.
  • the memory 104 may include one or more of a read only memory (ROM), random access memory (RAM), a flash memory, an electronic erasable program read only memory (EEPROM), or other type of memory.
  • the memory 104 may be removable from the mobile device 100 , such as a secure digital (SD) memory card.
  • the communication interface 105 may include any operable connection.
  • An operable connection may be one in which signals, physical communications, and/or logical communications may be sent and/or received.
  • An operable connection may include a physical interface, an electrical interface, and/or a data interface.
  • the communication interface 105 provides for wireless and/or wired communications in any now known or later developed format.
  • the network 127 may include wired networks, wireless networks, or combinations thereof.
  • the wireless network may be a cellular telephone network, an 802.11, 802.16, 802.20, or WiMax network.
  • the network 127 may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols.
  • Non-transitory computer readable media may be encoded with instructions for performing any of the above acts or functions. While the non-transitory computer-readable medium may be a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. The term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein.
  • the computer-readable medium can include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium can include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. A digital file attachment to an e-mail or other self-contained information archive or set of archives may be considered a distribution medium that is a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a computer-readable medium or a distribution medium and other equivalents and successor media, in which data or instructions may be stored.
  • dedicated hardware implementations such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the methods described herein.
  • Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems.
  • One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Accordingly, the present system encompasses software, firmware, and hardware implementations.
  • the methods described herein may be implemented by software programs executable by a computer system.
  • implementations can include distributed processing, component/object distributed processing, and parallel processing.
  • virtual computer system processing can be constructed to implement one or more of the methods or functionality as described herein.
  • a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • circuitry refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry applies to all uses of this term in this application, including in any claims.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware.
  • circuitry would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in server, a cellular network device, or other network device.
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor receives instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • embodiments of the subject matter described in this specification can be implemented on a device having a display, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.


Abstract

A computing device performs matching between a target image and one or more template images. The computing device receives image data and performs an edge detection algorithm on the image data. The edge detection algorithm includes a distance metric based on angles between gradient vectors in the image data and gradient vectors in one or more templates. The computing device matches a building model to the image data based on results of the edge detection algorithm, wherein the building model is associated with the one or more templates.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This patent is related to and claims priority benefit of U.S. Provisional Application Ser. No. 61/841,627, filed Jul. 1, 2013, which is hereby incorporated by reference in its entirety.
FIELD
The following disclosure relates to image template matching, or more particularly, to a distance transform that provides accurate image template matching even when significant occlusions are present in the image data.
BACKGROUND
Shape matching or object recognition is a fundamental problem in computer vision and has been used in various applications ranging from information retrieval to object tracking. The measurements of similarities between templates and target objects have been studied extensively. Human beings naturally recognize thousands if not millions of shapes instantly with no effort. The lighting, scale, orientation, or viewing direction in which an object is viewed is easily reconciled by the human mind. Computers, on the other hand, do not easily interpret shapes when appearances have been modified. Computer vision focuses on the identification of these shapes.
One area of particular interest in computer vision is occlusions. An occlusion is a partially obstructed object in the image. Because a portion of the object is not viewable in the image, the object shape and other properties have changed. Occlusions occur when one object is positioned in front of another. Occlusions may also occur when lighting for an object changes. For example, shadows may change the appearance of an object. An image of an object taken at midday may be matched easily to a template, but images of the same object taken later in the day may be too occluded by shadows for accurate matching. One particular area of concern is object recognition in aerial photographs of buildings, which tend to cast far reaching shadows.
SUMMARY
A computing device performs matching between a target image and one or more template images. The computing device receives image data and performs an edge detection algorithm on the image data. The edge detection algorithm includes a distance metric based on angles between gradient vectors in the image data and gradient vectors in one or more templates. The computing device matches a building model to the image data based on results of the edge detection algorithm, wherein the building model is associated with the one or more templates.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary embodiments of the present invention are described herein with reference to the following drawings.
FIG. 1 illustrates an example system for template matching.
FIG. 2 illustrates example distance transforms for chamfer matching.
FIG. 3 illustrates an example of a distance metric with edge orientation.
FIG. 4 illustrates an example set of pixel values for a gradient vector.
FIG. 5 illustrates example unit vectors for the gradient vector.
FIG. 6 illustrates an example of template matching omitting occluded pixels.
FIG. 7 illustrates an exemplary computing device of the system of FIG. 1.
FIG. 8 illustrates an example flowchart for template matching using the computing device of FIG. 7.
DETAILED DESCRIPTION
Sketch and silhouette are primary visual cues that have been shown to be a dominant element in human perception and cognition. Matching objects through sketch and silhouette discards other valuable visual information such as intensity and texture. On the other hand, shape provides a compact representation of the information, which has been shown to be very useful for matching. Techniques for matching objects using sketch edges may adopt a metric based on the distance between corresponding pixels to evaluate the similarity among edge images.
Depending on whether the correspondence is built by matching pixels or features, matching methods can be categorized into two groups: feature-dependent and feature-independent. The feature-dependent group solves the correspondence problem by minimizing the feature distance between corresponding pixels. The feature-independent group determines the distance in a more general way over all pixels. Usually, feature-independent methods are preferred when speed is a significant consideration; however, the performance of such methods may be affected when the scene becomes more complicated and possesses more noise. Conversely, by defining high-dimensional spatial features, feature-dependent methods perform better and are more resistant to cluttered and occluded scenes. This increased accuracy comes at an increased computation cost. Chamfer matching methods, which are based on integrating edge orientation, are a preferred choice when both speed and accuracy are concerns. However, these methods do not support partial matching and may suffer from large error when an object is occluded.
A chamfer matching algorithm includes a distance transform for quantifying the differences between two different images. The differences may be distances between edge points in the images. The following embodiments include a distance metric that facilitates template matching even with occlusions and a noisy environment.
FIG. 1 illustrates an example system 120 for template matching. The system 120 includes a developer system 121, a mobile device 122, a workstation 128, and a network 127. Additional, different, or fewer components may be provided. For example, many mobile devices 122 and/or workstations 128 connect with the network 127. The developer system 121 includes a server 125 and a database 123. The server 125 includes at least a processor, a communication interface, and a memory. Either or both of the mobile device 122 and the server 125 are configured to perform the methods described herein.
The mobile device 122 and the server 125 are configured to perform template matching to register or align image data with a building model. The image data may be collected by a satellite or another aerial vehicle (e.g., airplane, helicopter, low-altitude radio controlled device such as a quad-copter, or an unmanned aerial vehicle).
The mobile device 122 and the server 125 are configured to receive and analyze the image data. The analysis may include an edge detection algorithm with a distance transform (distance metric). The distance transform is calculated from a gradient area of the neighborhood of each pixel. The distance transform describes each pixel as a function of neighboring pixels. The edge detection algorithm may include a window that slides across the image data. The size of the window may be configured by a user.
The mobile device 122 and the server 125 are configured to analyze the image data to identify occluded areas based on the distance transform. The occluded areas may include structures in shadows (partially or completely) of other structures, shadows from trees, or shadows from clouds. Potentially occluded areas are distinguished from normal areas in the image data. The mobile device 122 and the server 125 are configured to perform a first edge detection algorithm on the occluded areas and a second edge detection algorithm on the normal areas.
The database 123 may store the building footprints and the image data. The database 123 may associate building footprints with geographic locations and/or locations in the image data for registering the image data with the building footprints.
The mobile device 122 is a smart phone, a mobile phone, a personal digital assistant (“PDA”), a tablet computer, a notebook computer, a personal navigation device (“PND”), a portable navigation device, in-car navigation system, and/or any other known or later developed portable or mobile device. The mobile device 122 includes one or more detectors or sensors as a positioning system built or embedded into or within the interior of the mobile device or vehicle 122. The mobile device 122 receives location data from the positioning system.
The developer system 121, the workstation 128, and the mobile device 122 are coupled with the network 127. The phrase “coupled with” is defined to mean directly connected to or indirectly connected through one or more intermediate components. Such intermediate components may include hardware and/or software-based components.
FIG. 2 illustrates example distance transforms for chamfer matching. In chamfer matching, one image is a target image and the other image is a template. The target image and the template image may be binary images in which each pixel may have only two values (e.g., 1 or 0). The target image may be an image observed in the real world (e.g., an aerial image or other photograph), and the template may be one of multiple templates that may be potentially matched with the target image.
In one example, the templates include building models. The building model may be a three-dimensional building model or a two-dimensional building model. The two-dimensional building model may be referred to as building footprints. The building model may be measured using a range finding device (e.g., a light detection and ranging (LIDAR) sensor) mounted on a ground vehicle or an aerial vehicle. The building model may be created by measuring the locations of buildings manually. In the three-dimensional example, the building model may include outlines of buildings derived from a point cloud collected by the range finding device. In the two-dimensional example, the building model may include locations of the corners and/or edges of the buildings. The building model may be overlaid on a city map and stored in a map database.
The target image (U) may be formed by pixels ui such that U={ui} and the template image (V) may be formed by pixels vi such that V={vi}. The transformation (W) between the template image and the target image is Euclidean. Thus, the transformation (W) includes one or more of a rotation (R) and a translation (T). The translation may be performed by a translation vector including a translation distance in one or more directions. The rotation may be performed by a rotation matrix including an angle of rotation in one or more directions. The position of any pixel vi after the transformation including translation (T) and rotation (R) may be given by Equation 1.
W(vi; [R|T]) = R·vi + T ≡ v′i  Eq. 1
In chamfer matching, a match is made between the template image (V) and the target image (U). Given a distance metric d, the pixel in U corresponding to vi is identified as the closest pixel to v′i in U under the transformation W(vi; [R|T]). The distance between U and V is then given by Equation 2:
D(U, V) = (1/|V|) Σ_{vi ∈ V} min_{ui ∈ U} d(v′i, ui)  Eq. 2
When D is zero, the template image is perfectly matched to the target image. The optimal location of a chamfer matching is given by the parameters of rotation (R) and translation (T) obtained for the minimum D. Even with an ideal match, noise and inherent differences between the template image and the target image cause D to be greater than zero.
The computation of the distance metric d(v′i, ui) may involve complex calculations and significant computational resources. In one embodiment, the distance from any pixel in the target image to the closest edge pixel is computed ahead of time, stored in memory, and reused whenever V is placed at a new location under the transformation.
This precomputation results in the distance transform. Given an edge image with edge pixels U, its corresponding distance transform image UDT is generated by looking up the distance of each pixel from the closest edge. Once the distance transform image is determined, the distance between U and V is computed as described in Equation 3, where UDT(v′i) is the distance measure at location v′i in the distance transform image UDT. The objective of the distance measurement d(•) is to determine the closest pixel in the target image to a given pixel in the template. Several other distance metrics are possible.
D(U, V) = (1/|V|) Σ_{vi ∈ V} UDT(v′i)  Eq. 3
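By way of a non-limiting illustration, the following sketch shows how Equation 3 may be evaluated with a precomputed distance transform. The function name, the use of scipy.ndimage.distance_transform_edt, and the input format (a binary target edge image and an (N, 2) integer array of transformed template pixel coordinates) are assumptions for illustration only.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(target_edges, template_pixels):
    # U_DT: for every pixel, the Euclidean distance to the closest edge pixel of the target.
    u_dt = distance_transform_edt(target_edges == 0)
    # Sample U_DT at the transformed template pixel locations v'_i (integer row/col indices).
    rows, cols = template_pixels[:, 0], template_pixels[:, 1]
    # D(U, V) = (1/|V|) * sum_i U_DT(v'_i)
    return u_dt[rows, cols].mean()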
Referring back to FIG. 2, example objects are shown with corresponding distance transforms. The distance transforms are represented with grayscale intensity to show the relative distances from pixels to the nearest obstacle pixel (e.g., a boundary pixel). In section (a) a simple square 130 is the object, and the corresponding distance transform in section (b) includes regularly shaped regions 140 around the object.
The distance transform may be drastically affected by noise. In section (c) a few random noise pixels 131 are added in the vicinity of the square 130. Because the distance transform treats the noise pixels that are separated from the rest of the object in the image as boundary pixels, the regularly shaped regions 140 are disrupted, as shown by the irregular regions 141 in section (d). Similarly, the distance transform may be drastically affected by missing portions or occlusion of the object, as shown by the incomplete square 134 in section (e). Because the incomplete square 134 is missing boundary pixels, the distance transform follows a similar irregular shape 144 shown in section (f). Matching algorithms have difficulty matching the distance transforms in sections (d) or (f) with the original square 130 even though casual observation could easily determine that the differences are simple noise or occlusions.
The matching problem is a fundamental problem in image processing and computer vision. A robust method should be resistant to noise and be able to deal with cases where matching targets are incomplete. Ordinary chamfer matching is very sensitive to noise, and the matching result becomes unreliable if the target object is occluded. To address these shortcomings of ordinary chamfer matching, three improvements are described. First, the use of edge orientation in the distance metric is extended. Second, an edge distance variance is used to introduce connected components into the distance metric. Third, a method for matching incomplete targets is used.
Distance Metric with Edge Orientation
The computation of the distance transform image is very sensitive to noise in that even a few pixels can significantly change the contents of the chamfer distance field, as shown in FIG. 2. To reduce the impact of noise, the inverse of the image convolved with a Gaussian may be used as the distance transform image, which to some extent reduces the effect of noise.
A computing device (e.g., mobile device 122 or server 125) analyzes a target image. The target image includes at least one shape comprising edge pixels. The computing device identifies the edge pixels. The edge pixels may be at specific locations within the target image. The edge pixels may be identified by comparing pixel values of adjacent pixels. The pixel values may be color, brightness, hue, intensity, or another image property. The pixel values may be a numeric value for grayscale intensity. When the difference in pixel values from one pixel to the next exceeds a threshold gap, the computing device identifies the former pixel as an edge pixel.
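By way of a non-limiting illustration, the following sketch flags a pixel as an edge pixel when the jump in value to its right or lower neighbor exceeds a threshold gap, as described above. The function name and the specific neighbor comparisons are assumptions for illustration only.

import numpy as np

def detect_edges(gray, gap):
    # gray: 2-D array of pixel values; gap: threshold on the value difference between neighbors.
    gray = gray.astype(float)
    edges = np.zeros(gray.shape, dtype=bool)
    # Mark the former pixel of each horizontal or vertical pair whose difference exceeds the gap.
    edges[:, :-1] |= np.abs(np.diff(gray, axis=1)) > gap
    edges[:-1, :] |= np.abs(np.diff(gray, axis=0)) > gap
    return edges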
The computing device applies a Gaussian function to the edge pixels. The computing device may convolve the edge pixels of the target image with a Gaussian function. The Gaussian function may be a Gaussian blurring function, a normal distribution, or a probability density function of a normally distributed random variable. The Gaussian function creates a gradient region around the edge pixels. The gradient region includes multiple pixels of varying pixel intensities having a gradient direction from higher pixel values to lower pixel values, or vice versa, based on the arrangement of the edge pixels and the Gaussian function.
The method described provides a more accurate combined distance feature that is less sensitive to noise. Edge orientation is effective when solving the matching problem using chamfer matching. The computation of edge direction does not need to generate a linear representation of the edges. Instead, the edge direction is set to be perpendicular to the gradient vector of each pixel. The edge direction of the distance transform image can be computed by taking a vector that is perpendicular to the gradient vector. However, for the template image V, which is a binary edge image, a Gaussian is first applied to the original image to create a gradient area close to the edge pixels in order to obtain an edge direction. This enables a gradient vector to be computed based on the resulting image.
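By way of a non-limiting illustration, the following sketch smooths a binary edge image with a Gaussian and takes the per-pixel edge direction as a unit vector perpendicular to the gradient, as described above. The function name, the choice of sigma, and the use of scipy.ndimage.gaussian_filter and numpy.gradient are assumptions for illustration only.

import numpy as np
from scipy.ndimage import gaussian_filter

def edge_directions(edge_image, sigma=2.0):
    # Smooth the binary edge image to create a gradient area around the edge pixels.
    smoothed = gaussian_filter(edge_image.astype(float), sigma=sigma)
    gy, gx = np.gradient(smoothed)            # per-pixel gradient components (rows, cols)
    edge_dir = np.stack([gx, -gy], axis=-1)   # rotate the gradient by 90 degrees (perpendicular)
    norm = np.linalg.norm(edge_dir, axis=-1, keepdims=True)
    return edge_dir / np.where(norm > 0, norm, 1.0)  # unit edge direction vectors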
FIG. 3 illustrates an example of a distance metric with edge orientation. An original edge image 150 depicts a human hand. The computing device applies (e.g., by convolution) the Gaussian function to the original edge image 150 to generate the smoothed edge image 151. The computing device may analyze the edge pixels of the edge image 150 one at a time in succession. FIG. 3 illustrates the analysis of edge pixel 152. The computing device determines a gradient direction around the edge pixel 152. The computing device may consider a predetermined window around the edge pixel. The predetermined window may be a square or rectangle of a predetermined size. The predetermined window may be symmetric around the edge pixel 152, which is possible with windows sized at 3 pixels by 3 pixels, 5 pixels by 5 pixels, 7 pixels by 7 pixels, and so on.
The gradient vector 153 may point from the edge pixel 152 to a pixel having the lowest value, or the highest value, within the predetermined window. Note that larger predetermined windows may lead to different directions for the gradient vector 153. The edge direction vector 154 is perpendicular to the gradient vector 153.
The edge direction vector 154 may be calculated by multiplying the gradient vector 153 by an inverting vector or by multiplying one component of the gradient vector 153 by −1. There may be two possible results for the edge direction vector 154. The computing device may select one of the two possible results based on a dihedral angle. The dihedral angle measures how much the directions of the edges are correlated with each other. Depending on the direction of the edge direction vector, there are two possible angles between 0 and 180 degrees. The computing device may select the smaller of the two angles.
FIG. 4 illustrates an example set of pixel values for a gradient vector. The set of pixels may be the pixels surrounding the edge pixel 152 of FIG. 3 as defined by window 156. In one example, the pixel values may range from 15 for black to 0 for white. The computing device may compare the pixel values in the window 156 to identify a minimum value or a maximum value. In the example of FIG. 4, the minimum value of “8” is associated with minimum pixel 157. The computing device calculates a gradient vector from the edge pixel 152 to the minimum pixel 157. FIG. 5 illustrates example unit vectors to each of the cells in the predetermined window. The unit vector to the minimum pixel 157 is [1, 1]. The computing device may calculate the edge direction vector (edge orientation) by changing the sign of one of the components of the unit vector (e.g., [−1, 1]).
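By way of a non-limiting illustration, the following sketch computes the window-based gradient vector described above by pointing from the edge pixel to the lowest-valued pixel in the window and taking a perpendicular vector as the edge direction. The function name, the handling of pixels near image borders, and the tie-breaking among equal minima are assumptions for illustration only.

import numpy as np

def window_gradient(image, row, col, half=1):
    # Examine the (2*half+1) x (2*half+1) window centered on the edge pixel (row, col).
    window = image[row - half:row + half + 1, col - half:col + half + 1]
    dr, dc = np.unravel_index(np.argmin(window), window.shape)
    gradient = np.array([dr - half, dc - half], dtype=float)   # edge pixel -> minimum pixel
    edge_dir = np.array([-gradient[1], gradient[0]])           # perpendicular to the gradient
    norm = np.linalg.norm(edge_dir)
    return gradient, (edge_dir / norm if norm > 0 else edge_dir)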
The computing device may perform Equation 4 to calculate a distance function from a pixel vi in the template image to the nearest corresponding pixel ui in the target image. The edge direction vector for the transformed template pixel is θ(v′i) and the edge direction vector for the nearest corresponding pixel in the target image is θ(ui). The inner product or dot product of the respective edge direction vectors provides an indication of the relative orientations of the edge direction vectors or the similarity of the edge direction vectors. The term λ is a constant weight, which determines the importance of edge orientations when normalizing the terms in Equation 4. The value for λ assigns a weight to pixels as a function of the dihedral angle of the edge direction vectors. Higher or lower values for λ may be tested by the computing device to tune the algorithm's sensitivity so that small noise does not perturb the edges. An example value for λ may range between 0 and 1 (e.g., 0.7). A squared Euclidean norm is used in the first term of Equations 4a and 4b. This gives a larger penalty to mismatched pixels.
d(v′i, ui) = λ∥v′i − ui∥² + (1 − λ)(1 − |<θ(v′i), θ(ui)>|)  Eq. 4a
d(v′i, ui) = λ∥v′i − ui∥² + (1 − λ)(1 − [θ(v′i)·θ(ui)])  Eq. 4b
In other words, the distance metric for any given edge pixel location in the target image depends on an angle between the edge direction vector of the pixel location, as determined by the gradient, and the edge direction vector of the nearest corresponding edge pixel location in the template image.
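By way of a non-limiting illustration, the following sketch evaluates the combined distance metric of Equation 4a for one pixel pair: a weighted sum of a squared Euclidean term and an orientation term. The function name and the default weight lam (playing the role of λ) are assumptions for illustration only.

import numpy as np

def oriented_distance(v_prime, u, theta_v, theta_u, lam=0.7):
    # Squared Euclidean distance between the transformed template pixel and the target pixel.
    spatial = float(np.sum((np.asarray(v_prime, float) - np.asarray(u, float)) ** 2))
    # Orientation term: 1 - |<theta(v'), theta(u)>| for unit edge direction vectors.
    orientation = 1.0 - abs(float(np.dot(theta_v, theta_u)))
    return lam * spatial + (1.0 - lam) * orientation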
The computing device may sum the distance metric for multiple pixel locations. The multiple pixel locations may include all of the edge pixels in the target image or all pixels in the template image as identified by the computing device. The multiple pixel locations may be all pixels of the target image. The sum of the distance metric for the multiple pixel locations may be a match score that describes a degree of similarity between the target image and the template. The computing device may calculate match scores for multiple templates and select the template with the highest match score. Alternatively, the computing device may select a template based on the variance of the distance metric within multiple template images.
Edge Distance Variance
A good matching result at a location in the target image is where a majority of template pixels get a small distance error. In addition, matched pixels should be linked as much as possible and have a small variance within an edge piece, so that the matched pixels better describe the shape in the target image. Therefore, the chamfer matching includes a mechanism of checking the variance of distance measures among template edge pixels.
Given a template edge pixel vi whose distance measure on the target image is computed as d(v′i, ui), all the pixels connected to the template edge pixel are identified and used to create a chain: Cvi = {vi+1, . . . , vi+m}. The computing device identifies the edge pixels in the target image. For a given edge pixel, the computing device identifies adjacent pixels that are also edge pixels. The series of connected edge pixels forms the chain.
A number n of pixels having the smallest distance measures are identified among Cvi, which has m pixels as a window size. The value of m is defined by the number of pixels in the chain. Using these n pixels, with n being smaller than or equal to m, n distances are computed and the distance variance of vi is computed. The computing device selects the n pixels with the smallest distance measure out of the m possible pixels in the window. The distance variance of the n pixels of vi may be calculated from the distance metrics of multiple pixels in the template image. The distance variance may be the sum of the squares of the differences between the distance metrics of the multiple pixels in the template image and a median distance metric or a mean distance metric.
The parameters m and n affect the computation of the distance variance. A larger m will cause a higher variance, whereas a lower n will cause a smaller variance. The role of the parameter n is to exclude outliers from the distance variance computation. The values of m and n used in experiments are 15 and 7, respectively. For computational efficiency reasons, each pixel maintains a list of m linked neighbors. The distance variance measure of vi is denoted as φ(vi) and the distance measure is updated as shown by Equation 5.
dφ(v′i, ui) = d(v′i, ui) × (1 + φ(vi))  Eq. 5
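By way of a non-limiting illustration, the following sketch weights a pixel's distance by the variance of the n smallest distances among its linked neighbors, as in Equation 5. The function name, the default n = 7 (matching the experimental value above), and the representation of the chain as an array of neighbor distances are assumptions for illustration only.

import numpy as np

def variance_weighted_distance(d_vi, chain_distances, n=7):
    # chain_distances: the m distance measures of the pixels linked to v_i.
    smallest = np.sort(np.asarray(chain_distances, dtype=float))[:n]  # n best-matched neighbors
    phi = float(np.var(smallest))                                     # distance variance phi(v_i)
    return d_vi * (1.0 + phi)                                         # Eq. 5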
The significance of multiplying the distance variance measure in Equation 5 lies in several aspects. First, the distance measure is no longer solely dependent on each pixel's individual distance error. The distance now considers the relationship between a pixel and its best matched neighbors. Consequently, each pixel and its connected neighbors may have a small distance. Furthermore, introducing variance in Equation 5 makes it much easier to separate well matched pixels from mismatched ones, since it is more likely that pixels with large error will get a large variance as well.
Using the variance, the error change from well matched pixels to high error pixels is larger, which indicates a much clearer boundary between matched and unmatched pixels. A clearer boundary is important when thresholding pixels with higher errors.
Matching Incomplete Targets
The traditional chamfer matching algorithm does not handle occlusions in the target. This is because the distance transform value changes and becomes unpredictable in parts which are occluded. Including the distance error in these occluded parts when optimizing the distance metric results in a shift from the true optimal location.
Therefore, the challenge is to separate well matched pixels from pixels in occluded areas and retain only pixels with small distance for error computation. In most cases, edge pixels within occluded areas produce a large error whereas well matched edge pixels produce a small error. Thus, a target image may be reliably matched with a template based on the occlusion free pixels while omitting the occluded pixels. By using the proposed distance measure of Equation 5, the computing device generates a much more pronounced boundary between low error and high error regions, thus making it easier to separate them.
Given dφ(v′i, ui), where v′i = W(vi; [R|T]) and i = 1, . . . , |V|, pixels are divided according to their error. This is, in essence, a clustering problem with two clusters. Applying clustering algorithms may be too computationally expensive for fast matching. Even looking for an ‘elbow corner’ of sorted errors will end up with a time complexity of O(n log n), which is still too computationally expensive for large area searching. A faster alternative is to use a histogram H whose bins bj ∈ H are evenly spaced in [0, max(dφ)]. Pixels can be assigned to each bin according to their error values. Given this histogram and an initially empty set S, the goal is to add to it as many low error pixels as possible. Similar to searching for an ‘elbow corner’ in a sorted array, a search for a steep error change can be conducted on this histogram with linear time complexity. Finding an ‘elbow corner’ in the histogram may include too few pixels, thus failing to describe the target while generating a small error. Hence, bottom line acceptance criteria are set using the proportion of target pixels getting matched (denoted by p). An error tolerance threshold ξ is set using Equations 4 and 5 so that, once p is met, any pixel in the remaining histogram whose error is smaller than ξ can still be added to the set S. Once the clustering is done, the matching error is computed using all of the pixel errors collected in S. A summary of this approach is given in the following algorithm, where p represents the index of the histogram bin obtained by applying the acceptance proportion to H.
COMPUTEERROR(H, S, p, ξ)
  S ← ∅
  for j ← 1 to p
    do S ← S ∪ {v′i : v′i ∈ hj}
  for j ← p to sizeof(H)
    do if hj < ξ
         then S ← S ∪ {v′i : v′i ∈ hj}
  e ← (1/|S|) Σ_{v′i ∈ S} dφ(v′i, ui)
  return (e)
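By way of a non-limiting illustration, the following sketch mirrors the histogram-based error computation above: all pixels in the low-error bins up to the acceptance-proportion index p are accepted, and remaining pixels are kept only if their error stays below the tolerance ξ. The function name, the number of histogram bins, and the handling of an empty selection are assumptions for illustration only.

import numpy as np

def compute_error(errors, p_index, xi, num_bins=32):
    # errors: per-pixel errors d_phi(v'_i, u_i); p_index: bin index from the acceptance
    # proportion; xi: error tolerance threshold.
    errors = np.asarray(errors, dtype=float)
    edges = np.linspace(0.0, errors.max(), num_bins + 1)
    bin_ids = np.clip(np.digitize(errors, edges) - 1, 0, num_bins - 1)
    keep = bin_ids < p_index                      # low-error bins accepted outright
    keep |= (bin_ids >= p_index) & (errors < xi)  # remaining pixels under the tolerance
    return errors[keep].mean() if keep.any() else float("inf")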
FIG. 6 illustrates an example of template matching omitting occluded pixels. Section (a) illustrates a target image including multiple edge pixels. The target image includes high error pixels 158 and low error pixels 159. Section (b) illustrates the target image after the high error pixels 158 are removed from the analysis, which is shown by missing portions 160 in the outline of the target image. The remaining portions 161 correspond to the low error pixels and are used to match the target image to the template image. Section (c) illustrates the results of matching using the error computation that omits the high error pixels 158, reducing the effects of occlusions, and section (d) illustrates the results with no distinction based on the variance of the distance metric. Section (c) illustrates a more accurate result than section (d).
The extended chamfer matching approach was applied to perform registration between oblique aerial images and a building footprint vector model. The footprint vector model was produced using a separate process and then simplified to create a coarse silhouette of ground truth building footprints. Two data sets were used, where each contained 1000 building footprints selected from the San Francisco and Chicago urban areas. The building footprints were used as template images. The corresponding area of the target aerial images was cropped for each data set from a map base. The cropped size of buildings in the aerial images was based on the template size, which was increased to guarantee that the entire building was included in the cropped window. In both data sets, the resolution of the aerial images is about 0.5 meters per pixel. The images were preprocessed to remove noise. Further, small blobs were removed from the extracted edge images.
Due to the nature of the data sets containing high rise buildings in a downtown area, many of the buildings are covered by shadow from nearby buildings. In the example, 53 buildings were covered by shadow in the San Francisco data set, and 74 buildings were covered by shadow in the Chicago data set. The registration algorithm is based on shifting a sliding template window within the building image. In the experiments, the following parameters are set: the acceptance proportion p = 50% and the tolerance threshold ξ. Since the distance metric is a combination of a Euclidean distance and an angular distance, the tolerance threshold is computed as a combination of a Euclidean distance threshold te = 5 pixels and an angular distance threshold ta = 15 degrees. Different settings of λ were tested. To evaluate the results, the ground truth matching locations were manually labeled. The accuracy of the result is measured by computing the proportion of area between the algorithm's resulting location and the known ground truth location. Two kinds of accuracy measurement are used in the results. In the first measurement, the average coverage accuracy of the proposed algorithm is computed. In the second measurement, the number of buildings in the test set that are above a given coverage accuracy is computed. In both measurements, the weight constant λ is varied between 0 and 1. The result of the algorithm is compared with that of other chamfer matching techniques and provides a higher matching accuracy for both data sets. To further evaluate the performance of the proposed algorithm for matching incomplete targets, the test was repeated on the 53 shadow-covered buildings in San Francisco and the 74 shadow-covered buildings in Chicago. The best results of running this test are presented in Table 1. As can be observed, on average, the proposed algorithm produces 80% accuracy on both data sets compared with about 30-50% accuracy when using other chamfer matching techniques.
FIG. 7 illustrates an exemplary computing device 101. The computing device 101 may be the server 125 or the mobile device 122 of the system of FIG. 1. The computing device includes a controller 100, a memory 104, an input device 103, a communication interface 105, and a display 111. Additional, different, or fewer components may be included in the computing device 101. FIG. 8 illustrates an example flowchart for template matching using the computing device of FIG. 7. One or more acts may be added, removed, or substituted in the flowchart. Acts may be repeated.
At act S101, the computing device 101 may receive image data including one or more objects. The image data may be a photograph such as an aerial photograph including buildings. The image data may include objects such as vehicles, roadways, road signs, lane markers, pedestrians, or road indicia. The image data may include images in other fields such as hand signals used in sign language, animals, or types of trees. The image data may include pixels having at least one pixel value. The pixel value may be a numerical value in a range (e.g., 0 to 255) that describes the intensity, color, or brightness of the pixel.
At act S103, the computing device 101 applies an operation to at least a portion of the image data to create a gradient. The portion of the image may include edge pixels. The computing device 101 may identify changes in contrast within the image data as the location of edge pixels. The computing device 101 may detect edges by applying a highpass filter to the image data. The computing device 101 may convolve the image data with a kernel in the spatial domain. The kernel may be a set of values applied in a neighborhood of the object pixel in the image data. The kernel may tend to blur the edge pixels of objects in the target image, resulting in a gradient. In another embodiment, the computing device 101 may apply the operation to all of the image data.
At act S105, the computing device 101 calculates a vector for the gradient. The vector for the gradient may refer to a direction of the gradient or a direction perpendicular to the gradient. The vector for the gradient may be perpendicular to a direction with a largest magnitude difference in the gradient extending from the object pixel in the direction of the gradient. The direction of the gradient may be defined by the lowest pixel value or highest pixel value in a gradient window surrounding the object pixel. The gradient window may be a square window having a side length of an odd number of pixels greater than one (e.g., a 3×3 gradient window, a 5×5 gradient window, or a 9×9 gradient window).
At act S107, the computing device 101 calculates a distance metric from the vector for the gradient and a template. The distance metric may be based on edge direction vectors of the target image defined as being orthogonal to the direction of the gradient and similar edge direction vectors of a template image. The distance metric may be a function of an angle between the edge direction vectors of the target image and the edge direction vectors of the template image.
At act S109, the computing device 101 matches a building model to the image data based on results of the distance metric. The computing device 101 may calculate a variance of error values from the distance metric. First, the computing device 101 identifies a chain of pixels extending from or including the object pixels. The distance metric for each of the pixels in the chain of pixels is analyzed statistically.
In one example, error values for the distance metric for the chain of pixels are placed in an ordered list. An elbow, cliff, or other abrupt change along the error values is identified by calculating differences between sequential pairs of the ordered list and comparing the differences between sequential pairs to a threshold value. The error values before the abrupt change are considered low error values and the error values after the abrupt change are considered high error values. The computing device 101 may identify the portion of the chain of pixels in the ordered list before the threshold value is reached as accurate distance metric values, and accordingly, match the building model to the image data as a function of that portion of the chain of pixels.
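By way of a non-limiting illustration, the following sketch sorts the error values, differences consecutive entries, and treats everything before the first jump exceeding the threshold as low error pixels, as described above. The function name and the handling of the case with no abrupt change are assumptions for illustration only.

import numpy as np

def split_low_error(errors, jump_threshold):
    # Order the pixels in the chain by increasing error value.
    order = np.argsort(errors)
    sorted_errors = np.asarray(errors, dtype=float)[order]
    jumps = np.diff(sorted_errors)                    # differences between sequential pairs
    big = np.flatnonzero(jumps > jump_threshold)      # locations of abrupt changes
    cutoff = big[0] + 1 if big.size else len(sorted_errors)
    return order[:cutoff]                             # indices of pixels considered well matched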
The computing device 101 may compare multiple templates to the image data. In one example, the computing device 101 generates template scores by summing distance metrics for a plurality of edge pixels in multiple templates. The computing device 101 may select the highest template score for the template that matches the image data.
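By way of a non-limiting illustration, the following sketch compares multiple templates by summing the per-pixel distance metrics for each template and converting the sums into scores so that the highest score corresponds to the closest match. The scoring convention (the inverse of one plus the summed distance) and the input format are assumptions for illustration only.

import numpy as np

def best_template(per_template_distances):
    # per_template_distances: dict mapping a template name to an array of per-pixel distance metrics.
    scores = {name: 1.0 / (1.0 + float(np.sum(d))) for name, d in per_template_distances.items()}
    best = max(scores, key=scores.get)  # highest score corresponds to the smallest summed distance
    return best, scores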
The input device 103 may be one or more buttons, a keypad, a keyboard, a mouse, a stylus pen, a trackball, a rocker switch, a touch pad, a voice recognition circuit, or another device or component for inputting data to the mobile device 122. The input device 103 and the display 111 may be combined as a touch screen, which may be capacitive or resistive. The display 111 may be a liquid crystal display (LCD) panel, light emitting diode (LED) screen, thin film transistor screen, or another type of display. The input device 103 may be configured to receive a user input to define a window size for the image processing, a neighborhood size for the distance metric, or another parameter. The mobile device 122 may also include a range finding device, range finding circuitry, positioning circuitry, and/or a camera.
The positioning circuitry is optional and may be excluded for the map-related functions. The positioning circuitry may include a Global Positioning System (GPS), Global Navigation Satellite System (GLONASS), or a cellular or similar position sensor for providing location data. The positioning system may utilize GPS-type technology, a dead reckoning-type system, cellular location, or combinations of these or other systems. The positioning circuitry may include suitable sensing devices that measure the traveling distance, speed, direction, and so on, of the mobile device 122. The positioning system may also include a receiver and correlation chip to obtain a GPS signal. Alternatively or additionally, the one or more detectors or sensors may include an accelerometer built or embedded into or within the interior of the mobile device 122. The accelerometer is operable to detect, recognize, or measure the rate of change of translational and/or rotational movement of the mobile device 122. The mobile device 122 receives location data from the positioning system. The location data indicates the location of the mobile device 122. The building map and/or the image data may be accessed based on the location data from the positioning system.
The database 123 of the system 120 may be a geographic database. The geographic database 123 includes information about one or more geographic regions. Each road in the geographic region is composed of one or more road segments. A road segment represents a portion of the road.
The navigation-related features may include a route calculation application. End users may access a route from an origin to a destination. The route calculation application determines the route for the end user to travel along the road segments to reach the desired destination. In order to calculate a route, the route calculation application is provided with data identifying a starting location (origin) and a desired destination location. In one embodiment, the starting location may be the end user's current position and the destination may be entered by the end user. Given at least the identification of the starting location (origin) and the desired destination location, the route calculation application determines one or more solution routes between the starting location and the destination location. A solution route is formed of a series of connected road segments over which the end user can travel from the starting location to the destination location. When the route calculation application calculates a route, the application accesses the geographic database 123 and obtains data that represent road segments around and between the starting location and the destination location. The route calculation application uses the data to determine at least one valid solution route from the starting location to the destination location.
The map-related features may be any of the navigation-related features provided to the user without reference to the current location of the user or the device. In addition, map-related features may include display and manipulation of a map of a geographic region. The map-related features may be provided without navigation-related features.
The controller 100 may include a general processor, digital signal processor, an application specific integrated circuit (ASIC), field programmable gate array (FPGA), analog circuit, digital circuit, combinations thereof, or other now known or later developed processor. The controller 100 may be a single device or combinations of devices, such as associated with a network, distributed processing, or cloud computing.
The memory 104 may be a volatile memory or a non-volatile memory. The memory 104 may include one or more of a read only memory (ROM), random access memory (RAM), a flash memory, an electronic erasable program read only memory (EEPROM), or other type of memory. The memory 104 may be removable from the mobile device 122, such as a secure digital (SD) memory card.
The communication interface 105 may include any operable connection. An operable connection may be one in which signals, physical communications, and/or logical communications may be sent and/or received. An operable connection may include a physical interface, an electrical interface, and/or a data interface. The communication interface 105 provides for wireless and/or wired communications in any now known or later developed format.
The network 127 may include wired networks, wireless networks, or combinations thereof. The wireless network may be a cellular telephone network, an 802.11, 802.16, 802.20, or WiMax network. Further, the network 127 may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols.
Non-transitory computer readable media may be encoded with instructions for performing any of the above acts or functions. While the non-transitory computer-readable medium may be a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. The term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein.
In a particular non-limiting, exemplary embodiment, the computer-readable medium can include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium can include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. A digital file attachment to an e-mail or other self-contained information archive or set of archives may be considered a distribution medium that is a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a computer-readable medium or a distribution medium and other equivalents and successor media, in which data or instructions may be stored.
In an alternative embodiment, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the methods described herein. Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems. One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Accordingly, the present system encompasses software, firmware, and hardware implementations.
In accordance with various embodiments of the present disclosure, the methods described herein may be implemented by software programs executable by a computer system. Further, in an exemplary, non-limited embodiment, implementations can include distributed processing, component/object distributed processing, and parallel processing. Alternatively, virtual computer system processing can be constructed to implement one or more of the methods or functionality as described herein.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
As used in this application, the term ‘circuitry’ or ‘circuit’ refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of ‘circuitry’ applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term “circuitry” would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term “circuitry” would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in server, a cellular network device, or other network device.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor receives instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a device having a display, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The illustrations of the embodiments described herein are intended to provide a general understanding of the structure of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Additionally, the illustrations are merely representational and may not be drawn to scale. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.

Claims (20)

We claim:
1. A method comprising:
receiving image data;
performing an edge detection algorithm on the image data, wherein the edge detection algorithm includes a distance metric based on angles between gradient vectors in the image data and gradient vectors in one or more templates;
identify a plurality of pixels from the distance metric;
calculate error values from the distance metric for the plurality of pixels; and
matching a building model to the image data based on results of the edge detection algorithm and the error values from the distance metric, wherein the building model is associated with the one or more templates.
2. The method of claim 1, wherein the image data is aerial data.
3. The method of claim 1, wherein the distance metric is a function of variance.
4. The method of claim 1, further comprising:
applying a gaussian algorithm to the image data;
analyzing a gradient area around each of a plurality of pixels;
calculating the gradient vector from the gradient area for each pixel; and
comparing the gradient vectors.
5. The method of claim 4, wherein the distance metric is based on a gradient vector with a largest magnitude.
6. An apparatus comprising:
at least one processor; and
at least one memory including computer program code for one or more programs; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least perform:
receive image data;
apply an operation to at least a portion of the image data to create a gradient;
calculate a vector for the gradient;
calculate a distance metric from the vector for the gradient and a template;
identify a chain of pixels from the distance metric;
calculate a variance of error values from the distance metric for the chain of pixels; and
match a building model to the image data based on a result of the distance metric and the variance of error values.
7. The apparatus of claim 6, wherein the vector for the gradient is perpendicular to a direction with a largest magnitude difference in the gradient.
8. The apparatus of claim 6, wherein the gradient is a square with a predetermined number of pixels.
9. The apparatus of claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to at least perform:
determine an ordered list of the error values from the distance metric for the chain of pixels;
calculate differences between sequential pairs of the ordered list; and
compare the differences between sequential pairs to a threshold value.
10. The apparatus of claim 9, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to at least perform:
identify a portion of the chain of pixels in the ordered list before the threshold value is reached, wherein the building model is matched to the image data as a function of the portion of the chain of pixels.
11. The apparatus of claim 6, wherein the portion of the image data includes an edge pixel, and the distance metric for the edge pixel depends on an angle between the vector for the gradient of the pixel and an edge direction vector of a corresponding edge pixel location in the template.
12. The apparatus of claim 6, wherein the distance metric (d) is defined according to

d = λ∥vi − ui∥² + (1 − λ)(1 − [θ(vi)·θ(ui)]),
wherein λ is a constant weighting factor, vi, is a pixel in a target image, ui is a corresponding pixel in the template, an edge direction vector for the target is θ(vi), and an edge direction vector for the template is θ(ui).
13. The apparatus of claim 6, wherein the image data is an aerial photograph of one or more buildings.
14. The apparatus of claim 6, wherein the operation is a convolution with a Gaussian function.
15. The apparatus of claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to at least perform:
generate template scores by summing distance metrics for a plurality of edge pixels in multiple templates; and
selecting a highest template score, wherein the highest template score is associated with the match building model.
16. A method comprising:
receiving aerial image data including one or more buildings;
identifying chains of edge pixels in the aerial image data, wherein the chains of edge pixels partially form outlines of the one or more buildings;
applying an operation to at least a portion of the image data to create a gradient;
calculating a vector for the gradient;
calculating a distance metric from the vector for the gradient and a template;
calculating a variance of error values from the distance metric for the chains of pixels; and
matching a building model to the outlines of the one or more buildings based on a result of the distance metric and the variance of error values.
17. The method of claim 16, wherein the operation is a convolution with a Gaussian function.
18. The method of claim 16, wherein the distance metric depends on an angle between the vector for the gradient of the pixel and an edge direction vector of a corresponding edge pixel location in the template.
19. The method of claim 1, further comprising:
determining an ordered list of the error values from the distance metric for the plurality of pixels;
calculating differences between sequential pairs of the ordered list; and
comparing the differences between sequential pairs to a threshold value.
20. An apparatus comprising:
at least one processor; and
at least one memory including computer program code for one or more programs; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least perform:
receive image data;
apply an operation to at least a portion of the image data to create a gradient;
calculate a vector for the gradient;
calculate a distance metric from the vector for the gradient and a template;
identify a chain of pixels from the distance metric;
determine an ordered list of error values from the distance metric for the chain of pixels;
calculate differences between sequential pairs of the ordered list;
perform a comparison of the differences between sequential pairs to a threshold value; and
match a building model to the image data based on a result of the distance metric and the comparison of the differences between sequential pairs to the threshold value.
US14/319,814 2013-07-01 2014-06-30 Occlusion resistant image template matching using distance transform Expired - Fee Related US9349189B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/319,814 US9349189B2 (en) 2013-07-01 2014-06-30 Occlusion resistant image template matching using distance transform

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361841627P 2013-07-01 2013-07-01
US14/319,814 US9349189B2 (en) 2013-07-01 2014-06-30 Occlusion resistant image template matching using distance transform

Publications (2)

Publication Number Publication Date
US20150003741A1 US20150003741A1 (en) 2015-01-01
US9349189B2 true US9349189B2 (en) 2016-05-24

Family

ID=52115663

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/319,814 Expired - Fee Related US9349189B2 (en) 2013-07-01 2014-06-30 Occlusion resistant image template matching using distance transform

Country Status (1)

Country Link
US (1) US9349189B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10210411B2 (en) 2017-04-24 2019-02-19 Here Global B.V. Method and apparatus for establishing feature prediction accuracy
US10776951B2 (en) 2017-08-10 2020-09-15 Here Global B.V. Method, apparatus, and system for an asymmetric evaluation of polygon similarity

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USD1031029S1 (en) 2003-11-25 2024-06-11 Bayer Healthcare Llc Syringe plunger
US9262583B2 (en) * 2013-03-29 2016-02-16 Case Western Reserve University Image similarity-based finite element model registration
US10378949B2 (en) 2014-06-10 2019-08-13 Bayer Healthcare Llc Syringe with indicator float
US9830532B1 (en) * 2014-08-22 2017-11-28 Matrox Electronic Systems Ltd. Object detection in images using distance maps
CN112999464B (en) 2015-08-28 2023-04-18 拜耳医药保健有限公司 Systems and methods for syringe fluid fill verification and image recognition of power injector system features
US10080539B2 (en) * 2015-11-25 2018-09-25 General Electric Company Systems and methods for X-ray image pasting
CN106097321B (en) * 2016-06-06 2018-12-11 哈尔滨工业大学 A kind of polarization high spectrum image object detection method based on tensor representation
US9779545B1 (en) 2016-06-30 2017-10-03 Microsoft Technology Licensing, Llc Footprint based business label placement
US10532166B2 (en) 2016-07-08 2020-01-14 Bayer Heatlhcare Llc System and method for identifying a fill volume of a fluid chamber
JP6439757B2 (en) * 2016-07-08 2018-12-19 オムロン株式会社 Image processing apparatus and image processing method
DE102016124594A1 (en) * 2016-12-16 2018-06-21 Jena-Optronik Gmbh Method for detecting a 3D scene using a LIDAR system and LIDAR system for this purpose
CN108986151B (en) * 2017-05-31 2021-12-03 华为技术有限公司 Multi-target tracking processing method and equipment
CN107452020B (en) * 2017-08-04 2021-04-06 河北汉光重工有限责任公司 Anti-occlusion tracking method for adaptive template matching
EP3451203A1 (en) * 2017-08-30 2019-03-06 Dassault Systèmes Computer-implemented method for computing an envelope for a building complying with shadow duration requirements
CN108182383B (en) * 2017-12-07 2021-07-20 浙江大华技术股份有限公司 Vehicle window detection method and device
JP6901386B2 (en) * 2017-12-08 2021-07-14 株式会社東芝 Gradient Estimator, Gradient Estimator, Program and Control System
US11164031B2 (en) 2018-06-26 2021-11-02 Here Global B.V. System and method for analyzing an image of a vehicle
US10762681B2 (en) * 2018-06-26 2020-09-01 Here Global B.V. Map generation system and method for generating an accurate building shadow
US10983215B2 (en) * 2018-12-19 2021-04-20 Fca Us Llc Tracking objects in LIDAR point clouds with enhanced template matching
CN110516531B (en) * 2019-07-11 2023-04-11 广东工业大学 Identification method of dangerous goods mark based on template matching
CN113761991B (en) * 2020-06-12 2024-12-10 北京沃东天骏信息技术有限公司 Watermark recognition method and device and computer readable storage medium
CN111931786B (en) * 2020-06-23 2022-02-01 联宝(合肥)电子科技有限公司 Image processing method and device and computer readable storage medium
CN111984814B (en) * 2020-08-10 2024-04-12 广联达科技股份有限公司 Stirrup matching method and device in building drawing
CN112199542B (en) * 2020-09-02 2024-03-05 广州酷车信息科技有限公司 Vehicle image filtering method, system, device and medium based on target detection
CN112750116B (en) * 2021-01-15 2023-08-11 北京市商汤科技开发有限公司 Defect detection method, device, computer equipment and storage medium
CN115019069B (en) * 2021-03-04 2025-05-30 株式会社理光 Template matching method, template matching device and storage medium
US11941863B2 (en) * 2021-08-04 2024-03-26 Datalogic Ip Tech S.R.L. Imaging system and method using a multi-layer model approach to provide robust object detection
CN114331951B (en) * 2021-09-30 2024-09-24 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer, readable storage medium, and program product
CN113793365B (en) * 2021-11-17 2022-04-29 第六镜科技(成都)有限公司 Target tracking method and device, computer equipment and readable storage medium
CN114187498A (en) * 2021-12-08 2022-03-15 上海商汤智能科技有限公司 Occlusion detection method and device, electronic device and storage medium
CN114187267B (en) * 2021-12-13 2023-07-21 沭阳县苏鑫冲压件有限公司 Stamping part defect detection method based on machine vision
CN116958598A (en) * 2022-04-12 2023-10-27 鸿海精密工业股份有限公司 Image comparison method, device, electronic equipment and computer readable storage medium
CN116664872A (en) * 2023-07-26 2023-08-29 成都实时技术股份有限公司 Embedded image recognition method, medium and system based on edge calculation
CN117315454B (en) * 2023-11-29 2024-03-12 河北中瀚水务有限公司 Evaluation method, device and system for flocculation reaction process

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4906940A (en) * 1987-08-24 1990-03-06 Science Applications International Corporation Process and apparatus for the automatic detection and extraction of features in images and displays
US20030223615A1 (en) * 2002-06-04 2003-12-04 Keaton Patricia A. Digital image edge detection and road network tracking method and system
US20090252373A1 (en) * 2007-11-20 2009-10-08 Paglieroni David W Method and System for detecting polygon Boundaries of structures in images as particle tracks through fields of corners and pixel gradients
US20100194679A1 (en) * 2009-02-02 2010-08-05 Industrial Technology Research Institute Gesture recognition system and method thereof
US20100329508A1 (en) * 2009-06-24 2010-12-30 Xin Chen Detecting Ground Geographic Features in Images Based on Invariant Components
US8311285B2 (en) * 2009-06-30 2012-11-13 Mitsubishi Electric Research Laboratories, Inc. Method and system for localizing in urban environments from omni-direction skyline images
US20130096884A1 (en) * 2010-08-19 2013-04-18 Bae Systems Plc Sensor data processing
US20120155745A1 (en) * 2010-12-16 2012-06-21 Electronics And Telecommunications Research Institute Apparatus and method for extracting correspondences between aerial images
US20120170804A1 (en) * 2010-12-31 2012-07-05 Industrial Technology Research Institute Method and apparatus for tracking target object
US20140340489A1 (en) * 2013-05-14 2014-11-20 University Of Southern California Online coupled camera pose estimation and dense reconstruction from video

Non-Patent Citations (22)

* Cited by examiner, † Cited by third party
Title
Barrow et al., Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching, 1977, pp. 659-663, Proceedings of the 5th International Joint Conference on Artificial Intelligence.
Belongie et al., Shape Matching and Object Recognition Using Shape Contexts, Apr. 2002, pp. 509-522, vol. 24, No. 4, IEEE Transactions on Pattern Analysis and Machine Intelligence.
Berg et al., Shape Matching and Object Recognition Using Low Distortion Correspondences, 2005, pp. 26-33, Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
Borgefors, Distance Transformations in Digital Images, 1986, pp. 344-371, Computer Vision, Graphics, and Image Processing.
Borgefors, Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, Nov. 1988, pp. 849-865, vol. 10, No. 6, IEEE Transactions on Pattern Analysis and Machine Intelligence.
Daniel Mohr et al., Continuous Edge Gradient-Based Template Matching for Articulated Objects, Feb. 5, 2009, International Conference on Computer Vision Theory and Applications (VISAPP), IfI Technical Report Series, Department of Informatics, Clausthal University of Technology. *
Danielsson, Euclidean Distance Mapping, 1980, pp. 227-248, Computer Graphics and Image Processing.
Felzenszwalb et al., Distance Transforms of Sampled Functions, Technical Report, 2004, Cornell Computing and Information Science.
Gavrila et al., Real-Time Object Detection for "Smart" Vehicles, 1999, pp. 87-93, Proceedings of the Seventh IEEE International Conference on Computer Vision.
Gavrila, Multi-Feature Hierarchical Template Matching Using Distance Transforms, 1998, pp. 439-444, Proceedings of Fourteenth International Conference on Pattern Recognition.
Goshtasby, 2-D and 3-D Image Registration, 2005, Wiley Interscience.
Hoffman et al., Salience of Visual Parts, 1997, pp. 29-78, vol. 63, Cognition.
Li et al., Real-Time Template Matching Based on Gradient Direction Vector Product, 2011, IEEE. *
Liu et al., Fast Directional Chamfer Matching, Jul. 2010, pp. 1696-1703, Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
Montanari, A Method for Obtaining Skeletons Using a Quasi-Euclidean Distance, Oct. 1968, pp. 600-624, vol. 15, No. 4, Journal of the Association for Computing Machinery.
Rosenfeld et al., Sequential Operations in Digital Picture Processing, Oct. 1966, pp. 471-494, vol. 13, No. 4, Journal of the ACM.
Rucklidge, Efficiently Locating Objects Using the Hausdorff Distance, Aug. 1996, International Journal of Computer Vision.
Shotton et al., Multi-Scale Categorical Object Recognition Using Contour Fragments, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
Thayananthan et al., Shape Context and Chamfer Matching in Cluttered Scenes, 2003, pp. 127-133, Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
You et al., Hierarchical Image Matching: A Chamfer Matching Algorithm Using Interesting Points, 1995, pp. 70-75, Proceedings of the Third Australian and New Zealand Conference on Intelligent Information Systems.
Zhang et al., A Learning-Based Approach for Automated Quality Assessment of Computer-Rendered Images, 2012, pp. 68-75, Proceedings of SPIE 8293, Image Quality and System Performance IX.
Zhang et al., Efficient Edge Matching in Cluttered Scenes, 2009, pp. 1645-1648, IEEE International Symposium on Circuits and Systems.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10210411B2 (en) 2017-04-24 2019-02-19 Here Global B.V. Method and apparatus for establishing feature prediction accuracy
US10776951B2 (en) 2017-08-10 2020-09-15 Here Global B.V. Method, apparatus, and system for an asymmetric evaluation of polygon similarity

Also Published As

Publication number Publication date
US20150003741A1 (en) 2015-01-01

Similar Documents

Publication Publication Date Title
US9349189B2 (en) Occlusion resistant image template matching using distance transform
US9760996B2 (en) Non-rigid registration for large-scale space-time 3D point cloud alignment
US20200401617A1 (en) Visual positioning system
US9710714B2 (en) Fusion of RGB images and LiDAR data for lane classification
Lehtomäki et al. Object classification and recognition from mobile laser scanning point clouds in a road environment
US20200364509A1 (en) System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression
US9858503B2 (en) Acceleration of linear classifiers
Wu et al. Weighted attentional blocks for probabilistic object tracking
CN111539428A (en) Rotating object detection method based on multi-scale feature integration and attention mechanism
US20220155441A1 (en) Lidar localization using optical flow
Lee et al. Neural geometric parser for single image camera calibration
Daraei et al. Velocity and shape from tightly-coupled LiDAR and camera
Son et al. A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments
Kalantar et al. Smart counting–oil palm tree inventory with UAV
US20200279395A1 (en) Method and system for enhanced sensing capabilities for vehicles
Cai et al. 3D vehicle detection based on LiDAR and camera fusion
Li Stereo vision and Lidar based dynamic occupancy grid mapping: Application to scenes analysis for intelligent vehicles
Huang et al. Improved intelligent vehicle self-localization with integration of sparse visual map and high-speed pavement visual odometry
Mazoul et al. Fast spatio-temporal stereo for intelligent transportation systems
Park et al. Estimating the camera direction of a geotagged image using reference images
Goldman et al. Robust epipolar geometry estimation using noisy pose priors
EP3147820B1 (en) Object tracking method, device, and system as well as relevant program and non-transitory computer-readable medium
Koc-San et al. A model-based approach for automatic building database updating from high-resolution space imagery
Lin et al. A template-matching based approach for extraction of roads from very high-resolution remotely sensed imagery
Al-Habashna et al. Building height estimation from street-view imagery using deep learning, image processing and automated geospatial analysis

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20240524