
WO2019210978A1 - Image processing apparatus and method for an advanced driver assistance system - Google Patents


Info

Publication number
WO2019210978A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature points
image portion
processing apparatus
scene
Application number
PCT/EP2018/061608
Other languages
French (fr)
Inventor
Onay URFALIOGLU
Claudiu CAMPEANU
Fahd BOUZARAA
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201880092690.4A (CN112005243B)
Priority to PCT/EP2018/061608 (WO2019210978A1)
Publication of WO2019210978A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

An image processing apparatus (100) for generating a map of a scene on the basis of a plurality of images of the scene is proposed. The image processing apparatus (100) comprises a processing circuitry (101) configured to iteratively generate the map by processing the plurality of images by: (a) partitioning a first image of the plurality of images into a plurality of image portions; (b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one successful feature point of the respective image portion, in case the at least one feature point is associated with a static background of the scene; (c) determining for each image portion of the first image a confidence value on the basis of the at least one successful feature point; and (d) repeating (a) to (c) for a further image of the plurality of images, wherein in (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image. An image processing method is also disclosed.

Description

IMAGE PROCESSING APPARATUS AND METHOD FOR AN ADVANCED DRIVER ASSISTANCE SYSTEM
TECHNICAL FIELD
The present invention relates to the field of image processing or computer vision. More specifically, the invention relates to an image processing apparatus and a method for an advanced driver assistance system.

BACKGROUND
Advanced driver assistance systems (ADASs) can alert the driver in dangerous situations and/or take an active part in the driving. One of the main challenges for an advanced driver assistance system (ADAS) is mapping of the environment of a vehicle. Generally, mapping involves an estimation of camera trajectories and the structure (e.g., 3D point cloud) of an environment, which is to be used for localization tasks. Mapping relies on visual input, usually in form of video input from one or more cameras, and requires detecting a sufficient number of feature points from a static background in a scene.
Simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of the vehicle's location within it. Techniques for combining SLAM with semantic information, also referred to as semantic mapping and localization, are disclosed, for instance, in CN 105989586 A, US 9574883 B2 and US 9758305 B2.
In conventional mapping techniques, moving objects can distort the mapping results and cause the mapping to fail. In some cases, the traffic scene contains many moving objects (e.g., cars, pedestrians, and the like). In other cases, not enough feature points are found due to a lack of unique scene points, image blur, bad lighting conditions and the like. Conventional techniques mainly rely on points, lines or edges for detecting unique feature points in the scene. Thus, conventional mapping or localization techniques can fail when there are too many moving objects or not sufficiently many good feature points. Sometimes, even though mapping works, not sufficiently many good feature points are captured in the map (many of the extracted points are outliers or points without correspondence) to enable an accurate and robust localization. Moreover, extracting many feature points generally requires a large computational effort.
In light of the above, there is a need for improved image processing devices and methods which allow robust and efficient mapping and localization.
SUMMARY
Embodiments of the invention are defined by the features of the independent claims, and further advantageous implementations of the embodiments by the features of the dependent claims.
In order to describe embodiments of the invention in detail, the following terms, abbreviations and notation will be used:
Scene: The surrounding environment with respect to a reference. For instance, the scene of a camera is the part of the environment which is visible by the camera.
ADAS: Advanced Driver Assistance System.
2D image: A normal 2-dimensional image or picture (RGB or chrominance-luminance) acquired with one camera.
Texture: Area within an image which depicts content having significant variation in the (color) intensities.
3D Point Cloud: A collection of points in 3D space.
2D feature point: A location in image coordinates representing a unique point in the scene.
3D feature point: A unique point in a 3D scene.
Mapping: Creating a 3D structure/3D point cloud within a global coordinate system of some environment, including location support (e.g., coordinates).
Localization: Estimating the current location of an entity (e.g., camera) with respect to the global coordinate system of a provided map.
Semantic Segmentation: A method to segment an image into different regions according to a semantic context. For instance, pixels depicting a car are all in red color, pixels depicting the road are all in blue color, and the like.
Object Instance: Single object within a group of objects of the same class.
Instance Level Semantic Segmentation: A method to segment an image into different regions and object instances according to their semantic class. Single objects are identified and are separable from each other.
Label: An identifier (e.g., an integer) to determine the class type of an item/entity.
Dynamic Objects: Objects in the scene which typically move or change their location.
Static Background: All parts of the scene which remain static, e.g., buildings, trees, road, and the like.
Global Coordinate System: Coordinate System with respect to a common global reference.
Local Coordinate System: Coordinate System with respect to a selected reference within a global reference.
Mapping Loop: Typically, a specific vehicle route is selected for the environment to be mapped. This route can be traversed multiple times (multiple loops) in order to improve the final map accuracy and consistency.
Inlier: Corresponding pair of Image Feature Points (from two image frames), where each point is pointing to the same static background 3D point in the scene.
Outlier: Corresponding pair of Image Feature Points (from two image frames), which are pointing to two different 3D points in the scene.

Generally, embodiments of the invention are based on the idea of providing robust and efficient mapping and localization by increasing the number of extracted feature points that are successful (e.g., inlier feature points, or "inliers" for short) and correspond to the static background of a scene.
More specifically, according to a first aspect the invention relates to an image processing apparatus for generating a map of a scene on the basis of a plurality of images of the scene, each image comprising a plurality of pixels, wherein the image processing apparatus comprises a processing circuitry configured to iteratively generate the map by processing one-by-one the plurality of images by:
(a) partitioning a first image of the plurality of images into a plurality of image portions;
(b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one target feature point, i.e. an inlier of the respective image portion, in case the at least one feature point is associated with a static background of the scene;
(c) determining for each image portion of the first image a confidence value on the basis of the at least one target feature point; and
(d) repeating (a) to (c) for a further image of the plurality of images, wherein in (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image.
The image processing apparatus according to the first aspect of the invention allows increasing the chance that useful target feature points, i.e. feature points associated with a static background of a scene, are extracted and used in the mapping and localization process. Thus a robust and efficient apparatus for generating a map of a scene is provided.
In a further possible implementation form of the first aspect, the processing circuitry is configured to partition the first image and the further image of the plurality of images into a plurality of rectangular, in particular quadratic image portions.

In a further possible implementation form of the first aspect, the rectangular image portions have the same size.
In a further possible implementation form of the first aspect, the processing circuitry is configured to determine the confidence value for each image portion as the ratio of the number of target feature points to the total number of feature points of the respective image portion.
In a further possible implementation form of the first aspect, the processing circuitry is configured to determine the confidence value for each image portion as the product of the ratio of the number of target feature points to the total number of feature points of the respective image portion and the confidence value of the respective image portion of a previously processed image.
In a further possible implementation form of the first aspect, the map is a semantic map of the scene, including semantic information for at least some of the plurality of feature points.
In a further possible implementation form of the first aspect, the processing circuitry is further configured to assign to each of the plurality of feature points a semantic class C and to determine for each image portion a respective primary semantic class C having the most feature points.
In a further possible implementation form of the first aspect, the processing circuitry is configured to determine the confidence value for each image portion as the ratio of the number of target feature points to the total number of feature points of the respective image portion weighted by a first weighting factor, if the primary semantic class C of the respective image portion of the image is equal to the primary semantic class of the respective image portion of the previously processed image, or by a second weighting factor, if the primary semantic class C of the respective image portion of the image and the primary semantic class of the respective image portion of the previously processed image are different, wherein the first weighting factor is larger than the second weighting factor.

In a further possible implementation form of the first aspect, the processing circuitry is configured to iteratively generate the map on the basis of a simultaneous localization and mapping, SLAM, algorithm.
In a further possible implementation form of the first aspect, the number of the plurality of feature points to be extracted from a respective image portion of the further image is directly proportional to the confidence value of the respective image portion of the first image.
In a further possible implementation form of the first aspect, the image processing apparatus further comprises an image capturing device, in particular a camera, for capturing the plurality of images of the scene.
According to a second aspect the invention relates to an advanced driver assistance system for a vehicle, wherein the advanced driver assistance system comprises an image processing apparatus according to the first aspect of the invention or any one of its implementation forms.
According to a third aspect the invention relates to a corresponding image processing method for generating a map of a scene on the basis of a plurality of images of the scene, wherein the image processing method comprises the steps of:
(a) partitioning a first image of the plurality of images into a plurality of image portions;
(b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one target feature point, i.e. an inlier feature point of the respective image portion, in case the at least one feature point is associated with a static background of the scene;
(c) determining for each image portion of the first image a confidence value on the basis of the at least one target feature point; and
(d) repeating steps (a) to (c) for a further image of the plurality of images, wherein in step (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image. Thus a robust and efficient method for generating a map of a scene is provided.
The image processing method according to the third aspect of the invention can be performed by the image processing apparatus according to the first aspect of the invention. Further features of the image processing method according to the third aspect of the invention result directly from the functionality of the image processing apparatus according to the first aspect and its different implementation forms described above and below.

According to a fourth aspect the invention relates to a computer program product comprising program code for performing the method according to the third aspect of the invention when executed on a computer.
Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, embodiments of the invention are described in more detail with reference to the attached figures and drawings, in which:
Fig. 1 is a block diagram showing an example of an image processing apparatus according to an embodiment of the invention;
Fig. 2 is a schematic diagram showing an example of an image with a plurality of image portions for processing by the image processing apparatus of Fig. 1;
Fig. 3 is a flow diagram showing an example of processing steps implemented in the image processing apparatus of Fig. 1; and
Fig. 4 is a flow diagram showing another example of processing steps implemented in the image processing apparatus of Fig. 1.

In the following, identical reference signs refer to identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS
In the following description, reference is made to the accompanying figures which show, by way of illustration, specific aspects of embodiments of the invention or specific aspects in which embodiments of the invention may be used. It is understood that embodiments of the invention may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the invention is defined by the appended claims. For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g., functional units, to perform the described one or plurality of method steps (e.g., one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g., functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g., one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
Fig. 1 is a block diagram showing an example of an image processing apparatus 100 according to an embodiment of the invention. In an embodiment, the image processing apparatus 100 further comprises an image capturing device 103, in particular a camera, for capturing a plurality of images of a scene. In an embodiment, the image processing apparatus 100 is implemented as part of or interacting with an advanced driver assistance system (ADAS) of a vehicle.
As will be described in more detail below, the image processing apparatus 100 is configured to generate a map of a scene on the basis of a plurality of images of the scene. To this end, the image processing apparatus 100 comprises processing circuitry 101 configured to iteratively generate the map by processing one-by-one the plurality of images by:
(a) partitioning a first image of the plurality of images into a plurality of image portions;
(b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one target feature point, i.e. an inlier of the respective image portion, in case the at least one feature point is associated with a static background of the scene;
(c) determining for each image portion of the first image a confidence value P on the basis of the at least one target feature point; and
(d) repeating (a) to (c) for a further image of the plurality of images, wherein in (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image.
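For illustration only, and not as part of the published application, the processing of steps (a) to (d) could be organized as in the following Python sketch. The helper callables detect_features, is_static and update_map are hypothetical placeholders for a concrete feature detector, a static-background test (e.g., based on semantic segmentation) and the mapping back end; the grid size, the total feature budget and the uniform initialization of the confidence values are likewise assumptions.

```python
import numpy as np

def partition(image, rows, cols):
    """Step (a): split an image (H x W [x C] array) into a rows x cols grid of portions."""
    h, w = image.shape[:2]
    return {(m, n): image[m * h // rows:(m + 1) * h // rows,
                          n * w // cols:(n + 1) * w // cols]
            for m in range(rows) for n in range(cols)}

def generate_map(images, detect_features, is_static, update_map,
                 rows=3, cols=4, total_features=1000):
    """Iterate steps (a) to (d) over the plurality of images."""
    # Uniform start; the confidence values P(m,n) are kept normalized so they sum to 1.
    P = np.full((rows, cols), 1.0 / (rows * cols))
    scene_map = None
    for image in images:
        new_P = np.zeros_like(P)
        for (m, n), portion in partition(image, rows, cols).items():
            budget = max(1, int(round(P[m, n] * total_features)))       # P(m,n) * K points
            points = detect_features(portion, budget)                    # step (b): extract
            targets = [p for p in points if is_static(p)]                # target points (inliers)
            new_P[m, n] = P[m, n] * len(targets) / max(len(points), 1)   # step (c)
            scene_map = update_map(scene_map, targets)
        P = new_P / max(new_P.sum(), 1e-9)   # step (d): next image uses the updated confidences
    return scene_map
```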
Fig. 2 is a schematic diagram showing an example of an image 200 with a plurality of image portions identified by a pair of indices (m,n) for processing by the image processing apparatus 100 of Fig. 1. As can be taken from the exemplary image 200 shown in Fig. 2, in an embodiment the processing circuitry 101 is configured to partition the plurality of images, such as the image 200, into a plurality of rectangular, in particular quadratic image portions. In an embodiment, the rectangular image portions have the same size.
Fig. 3 is a flow diagram showing an example of a plurality of processing steps 300 implemented in the image processing apparatus 100 of Fig. 1. The plurality of processing steps 300 comprises the following steps.
301: Capture an image of the plurality of images for further processing.
303: Partition the image of the plurality of images into an NxM rectangular grid of image portions.
305: Let K be the total number of features to be detected across the entire image. Set up the processing circuitry 101 to detect or extract P(m,n) * K feature points in the image region (m,n). Initially, set all P(m,n) = 1. Let S(m,n) be the number of target feature points (inliers) in the image region (m,n) and T(m,n) the total number of feature points detected in the image region (m,n). Thus, in an embodiment, the number of the plurality of feature points to be extracted from a respective image portion of the image is directly proportional to the confidence value P(m,n) of the respective image portion of the previously processed image.
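Purely as an illustration of how the per-region budget P(m,n) * K might be realized (not part of the application; the use of OpenCV's ORB detector is an assumption, the method does not prescribe a particular detector):

```python
import cv2

def detect_in_portion(portion, p_mn, K):
    """Detect roughly P(m,n) * K feature points within one grid portion (m,n)."""
    budget = max(1, int(round(p_mn * K)))      # this portion's share of the K features
    orb = cv2.ORB_create(nfeatures=budget)     # any keypoint detector could be substituted
    keypoints = orb.detect(portion, None)      # 2D feature points within the portion
    return keypoints[:budget]
```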
307: Semantic Segmentation assigns each pixel in the image a class or label C depicting its semantic class. The assigned class indicates to which semantic class (e.g., car, road, building, and the like) the pixel belongs. In case a pixel cannot be classified, it can be assumed to be associated with a dynamic feature and, thus, can be defined as an outlier.
309: Each feature point has a location in pixel coordinates (eventually sub-pixel precision). Therefore, every feature point can be associated with its nearest pixel. If the nearest pixel's semantic class is a dynamic object (car, pedestrian, truck, bicycle, and the like), then this feature point is removed from the set of detected feature points, i.e. it is not a target feature point.
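A minimal sketch of this filtering step (for illustration only; the set of dynamic class names and the id-to-name mapping are assumptions, not part of the application):

```python
DYNAMIC_CLASSES = {"car", "pedestrian", "truck", "bicycle"}   # assumed dynamic-object labels

def keep_target_points(keypoints, segmentation, id_to_name):
    """Keep only feature points whose nearest pixel belongs to the static background."""
    h, w = segmentation.shape[:2]
    targets = []
    for kp in keypoints:
        x, y = kp.pt                            # sub-pixel feature location
        u = min(max(int(round(x)), 0), w - 1)   # nearest pixel column
        v = min(max(int(round(y)), 0), h - 1)   # nearest pixel row
        name = id_to_name.get(int(segmentation[v, u]))
        # Unclassified pixels are treated as dynamic (outliers), as described in step 307.
        if name is not None and name not in DYNAMIC_CLASSES:
            targets.append(kp)                  # static background -> target feature point
    return targets
```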
311: The confidence value P of each image portion (m,n) is updated as:

P(m,n) ← P(m,n) · S(m,n) / T(m,n)
Thus, in an embodiment the processing circuitry 101 of the image processing apparatus 100 is configured to determine the confidence value P(m,n) for each image portion as the ratio of the number of target feature points S(m,n) to the total number of feature points T(m,n) of the respective image portion. Moreover, in an embodiment, the processing circuitry 101 is configured to determine the confidence value P(m,n) for each image portion as the product of the ratio of the number of target feature points S(m,n) to the total number of feature points T(m,n) of the respective image portion and the confidence value of the respective image portion of a previously processed image.
A few exemplary confidence values P(m,n) are shown for the different image portions (m,n) of the image 200 shown in Fig. 2. As will be appreciated, the processing circuitry will extract most of the feature points in the image portions (1,2), (1,3) and (2,3), since these have the highest confidence value P. The sum of the confidence values P of all image regions of an image should yield 1, i.e.:

Σ_m Σ_n P(m,n) = 1
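For illustration only (not part of the application), the update and the normalization could be written as a small numpy routine, where P, S and T hold P(m,n), S(m,n) and T(m,n) for all image portions:

```python
import numpy as np

def update_confidence(P, S, T):
    """P(m,n) <- P(m,n) * S(m,n) / T(m,n), renormalized so that the values sum to 1."""
    P_new = P * S / np.maximum(T, 1)     # ratio of target (inlier) points to all points
    total = P_new.sum()
    # If no target points were found anywhere, fall back to a uniform distribution.
    return P_new / total if total > 0 else np.full_like(P, 1.0 / P.size)
```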
313: Update the map using the labelled feature points. In an embodiment, the semantic map is the map in the SLAM (Simultaneous localization and mapping) process, which is used to conduct vehicle localization. Updating the semantic map means that the map is updated according to an SLAM algorithm, but with the additional information coming from semantic segmentation (step 307). In this case, it additionally contains for each feature point its corresponding semantic class C. Thus, in an embodiment, the processing circuitry 101 of the image processing apparatus 100 is configured to iteratively generate the map on the basis of a simultaneous localization and mapping, SLAM, algorithm.
In the mapping process, the map contains the calculated 3D point locations of the image feature points and the camera position and orientation. As described above, the map can be updated at every new image, i.e. the mapping process is iterative. For example, new points may be added, or the current camera position and orientation may be added (e.g., like a node in a graph). From time to time, some larger update may be done, e.g., going back several nodes in time (this is called a bundle adjustment process), where the camera position and/or orientation and/or the 3D points are fine-tuned to further improve the estimation accuracy. This is an optimization process overall.
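As an illustration of the kind of data such a map may hold (a sketch only; the field names are assumptions and a real SLAM back end would use its own structures):

```python
from dataclasses import dataclass, field

@dataclass
class SemanticMapPoint:
    xyz: tuple           # triangulated 3D location of the feature point
    semantic_class: str  # semantic class C assigned to the feature point

@dataclass
class SemanticMap:
    points: list = field(default_factory=list)  # SemanticMapPoint entries
    poses: list = field(default_factory=list)   # camera position/orientation per image

    def add_frame(self, pose, new_points):
        self.poses.append(pose)          # like adding a node to a graph
        self.points.extend(new_points)   # newly triangulated 3D feature points
```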
315: Output the (updated) semantic map.
Fig. 4 is a flow diagram showing another example of the plurality of processing steps 300 implemented in the image processing apparatus 100 of Fig. 1. In comparison to the processing steps 300 shown in Fig. 3, the plurality of processing steps 300 additionally incorporates the semantic information about each image portion into the computation and update of the confidence value P by comprising an additional step 310 and a modified step 311.
More specifically, the plurality of processing steps 300 shown in Fig. 4 takes into account the primary semantic class C of the respective image portion of the image that is processed. The primary semantic class of a respective image portion is defined as that semantic class having the largest number of pixels. As in the case of the processing steps 300 shown in Fig. 3, initially the confidence values for all image regions of a currently processed image should be normalized, i.e. P(m,n) = 1. As in the case of the processing steps 300 shown in Fig. 3, S(m,n) denotes the number of target feature points (i.e. inliers) of the image region (m,n) and T(m,n) denotes the total number of feature points detected in the image region (m,n).
310: Determine the primary semantic class for each image region. C(m,n) denotes the primary semantic class in the image region (m,n). As already mentioned above, this means that the majority of the pixels belong to the class C(m,n).
311: The confidence value P of each image region (m,n) is updated by the processing circuitry 101 on the basis of the following equations:

P(m,n) ← D(m,n) · P(m,n) · S(m,n) / T(m,n)     (1)

wherein

D(m,n) = 1.00, if the primary semantic class C(m,n) of the current image is equal to the primary semantic class C(m,n) of the previously processed image, and

D(m,n) = 0.75, if the primary semantic class C(m,n) of the current image differs from the primary semantic class C(m,n) of the previously processed image.

As in the case of Fig. 3, the updated confidence values are normalized such that their sum over all image regions yields 1.
Here, the weight D is a measure of how often the primary semantic class of each image region is changing over time. More frequent changes decrease the image region's reliability of containing useful target feature points. This is reflected by the introduction of the weight D in Eqn. (1) above. Thus, the higher the change frequency of the primary semantic class, the smaller the average weight D over time.
Thus, in an embodiment, the processing circuitry 101 is configured to assign to each of the plurality of feature points a semantic class C and to determine for each image portion a respective primary semantic class C having the most feature points. Moreover, in an embodiment, the processing circuitry 101 is configured to determine the confidence value P(m,n) for each image portion as the ratio of the number of target feature points S(m,n) to the total number of feature points T(m,n) of the respective image portion weighted by a first weighting factor, e.g., D=1 , if the primary semantic class C of the respective image portion of the image is equal to the primary semantic class of the respective image portion of the previously processed image, or by a second weighting factor, e.g., D=0.75, if the primary semantic class C of the respective image portion of the image and the primary semantic class of the respective image portion of the previously processed image are different.
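A minimal sketch of this semantically weighted update (illustrative only, not part of the application; C_now and C_prev are assumed to be arrays holding the primary semantic class of each image portion in the current and the previously processed image):

```python
import numpy as np

def update_confidence_semantic(P, S, T, C_now, C_prev):
    """Weighted update per Eqn. (1): D = 1.00 if the primary class is unchanged, else 0.75."""
    D = np.where(C_now == C_prev, 1.00, 0.75)    # per-portion weight D(m,n)
    P_new = D * P * S / np.maximum(T, 1)
    return P_new / max(P_new.sum(), 1e-9)        # renormalize so the confidences sum to 1
```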
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely exemplary. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
Embodiments of the invention may further comprise an apparatus, which comprises a processing circuitry configured to perform any of the methods and/or processes described herein.

Claims

1. An image processing apparatus (100) for generating a map of a scene on the basis of a plurality of images of the scene, wherein the image processing apparatus (100) comprises a processing circuitry (101) configured to generate the map by processing the plurality of images by:
(a) partitioning a first image of the plurality of images into a plurality of image portions;
(b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one target feature point of the respective image portion, in case the at least one feature point is associated with a static background of the scene;
(c) determining for each image portion of the first image a confidence value on the basis of the at least one target feature point; and
(d) repeating (a) to (c) for a further image of the plurality of images, wherein in (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image.
2. The image processing apparatus (100) of claim 1, wherein the processing circuitry (101) is configured to partition the first image and the further image of the plurality of images into a plurality of rectangular image portions.
3. The image processing apparatus (100) of claim 2, wherein the rectangular image portions have the same size.
4. The image processing apparatus (100) of any one of the preceding claims, wherein the processing circuitry (101) is configured to determine the confidence value for each image portion as the ratio of the number of target feature points to the number of feature points of the respective image portion.
5. The image processing apparatus (100) of any one of the preceding claims, wherein the processing circuitry (101) is configured to determine the confidence value for each image portion as the product of the ratio of the number of target feature points to the number of feature points of the respective image portion and the confidence value of the respective image portion of a previously processed image.
6. The image processing apparatus (100) of any one of the preceding claims, wherein the map is a semantic map of the scene, the map including semantic information for at least some of the feature points.
7. The image processing apparatus (100) of claim 6, wherein the processing circuitry (101) is further configured to assign to each of the plurality of feature points a semantic class C and to determine for each image portion a respective primary semantic class C having the most feature points.
8. The image processing apparatus (100) of claim 7, wherein the processing circuitry (101) is configured to determine the confidence value for each image portion as the ratio of the number of target feature points to the number of feature points of the respective image portion weighted by a first weighting factor, if the primary semantic class C of the respective image portion of the image is equal to the primary semantic class of the respective image portion of a previously processed image, or by a second weighting factor, if the primary semantic class C of the respective image portion of the image and the primary semantic class of the respective image portion of a previously processed image are different, wherein the first weighting factor is larger than the second weighting factor.
9. The image processing apparatus (100) of any one of claims 6 to 8, wherein the processing circuitry (101) is configured to generate the map on the basis of a simultaneous localization and mapping, SLAM, algorithm.
10. The image processing apparatus (100) of any one of the preceding claims, wherein the number of the plurality of feature points to be extracted from a respective image portion of the further image is proportional to the confidence value of the respective image portion of the first image.
11. The image processing apparatus (100) of any one of the preceding claims, wherein the image processing apparatus (100) further comprises an image capturing device (103), in particular a camera, for capturing the plurality of images of the scene.
12. Advanced driver assistance system for a vehicle, wherein the advanced driver assistance system comprises an image processing apparatus (100) according to any one of the preceding claims.
13. An image processing method for generating a map of a scene on the basis of a plurality of images of the scene, wherein the image processing method (200) comprises the steps of:
(a) partitioning a first image of the plurality of images into a plurality of image portions;
(b) extracting from each image portion a plurality of feature points and classifying at least one feature point of the plurality of feature points as at least one target feature point of the respective image portion, in case the at least one feature point is associated with a static background of the scene;
(c) determining for each image portion of the first image a confidence value on the basis of the at least one target feature point; and
(d) repeating steps (a) to (c) for a further image of the plurality of images, wherein in step (b) the number of the plurality of feature points to be extracted from a respective image portion of the further image depends on the confidence value of the respective image portion of the first image.
14. A computer program product comprising program code for performing the method of claim 13, when executed on a computer or a processor.
PCT/EP2018/061608 2018-05-04 2018-05-04 Image processing apparatus and method for an advanced driver assistance system WO2019210978A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880092690.4A CN112005243B (en) 2018-05-04 2018-05-04 Image processing device and method for advanced driver assistance system
PCT/EP2018/061608 WO2019210978A1 (en) 2018-05-04 2018-05-04 Image processing apparatus and method for an advanced driver assistance system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/061608 WO2019210978A1 (en) 2018-05-04 2018-05-04 Image processing apparatus and method for an advanced driver assistance system

Publications (1)

Publication Number Publication Date
WO2019210978A1 true WO2019210978A1 (en) 2019-11-07

Family

ID=62152537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/061608 WO2019210978A1 (en) 2018-05-04 2018-05-04 Image processing apparatus and method for an advanced driver assistance system

Country Status (2)

Country Link
CN (1) CN112005243B (en)
WO (1) WO2019210978A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8791996B2 (en) * 2010-03-31 2014-07-29 Aisin Aw Co., Ltd. Image processing system and position measurement system
CN107437258B (en) * 2016-05-27 2020-11-06 株式会社理光 Feature extraction method, motion state estimation method, and motion state estimation device
JP2018036901A (en) * 2016-08-31 2018-03-08 富士通株式会社 Image processing apparatus, image processing method, and image processing program
CN106778767B (en) * 2016-11-15 2020-08-11 电子科技大学 Visual image feature extraction and matching method based on ORB and active vision
CN107689048B (en) * 2017-09-04 2022-05-31 联想(北京)有限公司 A method for detecting image feature points and a server cluster

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130216098A1 (en) * 2010-09-17 2013-08-22 Tokyo Institute Of Technology Map generation apparatus, map generation method, moving method for moving body, and robot apparatus
CN105989586A (en) 2015-03-04 2016-10-05 北京雷动云合智能技术有限公司 SLAM method based on semantic bundle adjustment method
US9574883B2 (en) 2015-03-24 2017-02-21 X Development Llc Associating semantic location data with automated environment mapping
US9758305B2 (en) 2015-07-31 2017-09-12 Locus Robotics Corp. Robotic navigation utilizing semantic mapping
US20180012085A1 (en) * 2016-07-07 2018-01-11 Ants Technology (Hk) Limited. Computer Vision Based Driver Assistance Devices, Systems, Methods and Associated Computer Executable Code

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Simultaneous Localization Mapping and Tracking of Moving Objects", 1 August 2011, KASSEL UNIVERSITY PRESS GMBH, ISBN: 978-3-86219-062-1, article GEORGIOS LIDORIS: "Simultaneous Localization Mapping and Tracking of Moving Objects", pages: 8 - 30, XP055533139 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112902966A (en) * 2021-01-28 2021-06-04 开放智能机器(上海)有限公司 Fusion positioning system and method

Also Published As

Publication number Publication date
CN112005243A (en) 2020-11-27
CN112005243B (en) 2024-12-10


Legal Events

121 - Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18724182; Country of ref document: EP; Kind code of ref document: A1)

NENP - Non-entry into the national phase (Ref country code: DE)

122 - Ep: pct application non-entry in european phase (Ref document number: 18724182; Country of ref document: EP; Kind code of ref document: A1)