KR20170092533A

KR20170092533A - A face pose rectification method and apparatus

Info

Publication number: KR20170092533A
Application number: KR1020177010873A
Authority: KR
Inventors: 얀 로드리게즈; 프란시스 모울린; 세바스티앙 피칸트
Original assignee: 키레몬 에스에이
Priority date: 2014-09-23
Filing date: 2014-09-23
Publication date: 2017-08-11
Also published as: WO2016045711A1; EP3198522A1

Abstract

A pose rectification method for rectifying the pose in data representing images of a face (100), comprising: A) acquiring at least one test frame including 2D near-infrared image data, 2D visible-light image data, and a depth map; C) estimating the pose of the face in the test frame by aligning the depth map with a 3D model of a head of known orientation; D) mapping at least one of the 2D images onto the depth map to generate textured image data; E) projecting the textured image data in 2D to generate data representing a pose-rectified 2D projected image.

Description

The present invention relates to a face pose rectification method and apparatus.

Face recognition involves analyzing data representing a test image (or a set of test images) of a human face and comparing it with a database of reference images. The test image is usually a 2D image captured with a common 2D camera, while the reference image is a 2D image or sometimes a 3D image captured, for example, with a depth camera.

Prior approaches to face recognition usually assume that the test image and the reference images are all captured in a full frontal view. Various well-performing algorithms have therefore been developed for classifying frontal test images and matching them with corresponding frontal reference images.

The reliability of recognition, however, decreases quickly if the test image is captured from a non-frontal viewpoint. To solve this problem, it has already been proposed in the prior art to rectify the pose, i.e. to warp the test image in order to generate a synthetic frontal test image from a non-frontal test image.

Robust methods for estimating and then correcting the head pose are therefore essential in many applications such as face recognition or, more generally, face classification and face processing.

US8055028 relates to a method of normalizing a non-frontal 2D facial image to a frontal facial image. The method comprises: determining the pose of a non-frontal image of an object; performing a smoothing transformation on the non-frontal image of the object, thereby generating a smoothed object image; and synthesizing a frontal image of the object using the pose determination result and the smoothed object image. The pose is determined by obtaining a first mean distance between object feature points located on both sides of the center line of the non-frontal image.

US8199979 discloses a method for classifying and archiving images that include face regions and are acquired with a 2D image acquisition device. A face detection module identifies a group of pixels corresponding to a face region. The face orientation and pose are then determined. In the case of a half-profile face, a normalization module transforms the candidate face region into 3D space and then rotates it until the appropriate pose correction is made. Alternatively, the texture, color and feature regions of the 2D face region are mapped onto a 3D model, which is then rotated to correct the pose. This, however, results in severe deformation when the 2D image is mapped onto a 3D model of a different face.

US7929775 discloses a method for matching a 2D image with a 3D class model. The method comprises identifying image features in the 2D image; computing an aligning transformation between the class model and the image; and comparing, under the aligning transformation, class parts of the class model with the image features.

US7289648 discloses a method for automatically modeling a three-dimensional object, such as a face, from a single image. The system and method according to that invention construct one or more three-dimensional (3D) face models using a single image. It can also be used as a tool to generate a database of faces with various poses, as needed to train most face recognition systems.

US2013156262 proposes estimating the pose of an object by defining a set of pair features as pairs of geometric primitives, wherein the geometric primitives include oriented surface points, oriented boundary points, and boundary line segments. Model pair features are determined based on the set of pair features for a model of the object. Scene pair features are determined based on the set of pair features from data acquired by a 3D sensor, and the model pair features are then matched with the scene pair features to estimate the pose of the object.

Pose rectification from a 2D image is, however, a difficult task, and the quality of the rectification is usually not robust, since it depends on the amplitude of the correction to be achieved, on occlusions, on illumination, and so on.

To overcome some of the problems inherent in head pose estimation based on 2D data, it has already been proposed to use the additional depth information provided by depth sensors, which are becoming increasingly available and affordable.

US8660306 discloses a method for correcting a human pose determined from depth data. The method comprises receiving depth image data, obtaining an initial estimated skeleton of an articulated object from the depth image data, applying a random forest subspace regression function (a function that utilizes a plurality of random splitting/projection decision trees) to the initial estimated skeleton, and determining a representation of the pose based on the result of applying the random forest subspace regression to the initial estimated skeleton. The method is particularly adapted to estimating the pose of a whole body.

US6381346 discloses a system for generating facial images, indexing those images by composite codes, and searching for similar two-dimensional facial images. 3D images of human faces are generated from a data repository of 3D facial feature surface shapes. These shapes are organized by facial feature parts. A 3D facial image is formed by assembling a shape for each facial part.

US8406484 relates to a facial recognition apparatus comprising: a two-dimensional information acquisition unit for acquiring two-dimensional image information of a subject; a three-dimensional information acquisition unit for acquiring three-dimensional image information of the subject; a user information database for storing an elliptical model corresponding to three-dimensional face information of a user and two-dimensional face information of the user; and a control unit for performing facial recognition using the two-dimensional image information of the subject, determining whether the recognized face is the user's face, matching the user's elliptical model with the three-dimensional image information and calculating an error upon determining that the recognized face is the user's face, and determining, based on the error, whether the user's face is improperly used.

US7756325 discloses an algorithm for estimating the 3D shape of a three-dimensional object, such as a human face, based on information retrieved from a single photograph. In addition to the pixel intensity, that invention uses various image features in a multi-features fitting algorithm (MFF).

EP1039417 relates to a method of processing an image of a three-dimensional object, comprising the steps of providing a morphable object model derived from a plurality of 3D images, matching the morphable object model to at least one 2D object image, and providing the matched morphable object model as a 3D representation of the object.

US8553973 relates to other methods and systems for modeling 3D objects (for example, human faces). Due to the emergence of depth sensors in a variety of consumer devices, such as game consoles, laptops, tablets, and smartphones, and for example in cars, more and more images of faces will include a depth map. A popular example of a depth sensor that generates RGB-D datasets is the Kinect input device proposed by Microsoft for the Xbox 360, Xbox One and Windows PC (all trademarks of Microsoft). Depth sensors generate 2.5D image datasets that include an indication of depth (or distance to the light source) for each pixel of the image, but no indication about hidden or occluded elements, such as the back of the head.

It would be desirable to provide a new face recognition method in which test images are captured with depth sensors and then compared with existing 2D images. Such a method would have the advantage of using modern depth sensors to acquire RGB-D image data and of comparing that data with widely available 2D reference images.

It would also be desirable to provide new methods that use the capabilities of depth sensors and RGB-D datasets to improve the task of pose rectification.

It would also be desirable to provide a new method for estimating the head pose that is faster than existing methods.

It would also be desirable to provide a new method for estimating the head pose that is more accurate than existing methods.

It would also be desirable to provide a new method for estimating the head pose that can handle wide pose variations.

It is therefore an aim of the present invention to propose a new method for rectifying the pose in 2.5D datasets representing face images.

According to the invention, these aims are achieved by means of a pose rectification method for rectifying the pose in data representing face images, the method comprising:

A) acquiring at least one test frame including 2D near-infrared image data, 2D visible-light image data, and a depth map;

C) estimating the pose of the face in the test frame by aligning the depth map with a 3D model of a head of known orientation;

D) mapping at least one of the 2D images onto the depth map to generate textured image data;

E) projecting the textured image data in 2D to generate data representing a pose-rectified 2D projected image.

The use of a depth map (i.e. a 2.5D dataset) is advantageous because it improves the accuracy and robustness of the pose estimation step.

The use of visible-light data is advantageous because it allows detection of features that depend, for example, on skin color or texture.

The use of near-infrared (NIR) image data is advantageous because it depends less on illumination conditions than visible-light data does.

The head pose is estimated by fitting the depth map to an existing 3D model, in order to estimate the orientation of the depth map. This pose estimation is particularly robust.

The 3D model may be a generic model, i.e. a user-independent model.

The 3D model may be a user-dependent model, for example a 3D model of the head of the user whose identity needs to be verified.

The 3D model may be a gender-specific, ethnicity-specific, or age-specific model, selected based on a priori knowledge of the gender, ethnicity and/or age.

The existing 3D model is used to estimate the pose of the face. However, the 2D image data is mapped onto the depth map, not onto this 3D model.

The method may include a further step of classifying the image, for example classifying the 2D projected image.

The classification may include face authentication, face identification, gender estimation, age estimation, and/or detection of other facial features.

The method may include a further step of processing the 2D projected image.

The acquisition step may include temporal and/or spatial smoothing of the points in the depth map, in order to reduce the noise generated by the depth sensor.

The step of estimating the pose may include a first step of performing a coarse pose estimation, for example based on a random regression forest method.

The step of estimating the pose may include a second step of fine pose estimation. The fine pose estimation may be based on the result of the coarse pose estimation.

The fine pose estimation may be based, for example, on aligning the depth map with the 3D model using a rigid Iterative Closest Point (ICP) method.

The method may further include a basic face detection step, performed before the pose estimation, in order to remove at least some portions of the 2D near-infrared image, and/or of the 2D visible-light image, and/or of the depth map that do not belong to the face.

The method may further include a background extraction step in order to remove portions of the 2D near-infrared image, and/or of the 2D visible-light image, and/or of the depth map that belong to the background.

The step of aligning the depth map with the existing 3D model of a head may include scaling the depth map so that some of its dimensions (for example, its maximum height) match the corresponding dimensions of the 3D model.

The step of fitting the depth map to the existing 3D model of a head may include warping the depth map and/or the 3D model.

The method may include a further step of correcting the illumination of the 2D visible-light image dataset based on the 2D near-infrared image dataset. Shadows or bright zones that appear in the 2D visible-light image dataset but not in the 2D near-infrared image dataset can thus be corrected.

The method may include a further step of flagging portions of the pose-rectified 2D projected image data that correspond to portions not visible on the depth map and/or on the 2D image.

The method may include a further step of reconstructing portions of the pose-rectified 2D projected image data that correspond to unknown portions of the depth map.

The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures.

Fig. 1 is a flowchart of the method of the invention.
Fig. 2 is a schematic view of an apparatus according to the invention.
Fig. 3 shows an example of a 2D visible-light image of a face.
Fig. 4 shows an example of a 2D near-infrared image of the face of Fig. 3.
Fig. 5 shows an example of a representation of a depth map of the face of Fig. 3.
Fig. 6 shows an example of a representation of a textured depth map of the face of Fig. 3.
Fig. 7 shows an example of a generic head model used for head pose estimation.
Fig. 8 illustrates the fine pose estimation step, in which the depth map is aligned within the 3D model.
Fig. 9 shows a pose-rectified 2D projection of the depth map, in which missing portions corresponding to parts of the head not present in the depth map are flagged.
Fig. 10 shows a pose-rectified 2D projection of the 2.5D dataset, in which missing portions corresponding to parts of the head not present in the depth map are reconstructed.
Fig. 11 shows a pose-rectified 2D projection of the 2.5D dataset, in which missing portions corresponding to parts of the head not present in the 2.5D dataset are reconstructed and flagged.

The detailed description set forth below in connection with the appended drawings is intended as a description of various embodiments of the invention and is not intended to represent the only aspects in which the disclosure may be practiced. Each aspect described in this disclosure is provided merely as an example or illustration of the invention, and should not necessarily be construed as preferred or essential.

Fig. 1 is a flowchart that schematically illustrates the main steps of an example of a pose rectification method according to the invention. This method can be carried out with the apparatus or system schematically illustrated as a block diagram in Fig. 2. In this example, the apparatus comprises a camera 101 for capturing images of a user 100. The camera 101 may be a depth camera, such as, without limitation, a Kinect camera (trademark of Microsoft), a time-of-flight camera, or any other camera able to generate an RGB-D data stream.

The camera 101 is connected to a processor 102 that accesses a memory 104, and is connected to a network through a network interface 103. The memory 104 may include a permanent memory portion for storing computer code that causes the processor to carry out at least some steps of the method of Fig. 1. The memory 104, or portions of it, may be removable.

As used herein, the apparatus 101+102+104 may take the form of a mobile phone, personal navigation device, personal information manager (PIM), car equipment, gaming device, personal digital assistant (PDA), laptop, tablet, notebook and/or handheld computer, smart glasses, smart watch, smart TV, other wearable device, and so on.

In the acquisition step A of Fig. 1, a test video stream is generated by the depth camera 101 in order to capture test images of the user 100 to be identified or otherwise classified.

In the present description, the expression "test images" designates images of a face whose pose needs to be rectified, typically during a test (for identification or authentication) but also during enrollment.

Each frame of the test video stream preferably includes three temporally and spatially aligned datasets:

i) A first (optional) dataset corresponding to a two-dimensional (2D) visible-light image of the face of the user 100 (such as a grayscale or RGB image). One example is shown in Fig. 3.

ii) A second (optional) dataset representing a 2D near-infrared (NIR) image of the face of the user 100. One example is shown in Fig. 4.

iii) A depth map (i.e. a 2.5D dataset) in which the value associated with each pixel depends on the depth of the light-emitting source, i.e. its distance to the depth sensor in the camera 101. A representation of such a depth map is shown in Fig. 5.

Fig. 6 is another representation 201 of a frame, in which the first (RGB) dataset is projected onto the depth map.

As can be seen in Fig. 5, many low-cost depth sensors generate noisy depth maps, i.e. datasets in which the depth value assigned to each point includes noise. The detrimental influence of the noise can be reduced by smoothing the depth map. The smoothing may include low-pass filtering of the depth map in the spatial and/or temporal (across successive frames) domain.
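The spatial and temporal low-pass filtering mentioned above could be sketched as follows. This is only an illustration under stated assumptions, not the filter of the described embodiment: the depth map is taken to be a list of rows, a value of 0 is assumed to mark a missing measurement, and the function names `spatial_smooth` and `temporal_smooth` as well as the 3x3 window and the weight `alpha` are hypothetical choices.

```python
def spatial_smooth(depth):
    """3x3 mean filter that ignores invalid (zero) depth samples."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if depth[y][x] == 0:
                continue  # a hole stays a hole
            vals = [depth[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if depth[j][i] > 0]
            out[y][x] = sum(vals) / len(vals)
    return out


def temporal_smooth(frames, alpha=0.5):
    """Exponential moving average over consecutive, aligned depth maps."""
    h, w = len(frames[0]), len(frames[0][0])
    acc = [row[:] for row in frames[0]]
    for frame in frames[1:]:
        for y in range(h):
            for x in range(w):
                d = frame[y][x]
                if d == 0:
                    continue  # no measurement: keep the previous estimate
                if acc[y][x] == 0:
                    acc[y][x] = d  # first valid measurement for this pixel
                else:
                    acc[y][x] = alpha * d + (1 - alpha) * acc[y][x]
    return acc
```

A larger `alpha` follows head motion more quickly at the cost of less noise suppression; the right trade-off depends on the sensor and frame rate.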

In the basic face detection step B of Fig. 1, the portion of each dataset that represents the user's face is detected and isolated from the background and other elements of the image. This detection may be performed on any or all of the three datasets, and applied to any or all of those datasets.

In one embodiment, this basic face detection is based, at least in part, on thresholding the depth map in order to exclude pixels that are not within a predefined depth range, for example between 20 cm and 100 cm. Other known algorithms may be used to extract the foreground that represents the user's face and to exclude the background, including for example algorithms based on color detection.
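The depth thresholding described above can be illustrated with a minimal sketch. It assumes depth values expressed in centimeters and stored as a list of rows, pixel-aligned with the 2D images; the function names `face_mask_from_depth` and `apply_mask` are hypothetical.

```python
def face_mask_from_depth(depth_cm, near=20.0, far=100.0):
    """True where the depth lies inside the predefined range [near, far]."""
    return [[near <= d <= far for d in row] for row in depth_cm]


def apply_mask(image, mask, fill=0):
    """Keep the pixels selected by the mask and replace the others with `fill`."""
    return [[p if m else fill for p, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


depth = [[150.0, 60.0], [55.0, 0.0]]   # 0.0 stands for a missing measurement
visible = [[12, 200], [180, 33]]
mask = face_mask_from_depth(depth)
foreground = apply_mask(visible, mask)  # [[0, 200], [180, 0]]
```

Because the datasets of a frame are spatially aligned, the same mask can be applied to the visible-light image, the NIR image, and the depth map itself.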

In the head pose estimation step C of Fig. 1, the pose of the user in the frame is estimated. In one example, this estimation is performed in two successive steps:

During a first part of the head pose estimation step, a coarse estimate of the head pose is determined. In one example, the computation of this coarse estimate uses a random forest algorithm in order to determine the head pose quickly, with limited accuracy. A method of coarse head pose estimation with random regression forests is described in G. Fanelli et al., "Real Time Head Pose Estimation with Random Regression Forests", Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference, pp. 617-624. Preferably, the location of the nose and/or of other key features of the face is also determined during this step.

During a second part of the head pose estimation step, a finer estimate of the head pose is determined. To improve speed and robustness, this fine estimate may start from the previously determined coarse estimate of the orientation and of the location of one point, such as the nose location. The fine estimate may be computed by determining the orientation of a 3D model of a head (Fig. 7) that best corresponds to the depth map. In one example, the matching of the 3D model to the test depth map may be performed by minimizing a distance function between points of the depth map and corresponding points of the 3D model, for example using an Iterative Closest Point (ICP) method. Fig. 8 schematically illustrates this step of fitting the test depth map 201 (textured in this illustration) within the 3D model 200 (represented as a mesh in this illustration).
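As an illustration of the fine alignment, the following is a minimal sketch of a rigid point-to-point ICP loop using NumPy. It assumes that the depth map has already been converted to an N x 3 array of 3D points and that the head model is given as an M x 3 array of vertices; the function names, the brute-force nearest-neighbor search, and the fixed iteration count are choices made for this sketch, not details taken from the description. The optional `R0` and `t0` arguments stand for the coarse pose estimate used as a starting point.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t approximates dst[i] (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def rigid_icp(depth_points, model_points, R0=None, t0=None, iterations=30):
    """Align depth_points to model_points, starting from a coarse pose (R0, t0)."""
    R = np.eye(3) if R0 is None else R0.copy()
    t = np.zeros(3) if t0 is None else t0.copy()
    for _ in range(iterations):
        moved = depth_points @ R.T + t
        # Brute-force closest model point for every depth point.
        d2 = ((moved[:, None, :] - model_points[None, :, :]) ** 2).sum(axis=2)
        nearest = model_points[d2.argmin(axis=1)]
        # Re-estimate the full transform from the original points.
        R, t = best_rigid_transform(depth_points, nearest)
    return R, t
```

The brute-force search costs O(N x M) per iteration; a spatial index such as a k-d tree would normally replace it for realistic point counts. A good coarse estimate matters because ICP only converges to the nearest local minimum.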

이 정렬 단계는 바람직하게, 3D 모델의 조정을 포함하여, 그것의 면적의 적어도 일부가 테스트 깊이 맵에 대응한다.This alignment step preferably includes at least a portion of its area corresponding to the test depth map, including adjustment of the 3D model.

3D 모델(200)은 일반적, 즉 이용자 독립적일 수 있다. 대안적으로, 3D 모델은 이용자 의존적이고, 예를 들면, 이용자(100)의 가정된 신원에 기초하여 이용자 의존 3D 모델들의 데이터베이스로부터 검색될 수 있다. 복수의 이용자 독립 3D 모델들은 또한, 예를 들면, 이용자의 가정된 성별, 나이, 또는 인종성에 따라 저장되고 선택될 수 있다.The 3D model 200 may be generic, i.e., user independent. Alternatively, the 3D model is user-dependent and may be retrieved from a database of user-dependent 3D models based on, for example, the assumed identity of the user 100. The plurality of user-independent 3D models may also be stored and selected according to, for example, the assumed gender, age, or ethnicity of the user.

하나의 예에서, 개인 3D 모델은 엄격하지 않은 반복 최근접점(ICP) 방법을 이용하여 생성된다. 3D 모델은 그 다음, 머리의 변형가능한 부분들 예를 들면, 얼굴의 하부에서 일부 현실적이고 제한된 변형을 허용하기 위해, 노드들 사이의 위치 및/또는 관계들에 관한 일부 제약들을 갖는 메시를 포함할 수 있다. 이 경우에, ICP 방법은 모든 가능한 변형들을 고려해볼 때 가장 공산이 큰 방향을 발견하기 위해, 모델의 일부 변형들 또는 모프(morph)를 시도할 수 있다.In one example, a private 3D model is created using a non-stringent iterative close-contact (ICP) method. The 3D model then includes a mesh with some constraints on positions and / or relationships between the nodes, to allow for deformations of the head, e.g., some real and limited deformation at the bottom of the face . In this case, the ICP method can try some deformations or morphs of the model in order to find the direction with the greatest conjugation given all possible deformations.

The output of the head pose estimation step may comprise a set of angles (phi, theta, psi) describing three rotations with respect to a given coordinate system.
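As an illustration of how such a set of angles relates to a rotation, the following sketch converts (phi, theta, psi) to a rotation matrix and back. The axis order is an assumption made for this example only (rotation by phi about x, then theta about y, then psi about z); the description does not fix a convention. The function names are hypothetical.

```python
import numpy as np

def euler_to_matrix(phi, theta, psi):
    """Rotation matrix R = Rz(psi) @ Ry(theta) @ Rx(phi) (assumed convention)."""
    cx, sx = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(theta), np.sin(theta)
    cz, sz = np.cos(psi), np.sin(psi)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def matrix_to_euler(R):
    """Recover (phi, theta, psi) from R; valid while |theta| < 90 degrees."""
    theta = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    phi = np.arctan2(R[2, 1], R[2, 2])
    psi = np.arctan2(R[1, 0], R[0, 0])
    return phi, theta, psi
```

With this convention, a rotation matrix produced by the alignment step can be reported as three angles, and the three angles can be turned back into a matrix for the later projection step.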

In step (D) of Figure 1, the 2D textures of the frame (in the visible and NIR ranges) are mapped onto the depth map (UV mapping). It is possible to use only the grayscale value of the visible dataset. This mapping can be performed before or after the pose correction, and generates a textured depth map with a known orientation. Hidden parts, for example portions of the user's face that are hidden or occluded in the depth map, are either flagged as invalid or reconstructed, for example by assuming that the user's face is symmetrical.

This step may also include a correction of the illumination in the visible light and/or NIR image datasets. The correction of illumination may include a correction of brightness, contrast, and/or, in the case of color images, white balance. In one preferred embodiment, the NIR dataset is used to remove or attenuate shadows and/or reflections in the visible light dataset, by compensating for brightness variations that appear in portions of the visible light dataset but not in the corresponding portions of the NIR dataset.
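One possible way to compensate such brightness variations is sketched below. It is a heuristic chosen for illustration, not the method of the described embodiment: it assumes pixel-aligned grayscale visible and NIR images with values in [0, 1], estimates the low-frequency brightness of each with a box blur, and rescales the visible image so that its low-frequency brightness follows that of the NIR image. The function names, the blur radius, and the `eps` guard are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter of size (2*radius+1)^2 with edge padding, via an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def compensate_shadows(visible, nir, radius=8, eps=1e-6):
    """Attenuate brightness variations present in `visible` but absent in `nir`."""
    low_vis = box_blur(visible, radius)
    low_nir = box_blur(nir, radius)
    # Relative low-frequency brightness of each image (1.0 means average).
    rel_vis = low_vis / (low_vis.mean() + eps)
    rel_nir = low_nir / (low_nir.mean() + eps)
    # Where the visible image is darker than the NIR image suggests, gain > 1.
    gain = rel_nir / (rel_vis + eps)
    return np.clip(visible * gain, 0.0, 1.0)
```

A region that is shadowed in the visible image but evenly lit in the NIR image receives a gain greater than one, while a region that is dark in both images is left largely unchanged.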

In step (E) of Figure 1, the textured 3D image is projected in 2D so as to generate at least one dataset representing a pose-corrected 2D projected image, in the visible and/or NIR range. Various projections can be considered. First, deformations caused by the camera and/or by perspective are preferably taken into account. Then, in one embodiment, the projection generates a frontal face image, i.e., a 2D image as seen by a viewer in front of the user 100. It is also possible to generate a non-frontal face image, or a plurality of 2D projections, such as one frontal face projection and one profile projection. Other projections can be considered, including cartographic projections, or projections that introduce deformations in order to magnify discriminative parts of the face, in particular the eyes and the upper half of the face, and to reduce the size of more deformable parts of the face, such as the mouth. To facilitate comparison, it is also possible to morph the head to a generic model before comparison. Purely mathematical projections, such as projections onto a space that is not easily representable, can also be considered.
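The frontal projection can be sketched as follows, under simplifying assumptions that are not part of the description: the textured depth map is given as 3D points centered on the head with one texture value per point, the estimated pose is a rotation matrix `R` mapping head coordinates to camera coordinates, the projection is orthographic, and smaller z means closer to the virtual viewer. The function name `project_frontal` and its parameters are hypothetical.

```python
import numpy as np

def project_frontal(points, values, R, size=64, extent=1.0):
    """Project textured 3D points to a frontal 2D image.

    points: (N, 3) camera coordinates, centered on the head.
    values: (N,) texture value (e.g. grayscale) of each point.
    R:      (3, 3) estimated head rotation (head -> camera).
    Pixels that receive no point stay NaN, i.e. are flagged as unavailable.
    """
    frontal = points @ R              # applies R.T to every point, undoing the pose
    image = np.full((size, size), np.nan)
    depth = np.full((size, size), np.inf)
    # Map x and y in [-extent, extent] to pixel columns and rows.
    cols = np.round((frontal[:, 0] + extent) / (2 * extent) * (size - 1)).astype(int)
    rows = np.round((extent - frontal[:, 1]) / (2 * extent) * (size - 1)).astype(int)
    inside = (cols >= 0) & (cols < size) & (rows >= 0) & (rows < size)
    for r, c, z, v in zip(rows[inside], cols[inside],
                          frontal[inside, 2], values[inside]):
        if z < depth[r, c]:           # z-buffer: keep the point nearest the viewer
            depth[r, c] = z
            image[r, c] = v
    return image
```

A profile projection could be obtained in the same way by composing `R` with an additional fixed rotation before the projection. The NaN pixels correspond to the portions of the projection for which the depth map provides no data.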

Figure 9 shows an example of a 2D textured projection 202 generated during step (E). Portions 203 of the projection that are not available in the depth map, for example hidden or occluded portions, are flagged as such.

Figure 10 shows another example of a 2D textured projection 202 generated during step (E). In this example, portions 204 of the projection that are not available in the depth map, for example hidden or occluded portions, are reconstructed as a non-textured image. The reconstruction may be based on the available portions of the image, for example by assuming that the user's face is symmetrical. Alternatively, or in addition, the reconstruction may use a generic model of the head. Alternatively, or in addition, the reconstruction may use image portion data available from other frames of the same video sequence.

Figure 10 shows another example of a 2D textured projection 202 generated during step (E). In this example, portions 205 of the projection that are not available in the depth map, for example hidden or occluded portions, are reconstructed as a textured image. The reconstruction may be based on the available portions of the image, for example by assuming that the user's face is symmetrical. Alternatively, or in addition, the reconstruction may use a generic model of the head. Alternatively, or in addition, the reconstruction may use image portion data available from other frames of the same video sequence.
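The symmetry-based reconstruction can be illustrated with a short sketch. It assumes, for this example only, a frontal projected image in which unavailable pixels are stored as NaN and in which the symmetry axis of the face coincides with the vertical center line of the image. The function name `fill_by_symmetry` is hypothetical.

```python
import numpy as np

def fill_by_symmetry(image):
    """Fill NaN pixels of a frontal image from their mirror position.

    Returns the filled image and a boolean mask of the reconstructed pixels.
    Pixels that are missing on both sides of the axis remain NaN.
    """
    mirrored = image[:, ::-1]                    # left-right flip
    missing = np.isnan(image)
    filled = np.where(missing, mirrored, image)
    reconstructed = missing & ~np.isnan(mirrored)
    return filled, reconstructed
```

Keeping the mask of reconstructed pixels allows a later classification step to weight reconstructed portions differently from portions that were actually observed.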

The method described above thus generates a pose-corrected 2D test image dataset of the user, based on a 2.5D test view acquired with a depth camera. During the face processing step (F), this dataset can then be used by a classification module, such as a user identification or authentication module, a gender estimation module, an age estimation module, etc. The classification may be based on a single frame, for example the frame that can be classified with the highest confidence, or the first frame that can be classified with a confidence higher than a given threshold, or on a plurality of successive frames of the same video stream. Additionally, or alternatively, the classification may also be based on the oriented textured 3D image. Other face processing may be applied during step (F).
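The two single-frame selection rules mentioned above can be sketched as follows, assuming that a classifier has already produced one confidence score per frame. The function name `select_frame` is hypothetical, and falling back to the highest-confidence frame when no frame exceeds the threshold is a design choice of this example, not something stated in the description.

```python
from typing import Optional, Sequence

def select_frame(confidences: Sequence[float],
                 threshold: Optional[float] = None) -> int:
    """Return the index of the frame to classify.

    With a threshold: the first frame whose confidence exceeds it.
    Without one, or if no frame exceeds it: the frame with the highest confidence.
    """
    if len(confidences) == 0:
        raise ValueError("at least one frame is required")
    if threshold is not None:
        for index, confidence in enumerate(confidences):
            if confidence > threshold:
                return index
    return max(range(len(confidences)), key=lambda i: confidences[i])
```

The threshold rule allows a decision as soon as a sufficiently reliable frame arrives, whereas the highest-confidence rule requires all candidate frames to be scored first.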

The methods disclosed herein comprise one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.

It is to be recognized that, depending on the embodiment, certain acts, events, or steps of any of the methods described herein can be performed in a different sequence, or may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the method). Moreover, in certain embodiments, acts or events may be performed concurrently rather than sequentially, e.g., through multi-threaded processing, interrupt processing, or multiple processors.

The various operations of the methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to, a circuit, an application specific integrated circuit (ASIC), a processor, a field programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to carry out the method steps described herein.

As used herein, the terms "determining" and "estimating" encompass a wide variety of actions. For example, "determining" and "estimating" may include calculating, computing, deriving, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. Also, "determining" and "estimating" may include receiving, accessing (e.g., accessing data in a memory), and the like.

The steps of a method or algorithm described in connection with the present disclosure may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in any form of storage medium known in the art. Some examples of storage media that may be used include random access memory (RAM), read only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, and so forth. A software module may comprise a single instruction or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media. A storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

Thus, certain aspects may comprise a computer program product for performing the operations presented herein. For example, such a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein.

Various modifications and variations of the described embodiments of the invention will be apparent to those skilled in the art without departing from the scope of the invention as defined in the appended claims. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments.

100: User
101: Camera
102: Processor
103: Network interface
104: Memory
200: 3D model
201: Test depth map
202: 2D textured projection

Claims

1. A pose correction method for correcting a pose in data representing images of a face (100), comprising:
A - obtaining at least one test frame including 2D near-infrared image data, 2D visible light image data, and a depth map;
C - estimating a pose of the face in the test frame by aligning the depth map with a 3D model of a head of known orientation;
D - mapping at least one of the 2D images onto the depth map to generate textured image data;
E - projecting the textured image data in 2D to generate data representing a pose-corrected 2D projected image.

2. The method according to claim 1,
further comprising temporal and/or spatial smoothing of the points in the depth map.

3. The method according to claim 1 or 2,
wherein the step (C) of estimating the pose includes a first step of performing a rough pose estimation, for example based on a random forest, and a further step of determining a more precise estimate of the pose.

4. The method according to any one of claims 1 to 3,
wherein the step (C) of aligning the depth map with a 3D model of a head of known orientation uses an iterative closest point (ICP) method.

5. The method according to any one of claims 1 to 4,
further comprising a step (B), performed prior to the estimation (C) of the pose, of removing at least some portions of the 2D near-infrared image data, and/or of the 2D visible light image data, and/or of the depth map, that do not belong to the face.

6. The method according to any one of claims 1 to 5,
wherein the 3D model is user-independent.

7. The method according to any one of claims 1 to 5,
wherein the 3D model is user-dependent.

8. The method according to any one of claims 1 to 7,
further comprising warping the 3D model to adapt the 3D model to the user.

9. The method according to any one of claims 1 to 8,
wherein the step (C) of aligning the depth map with an existing 3D model of a head comprises deforming the 3D model.

10. The method according to any one of claims 1 to 9,
further comprising correcting the illumination of portions of the 2D visible light image data based on the 2D near-infrared image data.

11. The method according to any one of claims 1 to 10,
further comprising flagging portions of the pose-corrected 2D projected image data that correspond to portions not visible in the depth map.

12. The method according to any one of claims 1 to 11,
further comprising reconstructing portions of the pose-corrected 2D projected image data that correspond to unknown portions of the depth map.

13. The method according to any one of claims 1 to 12,
further comprising a step (F) of classifying the 2D projected image.

14. An apparatus comprising a depth map camera (101) arranged to obtain at least one test frame comprising 2D near-infrared image data, 2D visible light image data, and a depth map, and a processor having a memory storing a program that causes the processor to carry out the following steps:
C - estimating a pose of a face in the test frame by aligning the depth map with a 3D model of a head of known orientation;
D - mapping at least one of the 2D images onto the depth map to generate textured image data;
E - projecting the textured image data in 2D to generate data representing a pose-corrected 2D projected image.

15. An apparatus comprising means for performing the method of any one of claims 1 to 14.

16. A computer program product comprising a computer-readable medium comprising executable instructions for:
A - obtaining at least one test frame including 2D near-infrared image data, 2D visible light image data, and a depth map;
C - estimating a pose of a face in the test frame by aligning the depth map with a 3D model of a head of known orientation;
D - mapping at least one of the 2D images onto the depth map to generate textured image data;
E - projecting the textured image data in 2D to generate data representing a pose-corrected 2D projected image.