KR102472767B1

KR102472767B1 - Method and apparatus of calculating depth map based on reliability

Info

Publication number: KR102472767B1
Application number: KR1020170117962A
Authority: KR
Inventors: 정휘룡; 홍성훈; 강철우
Original assignee: 삼성전자주식회사
Priority date: 2017-09-14
Filing date: 2017-09-14
Publication date: 2022-12-01
Anticipated expiration: 2037-09-14
Also published as: US20190080462A1; KR20190030474A; US11049270B2

Abstract

A method and apparatus for calculating a depth map according to an embodiment divide an input image into segments, select one or more of the segments based on reliabilities calculated for the segments, estimate pose information of a camera for the input image using the selected segment, and calculate a depth map of the input image using the pose information of the camera.

Description

Method and apparatus for calculating depth map based on reliability

The embodiments below relate to a method and apparatus for calculating a depth map based on reliability.

A two-dimensional (2D) input image may be reconstructed into a three-dimensional (3D) image through camera position estimation and depth estimation. Camera position estimation and depth estimation may use, for example, a Structure From Motion (SFM) technique, which identifies the structure of an object from information generated by the object's motion; a Simultaneous Localization and Mapping (SLAM) technique, which measures the position of a moving camera while simultaneously building a map of the surrounding environment; or a Visual Odometry (VO) technique, which determines position and orientation by analyzing camera images.

With the techniques described above, computational resources may be wasted unnecessarily because of errors caused by repeatedly selecting a region of an object other than the target object to be tracked in the image, and/or because of tracking of moving objects.

According to an embodiment, a method of calculating a depth map includes: dividing an input image into segments; calculating reliabilities of the segments; selecting one or more of the segments based on the reliabilities; estimating pose information of a camera for the input image using the selected segment; and calculating a depth map of the input image using the pose information of the camera.

The dividing may include any one or a combination of: dividing the input image into semantic segments by classifying objects included in the input image into semantic units; and dividing the input image into depth segments based on depth values of the input image.

The calculating of the reliabilities may include any one or a combination of: calculating a first reliability for the semantic segments; and calculating a second reliability for the depth segments.

The calculating of the first reliability may include calculating the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

The calculating of the first reliability may include: when the object is a moving object, determining the first reliability of the semantic segment corresponding to the moving object to be a first value; and when the object is a fixed object, determining the first reliability of the semantic segment corresponding to the fixed object to be a second value.

The calculating of the reliabilities may include: fusing the first reliability and the second reliability; and determining the fused reliability as the reliability of the segments.

The method of calculating the depth map may further include selecting pixels from the selected segment based on the fused reliability, and the estimating of the pose information of the camera may include estimating the pose information of the camera from the selected pixels.

The selecting of the pixels from the selected segment may include selecting the pixels from the selected segment in proportion to the fused reliability.

The input image may include frames, and the calculating of the reliabilities may include calculating the reliabilities of the segments for each key frame among the frames.

The estimating of the pose information of the camera may include estimating the pose information of the camera by applying a cost function to the selected segment.

According to an embodiment, an apparatus for calculating a depth map includes: a camera configured to acquire an input image; and a processor configured to divide the input image into segments, select one or more of the segments based on reliabilities calculated for the segments, estimate pose information of the camera for the input image using the selected segment, and calculate a depth map of the input image using the pose information of the camera.

The processor may divide the input image into semantic segments by classifying objects included in the input image into semantic units, divide the input image into depth segments based on depth values of the input image, or divide the input image into both the semantic segments and the depth segments.

The processor may calculate a first reliability for the semantic segments, calculate a second reliability for the depth segments, or calculate both the first reliability and the second reliability.

The processor may calculate the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

The processor may fuse the first reliability and the second reliability, and determine the fused reliability as the reliability of the segments.

The processor may select pixels from the selected segment based on the fused reliability, and estimate the pose information of the camera from the selected pixels.

The processor may select the pixels from the selected segment in proportion to the fused reliability.

The input image may include frames, and the processor may calculate the reliabilities of the segments for each key frame among the frames.

The processor may estimate the pose information of the camera by applying a cost function to the selected segment.

FIG. 1 is a flowchart illustrating a method of calculating a depth map according to an embodiment.
FIG. 2 is a flowchart illustrating a method of calculating reliability according to an embodiment.
FIG. 3 is a diagram illustrating a method of selecting one or more segments according to an embodiment.
FIG. 4 is a flowchart illustrating a method of estimating pose information of a camera according to an embodiment.
FIG. 5 is a diagram illustrating an operation of an apparatus for calculating a depth map according to an embodiment.
FIG. 6 is a block diagram of an apparatus for calculating a depth map according to an embodiment.

The specific structural or functional descriptions disclosed in this specification are presented only to illustrate embodiments according to the technical concept. The disclosed embodiments may be modified and implemented in various other forms, and the scope of this specification is not limited to the disclosed embodiments.

Terms such as first and second may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it should be understood that it may be directly connected or coupled to the other component, or that an intervening component may be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

The embodiments described below may be used to estimate depth values for reconstructing a 3D scene of an input image in various augmented reality (AR) applications. The embodiments may quickly generate a dense depth map from images acquired by a single camera, without additional hardware such as a depth camera. The embodiments may be applied, for example, to implementing AR applications in real time on AR head-up displays (HUDs), AR/virtual reality (VR) glasses, autonomous vehicles, intelligent vehicles, smartphones, and mobile devices. The embodiments may be applied, for example, to camera pose tracking and depth reconstruction for accurate registration between a driving image and a virtual object on a head-up display. The embodiments may be applied to registration and 3D image reconstruction for smartphones and AR/VR devices on a mobile platform. The embodiments may also be applied to attitude control using vision technology in drones, robots, and autonomous vehicles. The embodiments may be implemented in the form of a chip and mounted in in-vehicle infotainment (IVI) systems, advanced driver assistance systems (ADAS), smartphones, AR/VR devices, and the like. Hereinafter, the embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1 is a flowchart illustrating a method of calculating a depth map according to an embodiment. Referring to FIG. 1, an apparatus for calculating a depth map according to an embodiment (hereinafter, the "calculation apparatus") divides an input image into segments (110). The input image is an image input to the calculation apparatus and may be, for example, a live image or a moving picture. The input image may be a mono image or a stereo image. The input image may include a plurality of frames. The input image may be captured by a camera included in the calculation apparatus (for example, the camera 610 shown in FIG. 6) or may be acquired from outside the calculation apparatus.

The segments may correspond to partial areas obtained by classifying or dividing the input image according to a predetermined criterion.

The calculation apparatus may divide the input image into semantic segments by classifying objects included in the input image into semantic units of, for example, 20 classes such as road, vehicle, sidewalk, person, animal, sky, and building. The semantic classes may include not only fixed objects such as roads, sky, and buildings but also moving objects such as moving people, moving animals, and moving vehicles. The calculation apparatus may generate a segmentation image including the semantic segments by dividing objects from the input image in semantic units, identifying the meaning of each divided region at the pixel level, and labeling each region with its class.
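The class-wise labeling described above can be sketched as follows. This is only a minimal illustration, assuming the per-pixel class labels have already been produced by some segmentation model; the function name `semantic_segments` and the tiny `label_map` are hypothetical and do not come from the specification. The sketch groups pixels by class only and does not separate distinct instances of the same class.

```python
from collections import defaultdict


def semantic_segments(label_map):
    """Group pixel coordinates (u, v) by class label.

    label_map is a list of rows; label_map[v][u] is the class name that a
    (hypothetical) segmentation model assigned to pixel (u, v).
    """
    segments = defaultdict(list)
    for v, row in enumerate(label_map):
        for u, label in enumerate(row):
            segments[label].append((u, v))
    return dict(segments)


# Hypothetical 2 x 3 label map used only for illustration.
label_map = [
    ["sky", "sky", "building"],
    ["road", "vehicle", "building"],
]
segments = semantic_segments(label_map)
```

Each key of `segments` then plays the role of one semantic segment, and its value lists the pixels that carry that label.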

The calculation apparatus may divide the input image into semantic segments using, for example, a convolutional neural network (CNN), a deep neural network (DNN), or a support vector machine (SVM) trained in advance to recognize a plurality of classes. The convolutional neural network may be a region-based convolutional neural network trained in advance on various objects. The calculation apparatus may also divide objects included in the input image into semantic segments using various other machine learning methods.

In addition, the calculation apparatus may divide the input image into depth segments based on depth values obtained from, for example, a depth map or a normal map inferred from the input image. The regions of the semantic segments and the regions of the depth segments may coincide with or differ from each other.

The calculation apparatus calculates reliabilities of the segments divided in step 110 (120). Here, the reliability may correspond to, for example, the reliability of depth information (for example, depth values) and position information (for example, position coordinates) of the segments. The calculation apparatus may calculate, for example, a first reliability for the semantic segments. The calculation apparatus may also calculate a second reliability for the depth segments.

The calculation apparatus may calculate the reliabilities of the segments for each key frame among the frames. A key frame is a frame that holds all the information of the image progressing along a timeline, and may correspond to the most important frames, such as the start frame and the end frame of a single motion.

For example, the calculation apparatus may set a low reliability for a segment that includes a moving object so that the segment is excluded from the subsequent processes of estimating the pose information of the camera and calculating the depth map. A method by which the calculation apparatus calculates the reliabilities of the segments is described in detail with reference to FIG. 2 below.

The calculation apparatus selects one or more of the segments based on the reliabilities calculated in step 120 (130). From the segment selected based on the reliability, the calculation apparatus may select pixels to serve as feature points used in the processes, described later, of estimating the pose information of the camera and calculating the depth map. A feature point is a distinctive point within a frame and may include information (u, v) corresponding to its 2D position in the frame. Each frame may include a plurality of feature points. Because a general feature point detection algorithm may be applied to select feature points from a frame, a detailed description is omitted. According to an embodiment, at least some of the feature points may further include information corresponding to a depth value. For example, information corresponding to the 3D positions of at least some of the feature points may be obtained in the process of estimating the pose information of the camera that captured the input image. A 3D position includes a depth value.

For example, tracking loss may occur due to false negative selection, in which an error is mistakenly judged to be normal and selected, in low-gradient regions such as the side of a building where the boundaries between objects are ambiguous or unchanging and therefore hard to distinguish; due to moving objects in the image; or due to high-frequency noise from portions where the gradient is locally high, caused for example by pieces of glass on the road surface.

In an embodiment, segments that may cause such tracking loss, that is, segments with low reliability, may be excluded based on the reliability, and segments with high reliability may be selected. The calculation apparatus may improve calculation speed and accuracy by estimating the pose information of the camera and calculating the depth map of the input image using information extracted from the high-reliability segments. A method by which the calculation apparatus selects one or more segments is described in detail with reference to FIG. 3 below.

The calculation apparatus estimates pose information of the camera for the input image using the selected segment (140). The pose information of the camera may include, for example, rotation information (R) and translation information (T) of the camera. Alternatively, the pose information of the camera may be, for example, a six-degree-of-freedom (6DoF) camera pose including X (horizontal), Y (vertical), and Z (depth) corresponding to the position of the camera and/or pitch, yaw, and roll corresponding to the orientation of the camera.

The calculation apparatus may estimate pose information including the position of the camera that captured the input image and the position (depth) of the captured object using, for example, a homography that represents the correlation between pixels in a series of consecutive images (frames). The calculation apparatus may obtain the pose information of the camera using various SLAM techniques, for example, feature-based SLAM, direct SLAM, extended Kalman filter (EKF) SLAM, fast SLAM, and large-scale direct monocular (LSD) SLAM. A method by which the calculation apparatus estimates the pose information of the camera is described in detail with reference to FIG. 4 below.

The calculation apparatus calculates a depth map of the input image using the pose information of the camera (150). The calculation apparatus may calculate the depth map by computing depth values using, for example, the position coordinates (u, v) of the camera, the rotation information (R) of the camera, and the translation information (T) of the camera obtained in the process of estimating the pose information of the camera.

FIG. 2 is a flowchart illustrating a method of calculating reliability according to an embodiment. Referring to FIG. 2, the calculation apparatus according to an embodiment may calculate the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

The calculation apparatus may determine whether an object included in the input image is a moving object (210). If it is determined in step 210 that the object is not a moving object (that is, that the object is a fixed object), the calculation apparatus may determine the first reliability of the semantic segment corresponding to the fixed object to be a second value (220). The second value may be, for example, '1'.

If it is determined in step 210 that the object is a moving object, the calculation apparatus may determine the first reliability of the semantic segment corresponding to the moving object to be a first value (230). The first value may be, for example, '0'. According to an embodiment, by setting a low reliability value for low-reliability segments that cause tracking loss or contain noise, such as those of moving objects, the calculation apparatus may exclude those segments from use in estimating the pose information of the camera or in calculating the depth map.
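Steps 210 to 230 can be illustrated with a short sketch. It uses the example values from the text ('0' for a moving object, '1' for a fixed object), but how an object is judged to be moving is not specified here; the sketch simply assumes that a class name is enough, which is a simplification. The names `MOVING_CLASSES` and `first_reliability` are hypothetical.

```python
# Assumption: these class names mark objects treated as moving. The text
# decides per object whether it is moving, not per class name.
MOVING_CLASSES = {"vehicle", "person", "animal"}


def first_reliability(segment_classes, first_value=0.0, second_value=1.0):
    """Return {segment_id: first reliability} for semantic segments.

    Moving objects get first_value (step 230); fixed objects get
    second_value (step 220).
    """
    return {
        seg_id: first_value if cls in MOVING_CLASSES else second_value
        for seg_id, cls in segment_classes.items()
    }


# Hypothetical segment ids and classes used only for illustration.
reliability = first_reliability({0: "road", 1: "building", 2: "sky", 3: "vehicle"})
```

With these inputs the vehicle segment receives the first value and the other three segments receive the second value, so the vehicle segment can later be excluded.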

The calculation apparatus may calculate the second reliability for the depth segments (240). The calculation apparatus may calculate the second reliability (R_Si) for the depth segments using, for example, Equation 1 below.

[Equation 1: equation image and symbols not recoverable from the source text]

In Equation 1, the symbols denote, in the order in which they are defined: the current key frame i; the next key frame j nearest to key frame i; the depth map; a transformation matrix from one of the preceding terms to another; the image region belonging to segment i; and the image region in the current key frame.

A further term can be expressed by an auxiliary expression (also not recoverable), in which the symbols denote: the intrinsic matrix; the pixel coordinate; the homogeneous representation of a coordinate; and the depth map at pixel coordinate u in the current key frame i.

In Equation 1, one image term denotes the next key frame j, which serves as the target, and the other denotes the warped host, that is, the warped current key frame i.

The calculation apparatus may fuse the first reliability and the second reliability (250). The calculation apparatus may fuse the first reliability for the semantic segment and the second reliability for the depth segment using, for example, Equation 2 below.

[Equation 2: equation image and symbols not recoverable from the source text]

In Equation 2, one term denotes the reliability of the depth segment at a pixel coordinate, as calculated by Equation 1, and the other term denotes the reliability of the semantic segment at that pixel coordinate.

The calculation apparatus may determine the fused reliability as the reliability of the segments (260).
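The fusion in steps 250 and 260 can be sketched as below. The fusion rule itself is not available in this text, so the sketch assumes a per-pixel product of the two reliabilities; this is an illustrative choice, not the patent's Equation 2. A product has the convenient property that a first reliability of '0' for a moving object forces the fused value to zero. The function name `fuse_reliability` and the sample maps are hypothetical.

```python
def fuse_reliability(first_map, second_map):
    """Fuse two per-pixel reliability maps of equal size.

    Assumed rule: element-wise product of the semantic (first) and
    depth (second) reliabilities.
    """
    return [
        [r1 * r2 for r1, r2 in zip(row1, row2)]
        for row1, row2 in zip(first_map, second_map)
    ]


# Hypothetical 2 x 2 maps used only for illustration.
first_map = [[1.0, 0.0], [1.0, 1.0]]     # 0.0 marks a moving-object pixel
second_map = [[0.8, 0.9], [0.25, 0.5]]   # depth-segment reliabilities
fused = fuse_reliability(first_map, second_map)
```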

FIG. 3 is a diagram illustrating a method of selecting one or more segments according to an embodiment. FIG. 3(a) shows an input image divided into semantic segments 310, 320, 330, and 340. As described above, the calculation apparatus may divide the input image into semantic segments by classifying objects included in the input image into semantic units. For convenience, the following describes a method of selecting one or more segments from the semantic segments, but the same method may be applied to the depth segments.

For example, according to the meaning of the objects included in the input image, the road may be divided into segment 310, the building into segment 320, the sky into segment 330, and the car into segment 340. Among the divided segments 310, 320, 330, and 340, suppose that, as shown in FIG. 3(b), high-frequency noise occurs in the segment 330 corresponding to the sky because sunlight reflected by an aircraft in flight produces a sudden flash, and that the car in the segment 340 is a moving object. The calculation apparatus may set the reliability of the segment 330, in which the high-frequency noise occurs, lower than that of segments without noise. In addition, the reliability of the segment 340 corresponding to the moving object may be set to, for example, '0'.

The calculation apparatus may exclude segments that may cause tracking loss or segments with low reliability, and select segments with high reliability. For example, as shown in FIG. 3(c), the calculation apparatus may select the segment 310 classified as road and the segment 320 classified as building, and may estimate the pose information of the camera and calculate the depth map using information extracted from the selected segments 310 and 320 (for example, information on the pixel(s) 350).
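The selection in the FIG. 3 example can be sketched as a simple threshold test. The text does not state how "high" and "low" reliability are separated, so the threshold of 0.5, the function name `select_segments`, and the reliability values for segments 310, 320, and 330 are all assumptions made for illustration; only the value '0' for the moving-object segment 340 comes from the text.

```python
def select_segments(reliability, threshold=0.5):
    """Return ids of segments whose reliability meets the threshold,
    ordered from most to least reliable."""
    kept = [seg for seg, value in reliability.items() if value >= threshold]
    return sorted(kept, key=lambda seg: reliability[seg], reverse=True)


# Hypothetical reliabilities: road 310, building 320, noisy sky 330, moving car 340.
reliability = {310: 0.9, 320: 0.8, 330: 0.3, 340: 0.0}
selected = select_segments(reliability)
```

With these values, the road and building segments are kept and the noisy sky segment and the moving car segment are dropped, matching the outcome shown in FIG. 3(c).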

FIG. 4 is a flowchart illustrating a method of estimating camera pose information according to an embodiment. Referring to FIG. 4, the calculation device according to an embodiment may select pixels from the selected segment based on the reliability (410). Here, the reliability may be, for example, the reliability fused in step 250 of FIG. 2, or may be the first reliability or the second reliability. The calculation device may select pixels from the selected segment in proportion to the reliability. For example, the calculation device may select pixels from the segment with the highest reliability. Alternatively, the calculation device may select pixels evenly from the segments in descending order of reliability. For example, the calculation device may select a large number of pixels from the segments with the highest reliability, and progressively fewer pixels as the reliability decreases.
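One way to read "selecting pixels in proportion to the reliability" is as splitting a fixed pixel budget across the selected segments. The sketch below assumes such a budget exists and uses largest-remainder rounding; the function name `allocate_pixels` and the budget parameter are hypothetical and do not come from the description.

```python
def allocate_pixels(reliability, budget):
    """Split a pixel budget across segments in proportion to reliability.

    Largest-remainder rounding keeps the counts summing exactly to `budget`.
    """
    total = sum(reliability.values())
    if total <= 0:
        return {seg: 0 for seg in reliability}
    exact = {seg: budget * r / total for seg, r in reliability.items()}
    counts = {seg: int(x) for seg, x in exact.items()}
    leftover = budget - sum(counts.values())
    # Hand the remaining pixels to the segments with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda seg: exact[seg] - counts[seg], reverse=True)
    for seg in by_fraction[:leftover]:
        counts[seg] += 1
    return counts


counts = allocate_pixels({310: 0.9, 320: 0.6}, budget=100)
print(counts)
```

A segment with reliability 0, such as the moving car in FIG. 3, receives no pixels under this scheme.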

The calculation device may estimate camera pose information from the selected pixels (420). The calculation device may estimate the camera pose information from 3D points corresponding to pixels that have depth values. Alternatively, the calculation device may estimate the camera pose information by applying, for example, a cost function such as Equation 3 to the selected segment.

Equation 3 is defined between a reference frame and a target frame, and is evaluated for a point, that is, a pixel, in the reference frame. Its terms are as follows: the set of pixels included in the sum of squared differences (SSD); the exposure time of the reference frame and the exposure time of the target frame; the Huber norm, which serves as the loss function; a weight based on the fused reliability; an affine brightness transfer function; the brightness of the reference frame and the brightness of the target frame; and the parameters of the brightness transfer function for the reference frame and for the target frame.

The projected position of a point having a given inverse depth can be obtained through Equation 4.

In Equation 4, the camera poses are expressed as transformation matrices.

The full photometric error can be expressed as Equation 5.

In Equation 5, i runs over all frames, the point index runs over all points in frame i, and j runs over all frames in which the point is visible.

Equations 3 to 5 serve to match the brightness between frames. Because brightness differences between frames can affect the depth values, an embodiment uses these equations to compensate for the brightness difference, so that a more accurate depth map is calculated.
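The formulas of Equations 3 to 5 do not appear in the text itself. The definitions given for them correspond closely to the photometric error commonly used in direct sparse visual odometry, so one plausible reading is sketched here. Every symbol name is an assumption chosen for illustration, and the weight w_p is taken to be the weight based on the fused reliability rather than any specific formula.

```latex
% Equation 3 (assumed form): cost of point p of reference frame I_i observed in target frame I_j
E_{pj} = \sum_{p \in N_p} w_p \left\| (I_j[p'] - b_j)
         - \frac{t_j e^{a_j}}{t_i e^{a_i}} (I_i[p] - b_i) \right\|_{\gamma}

% Equation 4 (assumed form): projection p' of point p with inverse depth d_p
p' = \Pi_c\left( R \, \Pi_c^{-1}(p, d_p) + t \right), \qquad
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} = T_j T_i^{-1}

% Equation 5 (assumed form): full photometric error
E_{photo} = \sum_{i \in F} \sum_{p \in P_i} \sum_{j \in obs(p)} E_{pj}
```

In this assumed notation, N_p is the set of pixels in the SSD, t_i and t_j are the exposure times, the norm with subscript γ is the Huber norm, a and b are the parameters of the affine brightness transfer function, Π_c is the camera projection, T_i and T_j are the camera poses as transformation matrices, F is the set of frames, P_i the points of frame i, and obs(p) the frames in which p is visible.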

FIG. 5 is a diagram for explaining the configuration, by operation, of an apparatus for calculating a depth map according to an embodiment. Referring to FIG. 5, a calculation device 500 according to an embodiment may include a camera 510, a segmentation unit 520, a selection unit 530, a tracking unit 540, and a mapping unit 550. The segmentation unit 520, the selection unit 530, the tracking unit 540, and the mapping unit 550 may be implemented by, for example, the processor 620 shown in FIG. 6.

The camera 510 may photograph or capture a series of input images.

The segmentation unit 520 may divide the input image into segments. The segmentation unit 520 may include a depth segmentation unit 523, which divides the input image into depth segments according to depth values, and a semantic segmentation unit 526, which divides the input image into semantic segments by semantic unit.

The selection unit 530 may select, based on the reliability of the segments divided by the segmentation unit 520, one or more segments to be used for tracking the camera pose and calculating the depth map. For example, the selection unit 530 may select segments in proportion to their reliability.

The selection unit 530 may include a depth reliability evaluation unit 532, a semantic reliability evaluation unit 534, a reliability fusion unit 536, and a pixel selection unit 538.

The depth reliability evaluation unit 532 may evaluate (or calculate) the reliability of the depth segments. The semantic reliability evaluation unit 534 may evaluate (or calculate) the reliability of the semantic segments.

The reliability fusion unit 536 may fuse the reliability of the depth segments and the reliability of the semantic segments.

The pixel selection unit 538 may select a segment using the reliability fused by the reliability fusion unit 536, and select pixels from the selected segment.

The tracking unit 540 may calculate six-degree-of-freedom (6-DoF) pose information of the camera, including the position and orientation of the camera. For example, the tracking unit 540 may continuously track new input images and calculate the camera pose information in the current frame based on the camera pose information in the previous frame. Here, the tracking unit 540 may estimate the camera pose information from the pixels of the segment of the previous frame selected by the pixel selection unit 538. The tracking unit 540 may estimate the camera pose information (for example, rotation information and translation information of the camera) by solving the above-described cost function for the selected segment.
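To make the cost function that the tracking unit solves more concrete, the following is a minimal sketch of a reliability-weighted, brightness-compensated photometric cost using a Huber norm. It evaluates the cost only; the optimization over rotation and translation is not shown. The function names, the parameter names (`t_ref`, `a_ref`, `b_ref`, and so on), and the exact form of the brightness compensation are assumptions, not taken from the description.

```python
import math


def huber(r, gamma=1.0):
    """Huber norm: quadratic for |r| <= gamma, linear beyond it."""
    a = abs(r)
    return 0.5 * r * r if a <= gamma else gamma * (a - 0.5 * gamma)


def photometric_cost(ref, tgt, weights, t_ref, t_tgt,
                     a_ref, b_ref, a_tgt, b_tgt, gamma=1.0):
    """Weighted photometric cost over matched pixels.

    ref[k] and tgt[k] are the intensities of the k-th selected pixel in the
    reference frame and at its projected position in the target frame.
    weights[k] is the (assumed) reliability-based weight of that pixel.
    """
    # Exposure-time and affine brightness compensation between the two frames.
    scale = (t_tgt * math.exp(a_tgt)) / (t_ref * math.exp(a_ref))
    cost = 0.0
    for i_ref, i_tgt, w in zip(ref, tgt, weights):
        residual = (i_tgt - b_tgt) - scale * (i_ref - b_ref)
        cost += w * huber(residual, gamma)
    return cost
```

A pixel with weight 0, for example one from a segment whose reliability is 0, contributes nothing to the cost, which is how excluding unreliable segments keeps them from disturbing the pose estimate in this sketch.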

The mapping unit 550 may calculate a depth map by computing the depth of the photographed object. The mapping unit 550 may calculate the depth map of the input image using the camera pose information estimated from the pixels of the segment selected by the pixel selection unit 538. For example, the mapping unit 550 may calculate the depth map from depth values computed using the position coordinates (u, v) of the camera together with the rotation information (R) and translation information (T) of the camera.
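The relation between a pixel (u, v), its depth, and the rotation and translation can be sketched with a pinhole camera model. The intrinsics tuple `K = (fx, fy, cx, cy)`, the function name `project`, and the use of inverse depth as input are assumptions for illustration; the description does not specify the camera model.

```python
def project(u, v, inv_depth, K, R, t):
    """Project pixel (u, v) with a given inverse depth from one frame into
    another frame related by rotation R (3x3 nested list) and translation t."""
    fx, fy, cx, cy = K
    z = 1.0 / inv_depth
    # Back-project the pixel to a 3D point in the first camera's frame.
    X = [(u - cx) / fx * z, (v - cy) / fy * z, z]
    # Move the point into the second camera's frame.
    Y = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if Y[2] <= 0:
        return None  # the point lies behind the second camera
    # Project back onto the image plane.
    return (fx * Y[0] / Y[2] + cx, fy * Y[1] / Y[2] + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (500.0, 500.0, 320.0, 240.0)  # hypothetical intrinsics
print(project(100.0, 50.0, 0.5, K, IDENTITY, [0.0, 0.0, 0.0]))
```

With an identity rotation and zero translation the pixel maps back onto itself, which is a quick sanity check before comparing intensities at the projected position across frames.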

The mapping unit 550 may generate a new key frame or refine the current key frame using the tracked frames. For example, when the camera that captured the input image has moved too far to include the objects captured in previous frames, the calculation device 500 may generate a new key frame from the most recently tracked frames. When a new key frame is generated, its depth map may be initialized by projecting points from the previous key frame onto the new key frame. In addition, tracked frames that do not become a new key frame may be used to refine the current key frame.

A depth map newly calculated by the mapping unit 550 may be added to the newly generated key frame or the refined key frame.

FIG. 6 is a block diagram of an apparatus for calculating a depth map according to an embodiment. Referring to FIG. 6, an apparatus 600 for calculating a depth map (hereinafter, the 'calculation device') according to an embodiment includes a camera 610, a processor 620, and a memory 630. The calculation device 600 may further include a communication interface 640 and/or a display device 650. The camera 610, the processor 620, the memory 630, the communication interface 640, and the display device 650 may communicate with one another through a communication bus 605.

The calculation device 600 may be an electronic device that implements various augmented reality applications in real time, such as an augmented reality head-up display, augmented reality/virtual reality glasses, an autonomous vehicle, an intelligent vehicle, a smartphone, or a mobile device.

The camera 610 acquires an input image. The camera 610 may be, for example, an RGB camera or an RGB-D (depth) camera. The input image is an image input to the calculation device 600 and may be, for example, a real-time image or a video. The input image may be a mono image or a stereo image. The input image may include a plurality of frames. The input image may be photographed or captured through the camera 610, or may be obtained from outside the calculation device 600.

The processor 620 divides the input image into segments and selects one or more of the segments based on the reliability calculated for the segments. The processor 620 estimates camera pose information for the input image using the selected segment. The processor 620 calculates a depth map of the input image using the camera pose information.

The processor 620 may divide the input image into semantic segments by classifying the objects included in the input image into semantic units. The processor 620 may divide the input image into depth segments based on the depth values of the input image. Alternatively, the processor 620 may divide the input image into both semantic segments and depth segments.

The processor 620 may calculate a first reliability for the semantic segments, or a second reliability for the depth segments. Alternatively, the processor 620 may calculate both the first reliability for the semantic segments and the second reliability for the depth segments. For example, the processor 620 may calculate the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

The processor 620 may fuse the first reliability and the second reliability, and determine the fused reliability as the reliability of the segments. The processor 620 may calculate the reliability of the segments for each key frame among the frames.

The processor 620 may select pixels from the selected segment based on the fused reliability, and estimate the camera pose information from the selected pixels. The processor 620 may select pixels from the selected segment in proportion to the fused reliability.

For example, the processor 620 may estimate the camera pose information by applying a cost function to the selected segment.

In addition, the processor 620 may perform the methods described above with reference to FIGS. 1 to 5, or algorithms corresponding to those methods. The processor 620 may execute a program and control the calculation device 600. Program code executed by the processor 620 may be stored in the memory 630.

The memory 630 may store the input image and/or a plurality of frames. The memory 630 may store the camera pose information for the input image estimated by the processor 620, the depth map of the input image calculated by the processor 620, and/or a 3D image reconstructed by the processor 620 using the depth map.

The memory 630 may also store various pieces of information generated during processing by the processor 620, as well as various data and programs. The memory 630 may include volatile memory or non-volatile memory. The memory 630 may include a mass storage medium such as a hard disk to store various data.

Depending on the embodiment, the calculation device 600 may receive, through the communication interface 640, an input image captured outside the calculation device 600. In this case, the communication interface 640 may receive, together with the input image, pose information such as rotation information and translation information of the photographing device that captured the input image, position information of the photographing device, and/or calibration information of the photographing device.

The display device 650 may display a 3D image reconstructed from the depth map calculated by the processor 620.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The estimation device may run an operating system (OS) and one or more software applications that execute on the operating system. The estimation device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the estimation device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the estimation device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the estimation device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the estimation device to operate as desired or may instruct the estimation device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the estimation device or to provide instructions or data to the estimation device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims below.

Claims

1. A method of calculating a depth map, the method comprising:
dividing an input image into segments;
calculating reliability of the segments;
selecting one or more of the segments based on the reliability;
estimating camera pose information for the input image using the selected segment; and
calculating a depth map of the input image using the camera pose information,
wherein calculating the reliability of the segments comprises:
calculating a first reliability for semantic segments obtained by dividing the input image by classifying objects included in the input image into semantic units;
calculating a second reliability for depth segments obtained by dividing the input image based on a depth value of the input image; and
determining a reliability obtained by fusing the first reliability and the second reliability as the reliability of the segments.

2. The method of claim 1, wherein the dividing comprises any one or a combination of:
dividing the input image into semantic segments by classifying objects included in the input image into semantic units; and
dividing the input image into depth segments based on a depth value of the input image.

3. (Deleted)

4. The method of claim 1, wherein calculating the first reliability comprises:
calculating the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

5. The method of claim 4, wherein calculating the first reliability comprises:
determining, when the object is a moving object, the first reliability of the semantic segment corresponding to the moving object as a first value; and
determining, when the object is a fixed object, the first reliability of the semantic segment corresponding to the fixed object as a second value.

6. (Deleted)

7. The method of claim 1, further comprising:
selecting pixels from the selected segment based on the fused reliability,
wherein estimating the camera pose information comprises:
estimating the camera pose information from the selected pixels.

8. The method of claim 7, wherein selecting pixels from the selected segment comprises:
selecting pixels from the selected segment in proportion to the fused reliability.

9. The method of claim 1, wherein the input image includes frames, and
calculating the reliability comprises:
calculating the reliability of the segments for each key frame among the frames.

10. The method of claim 1, wherein estimating the camera pose information comprises:
estimating the camera pose information by applying a cost function to the selected segment.

11. A computer program stored in a computer-readable recording medium to execute, in combination with hardware, the method of any one of claims 1, 2, 4, 5, and 7 to 10.

12. An apparatus for calculating a depth map, the apparatus comprising:
a camera that acquires an input image; and
a processor that divides the input image into segments, selects one or more of the segments based on reliability calculated for the segments, estimates camera pose information for the input image using the selected segment, and calculates a depth map of the input image using the camera pose information,
wherein the processor:
calculates a first reliability for semantic segments obtained by dividing the input image by classifying objects included in the input image into semantic units;
calculates a second reliability for depth segments obtained by dividing the input image based on a depth value of the input image; and
determines a reliability obtained by fusing the first reliability and the second reliability as the reliability of the segments.

13. The apparatus of claim 12, wherein the processor
divides the input image into semantic segments by classifying objects included in the input image into semantic units, divides the input image into depth segments based on a depth value of the input image, or divides the input image into the semantic segments and the depth segments.

14. (Deleted)

15. The apparatus of claim 12, wherein the processor
calculates the first reliability for the semantic segments based on whether an object included in the input image is a moving object.

16. (Deleted)

17. The apparatus of claim 12, wherein the processor
selects pixels from the selected segment based on the fused reliability, and estimates the camera pose information from the selected pixels.

18. The apparatus of claim 17, wherein the processor
selects pixels from the selected segment in proportion to the fused reliability.

19. The apparatus of claim 12, wherein the input image includes frames, and
the processor
calculates the reliability of the segments for each key frame among the frames.

20. The apparatus of claim 12, wherein the processor
estimates the camera pose information by applying a cost function to the selected segment.