KR102366172B1

KR102366172B1 - Method and apparatus for identifying image matching

Info

Publication number: KR102366172B1
Application number: KR1020200045739A
Authority: KR
Inventors: 전문구; 꾸앙 빈 딘; 장용준
Original assignee: 광주과학기술원
Priority date: 2020-04-16
Filing date: 2020-04-16
Publication date: 2022-02-22
Anticipated expiration: 2040-04-16
Also published as: KR20210128076A

Abstract

The present disclosure relates to a method of identifying whether images match, and to an image matching apparatus that performs the method. According to an embodiment, the method may include: acquiring a first image patch and a second image patch from light of different wavelength bands; converting the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band; extracting a feature vector from the converted first image patch and the converted second image patch; and inputting the extracted feature vector into a first neural network model to identify whether the first image patch and the second image patch match.

Description

METHOD AND APPARATUS FOR IDENTIFYING IMAGE MATCHING

The present disclosure relates to a method and apparatus for identifying whether images match. More particularly, it relates to a method and apparatus for identifying whether images match using an artificial neural network.

An artificial neural network may refer to a computing device, or a method performed by a computing device, that implements interconnected sets of artificial neurons. As one example of an artificial neural network, a deep neural network (deep learning) may have a multi-layer structure, and each of the layers may be trained on a large amount of data.

As artificial neural network technology has advanced in recent years, techniques for automatically recognizing images are being actively studied in the field of artificial intelligence. In particular, cross-spectrum image matching techniques, which accurately identify whether multi-spectrum images captured using light of different wavelength ranges match, are also being actively studied.

However, in a system that processes multi-spectral images using light of different wavelength regions, the images differ considerably in their image characteristics, which makes it difficult to identify whether images of different spectra match.

Accordingly, there is a need for technology that effectively identifies whether images of different spectra match.

Korean Laid-Open Patent Publication No. 2019-0097640

According to an embodiment, a method of identifying whether images match and an image matching apparatus may be provided.

Also, according to an embodiment, a method of identifying whether images match using at least one neural network model, and an image matching apparatus, may be provided.

According to an embodiment of the present disclosure for achieving the above-described technical problem, a method of identifying whether images match may be provided, the method including: acquiring a first image patch and a second image patch from light of different wavelength bands; converting the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band; extracting a feature vector from the converted first image patch and the converted second image patch; and inputting the extracted feature vector into a first neural network model to identify whether the first image patch and the second image patch match.

In addition, according to an embodiment, the method may further include extracting a feature vector from the acquired first image patch and the acquired second image patch, wherein identifying whether the patches match includes further inputting the feature vector extracted from each of the acquired first image patch and the acquired second image patch into the first neural network model to identify whether the first image patch and the second image patch match.

In addition, according to another embodiment of the present disclosure for solving the above technical problem, an image matching apparatus may be provided, including: a memory storing one or more instructions; and at least one processor executing the one or more instructions, wherein the at least one processor, by executing the one or more instructions, acquires a first image patch and a second image patch from light of different wavelength bands, converts the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band, extracts a feature vector from the converted first image patch and the converted second image patch, and inputs the extracted feature vector into a first neural network model to identify whether the first image patch and the second image patch match.

In addition, according to another embodiment of the present disclosure for solving the above technical problem, a computer-readable recording medium may be provided that records a program for causing a computer to execute a method of identifying whether images match, the method including: acquiring a first image patch and a second image patch from light of different wavelength bands; converting the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band; extracting a feature vector from the converted first image patch and the converted second image patch; and inputting the extracted feature vector into a first neural network model to identify whether the first image patch and the second image patch match.

FIG. 1 is a diagram schematically illustrating a method of identifying whether images match, according to an embodiment.
FIG. 2 is a flowchart illustrating a method by which an image matching apparatus identifies whether images match, according to an embodiment.
FIG. 3 is a diagram for describing a method by which an image matching apparatus identifies whether images match, according to an embodiment.
FIG. 4 is a flowchart illustrating a method by which an image matching apparatus identifies whether images match, according to another embodiment.
FIG. 5 is a diagram for describing a method by which an image matching apparatus identifies whether images match, according to another embodiment.
FIG. 6 is a diagram for describing the structure of an image matching apparatus according to an embodiment.
FIG. 7 is a diagram for describing the structure of a neural network model that an image matching apparatus uses to convert an image patch, according to an embodiment.
FIG. 8 is a diagram for describing the structure of a neural network model that an image matching apparatus uses to extract a feature vector from an image patch, according to an embodiment.
FIG. 9 is a diagram for describing the result of an image matching apparatus converting an image patch using a neural network model, according to an embodiment.
FIG. 10 is a block diagram of an image matching apparatus according to an embodiment.
FIG. 11 is a block diagram of a server connected to an image matching apparatus according to an embodiment.
FIG. 12 is a diagram for comparing the performance of neural network models trained on pairs of visible-light image patches and near-infrared image patches, according to an embodiment.
FIG. 13 is a diagram for comparing the performance of neural network models trained on pairs of visible-light image patches and thermal image patches, according to an embodiment.
FIG. 14 is a diagram for comparing the performance of neural network models trained on pairs of RGB image patches and near-infrared image patches, according to an embodiment.
FIG. 15 is a diagram for comparing the performance of a neural network model with a general Siamese network structure and a neural network model according to an embodiment of the present disclosure.

The terms used in this specification are briefly explained first, and the present disclosure is then described in detail.

The terms used in the present disclosure were selected, as far as possible, from general terms that are currently in wide use, taking into account their functions in the present disclosure. Their meaning may nevertheless vary with the intention of those skilled in the art, legal precedent, the emergence of new technology, and the like. In certain cases a term was chosen arbitrarily by the applicant, and its meaning is then described in detail in the corresponding part of the description. Therefore, the terms used in the present disclosure should be defined based on their meaning and on the overall content of the present disclosure, rather than on the name of the term alone.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. In addition, terms such as "...unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily implement them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. To describe the present disclosure clearly, parts of the drawings that are irrelevant to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

FIG. 1 is a diagram schematically illustrating a method of identifying whether images match, according to an embodiment.

According to an embodiment, the image matching apparatus 1000 may include an artificial neural network model 110. According to an embodiment, the artificial neural network model may include at least one neural network model. The image matching apparatus 1000 may acquire a pair of image patches and use the artificial neural network model to identify whether the acquired pair of image patches match.

According to an embodiment, to identify whether a pair of image patches match each other, the image matching apparatus 1000 may extract feature vectors from the image patches and identify whether the image patches match based on the extracted feature vectors. The image matching apparatus 1000 may also identify whether a pair of images match based on whether the image patches included in each image of the pair match. The image matching apparatus 1000 may be a smartphone, tablet PC, PC, smart TV, mobile phone, media player, server, micro server, or other mobile or non-mobile computing device that is equipped with an AI program for processing images using a neural network model and that includes an image capturing function, but is not limited thereto.

According to an embodiment, the artificial neural network model used by the image matching apparatus 1000 may refer to a computing system inspired by biological neural networks. Unlike classical algorithms, which perform a task according to predefined conditions, an artificial neural network can learn to perform a task by considering a large number of samples. An artificial neural network may have a structure in which artificial neurons are connected, and a connection between neurons may be referred to as a synapse. A neuron may process a received signal and transmit the processed signal to another neuron through a synapse. The output of a neuron may be referred to as an activation. A neuron and/or a synapse may have a variable weight, and the influence of the signal processed by the neuron may increase or decrease according to that weight.

For example, an artificial neural network may be composed of a plurality of neural network layers. Each of the layers has a plurality of weights (weight values) and performs a neural network operation by combining the operation result of the previous layer with its weights. The weights of the layers may be optimized through training of the artificial neural network.
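The layer operation described above can be sketched as follows. This is a minimal illustration, assuming a fully connected layer with a ReLU activation; the function name `dense_layer` and the choice of activation are assumptions made for the example and are not specified in the disclosure.

```python
def dense_layer(inputs, weights, biases):
    """Combine the previous layer's outputs with this layer's weights.

    inputs:  outputs of the previous layer (list of floats)
    weights: one row of weights per neuron in this layer
    biases:  one bias per neuron in this layer
    """
    outputs = []
    for row, bias in zip(weights, biases):
        # Weighted sum of the previous layer's outputs, plus a bias.
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + bias
        # ReLU activation (an assumption for this sketch).
        outputs.append(max(0.0, pre_activation))
    return outputs


layer_out = dense_layer(
    inputs=[1.0, 2.0],
    weights=[[1.0, 1.0], [-1.0, 0.0]],
    biases=[0.0, 0.0],
)
```

Stacking several such calls, each feeding its output to the next, gives the multi-layer structure the text refers to.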

For example, the weights may be modified and updated during training so that a loss value or cost value obtained from the neural network model is reduced or minimized. The neural network model according to the present disclosure may include a deep neural network (DNN), examples of which include a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks, but it is not limited to these examples.
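As a rough illustration of updating weights so that the loss decreases, the sketch below fits a single linear neuron by plain gradient descent on a mean squared error. The loss function, learning rate, toy data, and function names are all assumptions chosen for the example; the disclosure does not state which loss or optimizer is used.

```python
def mse(samples, w, b):
    """Mean squared error of the model y = w * x + b on the samples."""
    return sum(((w * x + b) - y) ** 2 for x, y in samples) / len(samples)


def train_single_neuron(samples, lr=0.1, epochs=200):
    """Repeatedly adjust w and b in the direction that reduces the loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in samples:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / n  # d(loss)/dw
            grad_b += 2.0 * err / n      # d(loss)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative only).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_single_neuron(samples)
```

After training, the loss on the toy data is far below its starting value, which is the behavior the paragraph describes.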

According to an embodiment, the image matching apparatus 1000 may acquire a first image 122 and a second image 142. According to an embodiment, the first image 122 may be an image corresponding to a first domain 120, and the second image 142 may be an image corresponding to a second domain 140. For example, the first domain 120 and the second domain 140 may correspond to continuous or discontinuous wavelength bands used to generate an image. The first domain, second domain, third domain, or fourth domain described in this specification may serve as a criterion for dividing regions of the light spectrum based on the wavelength of light.

For example, the first domain 120 may represent the spectrum of light belonging to the visible region (e.g., wavelengths of 380 nm to 780 nm), and the second domain 140 may represent the spectrum of light belonging to the near-infrared region (e.g., wavelengths of 0.75 µm to 1 µm). According to another embodiment, the first domain 120 may cover RGB wavelengths and include the spectrum of light with wavelengths of 400 nm to 500 nm, 450 nm to 630 nm, and 500 nm to 650 nm. The domains are not limited to these examples and may be set arbitrarily to represent different wavelength regions.
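The example bands above can be written down as a small lookup. This is only a sketch: the dictionary name, the domain labels, and the helper function are hypothetical, and the band limits are the example values quoted in the text (0.75 µm to 1 µm is expressed as 750 nm to 1000 nm). Note that the two example bands overlap between 750 nm and 780 nm, so a wavelength can fall in both.

```python
# Example band limits in nanometers, taken from the text above.
DOMAINS_NM = {
    "visible": (380.0, 780.0),
    "near_infrared": (750.0, 1000.0),
}


def domains_for_wavelength(wavelength_nm):
    """Return the names of all example domains that contain the wavelength."""
    return [
        name
        for name, (low, high) in DOMAINS_NM.items()
        if low <= wavelength_nm <= high
    ]


green = domains_for_wavelength(550.0)
nir = domains_for_wavelength(900.0)
overlap = domains_for_wavelength(760.0)
```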

According to an embodiment, the first domain and the second domain, to which the first image patch and the second image patch used by the image matching apparatus 1000 belong, may each represent a multi-spectrum, that is, a spectrum of light belonging to different wavelength bands.

According to an embodiment, the image matching apparatus 1000 may determine a first image patch 124 from the first image 122 and a second image patch 144 from the second image 142. For example, the image matching apparatus 1000 may extract a first feature point from the first image based on the pixel values of the first image 122, and determine a partial region of the first image that includes the extracted first feature point as the first image patch. Likewise, the image matching apparatus 1000 may extract a second feature point from the second image based on the pixel values of the second image 142, and determine a partial region of the second image that includes the extracted second feature point as the second image patch 144. The first image patch determined by the image matching apparatus 1000 may belong to the first domain, and the second image patch may belong to the second domain.
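The patch selection described above can be sketched as follows. The disclosure does not name a specific feature point detector, so this example assumes a very simple one: the interior pixel with the largest squared central-difference gradient. The function names, the detector, and the clamping behavior at the image border are all assumptions for illustration, and the image must be at least as large as the requested patch.

```python
def find_feature_point(image):
    """Return (row, col) of the interior pixel with the largest local change.

    The score is the squared central-difference gradient, a stand-in for
    whatever detector an actual implementation would use.
    """
    best_score, best_rc = -1.0, (1, 1)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            dx = image[r][c + 1] - image[r][c - 1]
            dy = image[r + 1][c] - image[r - 1][c]
            score = dx * dx + dy * dy
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc


def crop_patch(image, center, size):
    """Crop a size x size region around center, clamped to the image bounds."""
    rows, cols = len(image), len(image[0])
    half = size // 2
    top = min(max(center[0] - half, 0), rows - size)
    left = min(max(center[1] - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


# 6 x 6 toy image: flat background with one bright pixel.
image = [[0] * 6 for _ in range(6)]
image[3][3] = 9

point = find_feature_point(image)
patch = crop_patch(image, point, size=3)
```

The same two calls would be applied independently to the first image and the second image, so each resulting patch stays in the domain of the image it was cut from.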

The first image patch and the second image patch acquired by the image matching apparatus 1000 belong to the same domains as the first image and the second image, respectively. The first image patch may have the same image characteristics as the wavelength band of the first image, and the second image patch may have the same image characteristics as the wavelength band of the second image.

According to an embodiment, the image matching apparatus 1000 may perform a domain conversion 102 function on the acquired image patches. For example, the image matching apparatus 1000 may acquire the first image patch 124 and the second image patch 144, and convert the domains to which they belong. For example, the image matching apparatus 1000 may convert the domain of the first image patch so that the first image patch exhibits image characteristics belonging to the second domain, and convert the domain of the second image patch so that the second image patch exhibits image characteristics belonging to the first domain. The process by which the image matching apparatus 1000 converts the domain of an image patch may correspond to converting the pixel values in the image patch so that they exhibit the image characteristics of the wavelength band corresponding to the target domain.

For example, the image matching apparatus 1000 may convert the first image patch, which is generated by light of a first wavelength band, so that it exhibits image characteristics of a second wavelength band, and may convert the second image patch, which is generated by light of the second wavelength band, so that it exhibits image characteristics of the first wavelength band. According to an embodiment, the image characteristics of the second wavelength band may refer to the intensity, level, pattern, and the like of pixel values in images generated from light belonging to the second wavelength band, and the image characteristics of the first wavelength band may refer to the intensity, level, pattern, and the like of pixel values in images generated from light belonging to the first wavelength band.

The image matching apparatus 1000 may perform a feature vector extraction 104 function using a given image patch. For example, the image matching apparatus 1000 may extract a feature vector from an acquired image patch or from an image patch converted to another domain. In addition, the image matching apparatus 1000 may perform a match identification 106 function that uses the feature vectors extracted from each image patch to determine whether a pair of image patches match. For example, the image matching apparatus 1000 may obtain a matching result 152 indicating whether the pair of image patches match, based on the extracted feature vectors. For example, the matching result obtained by the image matching apparatus 1000 may include a matching result expressed as a binary value, and may further include information on a probability value for the degree of matching of the pair of image patches.
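The shape of such a matching result, a binary decision together with a probability, can be sketched as below. In the disclosure the decision comes from a trained neural network model; this example replaces that model with a simple stand-in (a logistic function of the Euclidean distance between two feature vectors) purely to show the output format. The function name and the `threshold`, `scale`, and `offset` parameters are hypothetical.

```python
import math


def match_result(feat_a, feat_b, threshold=0.5, scale=1.0, offset=1.0):
    """Return (is_match, probability) for two feature vectors.

    Stand-in scoring: small distances map to probabilities near 1,
    large distances to probabilities near 0.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
    prob = 1.0 / (1.0 + math.exp(scale * (dist - offset)))
    return int(prob >= threshold), prob


same_flag, same_prob = match_result([1.0, 2.0], [1.0, 2.0])
diff_flag, diff_prob = match_result([0.0, 0.0], [3.0, 4.0])
```

Identical vectors yield a match flag of 1, while vectors that are far apart yield 0; the probability carries the degree of matching alongside the binary flag.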

According to an embodiment, based on the result of identifying whether a pair of image patches match, the image matching apparatus 1000 may identify whether the entire images containing those image patches match. Matching according to the present disclosure may indicate the degree of similarity of the positions of pixels in the images, the pixel values at those positions, and the amount of change in pixel values within a predetermined region.

In general, images generated from light having spectra of different wavelength bands differ considerably in both pixel level and pixel intensity, so a convolution-based neural network model with a Siamese network structure may be used to accurately identify whether images belonging to spectra of different wavelength bands match. However, a convolution-based neural network model with a general Siamese network structure still cannot accurately handle the differences in pixel intensity and pixel level between images generated by light of different wavelength bands, and is therefore limited in its ability to identify whether images belonging to spectra of different wavelength bands match.

In contrast, the image matching apparatus 1000 according to the present disclosure does not simply extract feature vectors directly from the image patches. It first converts the domains of the image patches and then uses a neural network model with a dual Siamese network structure to extract feature vectors from both the image patches with converted domains and the image patches with unconverted domains. In this way it can accurately identify whether images generated from light having spectra of different wavelength bands match.
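The data flow described above can be sketched as follows. This shows only how the four patches (two original, two domain-converted) might feed a shared feature extractor; it is not the networks themselves. The domain conversion is replaced by a hypothetical per-pixel gain and offset, the feature extractor by the mean and standard deviation of the pixels, and all function names and numeric values are assumptions for illustration. How the disclosure actually pairs and combines the feature vectors is not stated in this passage.

```python
from statistics import mean, pstdev


def to_other_domain(patch, gain, offset):
    """Stand-in for the learned domain conversion (hypothetical gain/offset)."""
    return [[gain * p + offset for p in row] for row in patch]


def extract_features(patch):
    """Stand-in for the shared feature extractor: mean and spread of pixels."""
    flat = [p for row in patch for p in row]
    return [mean(flat), pstdev(flat)]


def build_classifier_input(patch1, patch2):
    """Collect features from the unconverted and the converted patches."""
    patch1_as_domain2 = to_other_domain(patch1, gain=0.5, offset=10.0)
    patch2_as_domain1 = to_other_domain(patch2, gain=2.0, offset=-20.0)
    features = []
    # The same extractor is applied to every patch, as in a Siamese setup.
    for patch in (patch1, patch2, patch1_as_domain2, patch2_as_domain1):
        features.extend(extract_features(patch))
    return features


classifier_input = build_classifier_input(
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
)
```

The resulting list would then be passed to the match-identifying model; reusing one extractor for all patches is the weight-sharing idea behind a Siamese structure.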

According to an embodiment, the image matching apparatus 1000 may be connected to a server 2000. According to an embodiment, the image matching apparatus 1000 may operate in conjunction with the server 2000 to identify whether a pair of image patches match. The server 2000 may include any other computing device that is connected to the image matching apparatus 1000 through a network and can thereby exchange data with the image matching apparatus. According to an embodiment, the server 2000 may be connected to the image matching apparatus 1000 through a local area network (LAN), a wide area network (WAN), a value added network (VAN), a mobile radio communication network, a satellite communication network, or a combination thereof. In addition, according to an embodiment, the server 2000 is a data communication network in a comprehensive sense that allows the network entities shown in FIG. 1 (e.g., the image matching apparatus and the server) to communicate smoothly with each other, and may include the wired Internet, the wireless Internet, and a mobile wireless communication network itself.

FIG. 2 is a flowchart illustrating a method by which an image matching apparatus identifies whether images match, according to an embodiment.

In S210, the image matching apparatus 1000 may obtain a first image patch and a second image patch from light of different wavelength bands. For example, the image matching apparatus 1000 may obtain a first image generated by light in the visible wavelength region (for example, a visible-light image) and a second image generated by light in the near-infrared wavelength region (for example, a near-infrared image), extract feature points based on the amount of change in pixel values within each obtained image, and, based on the extracted feature points, obtain the first image patch from the first image and the second image patch from the second image.
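By way of a non-limiting illustration, the patch acquisition of S210 may be sketched as follows. The sketch assumes that the "amount of change in pixel values" is measured by the gradient magnitude and that a fixed-size square patch is cropped around each feature point; the function name `extract_patches` and the parameters `num_points` and `patch_size` are hypothetical and do not appear in the disclosure.

```python
import numpy as np

def extract_patches(image, num_points=4, patch_size=8):
    """Pick the pixels with the largest intensity change and crop a patch around each.

    Assumption: gradient magnitude stands in for the pixel-value change
    described in the text; the disclosure does not fix a specific detector.
    """
    img = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(img)       # change along rows and along columns
    strength = np.hypot(gx, gy)     # gradient magnitude per pixel

    half = patch_size // 2
    # Zero out a border so that every cropped patch fits inside the image.
    strength[:half, :] = 0
    strength[-half:, :] = 0
    strength[:, :half] = 0
    strength[:, -half:] = 0

    # Indices of the strongest responses, strongest first.
    flat = np.argsort(strength, axis=None)[::-1][:num_points]
    points = np.column_stack(np.unravel_index(flat, strength.shape))
    patches = [img[r - half:r + half, c - half:c + half] for r, c in points]
    return points, patches
```

The same function would be applied separately to the visible-light image and to the near-infrared image to obtain candidate first and second image patches.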

According to another embodiment, the image matching apparatus 1000 may obtain the first image from a visible-light camera that is connected to the image matching apparatus and captures visible-light images, obtain the second image from a near-infrared camera that captures near-infrared images, and obtain the first image patch and the second image patch from the obtained first image and second image, respectively.

In S220, the image matching apparatus 1000 may convert the first image patch and the second image patch so that they exhibit image characteristics of a preset wavelength band. For example, the image matching apparatus 1000 may convert the first image patch, which is generated by light of a first wavelength band, so that it exhibits image characteristics of a second wavelength band, and may convert the second image patch, which is generated by light of the second wavelength band, so that it exhibits image characteristics of the first wavelength band. That is, the image matching apparatus 1000 may convert the domain of the first image patch from a first domain to a second domain, and convert the domain of the second image patch from the second domain to the first domain.

According to yet another embodiment, however, the image matching apparatus 1000 may convert the first image patch so that it exhibits image characteristics of a third wavelength band, and convert the second image patch so that it also exhibits image characteristics of the third wavelength band. That is, the image matching apparatus 1000 may convert the domain of the first image patch from the first domain to a third domain, and convert the domain of the second image patch from the second domain to the third domain. As described above, the operation in which the image matching apparatus 1000 converts the domain of an image patch may correspond to an operation of converting the pixel values of the image patch so that they exhibit image characteristics of the target wavelength band corresponding to the target domain.

According to an embodiment, the image matching apparatus 1000 may convert the domain of an image patch using a neural network model trained in advance to convert the domain to which the image patch belongs into a target domain. For example, the image matching apparatus 1000 may convert the first image patch using a second neural network model trained to output an image patch exhibiting image characteristics of the second wavelength band when a first image patch exhibiting image characteristics of the first wavelength band is input. The image matching apparatus 1000 may also convert the second image patch using a third neural network model trained to output an image patch exhibiting image characteristics of the first wavelength band when a second image patch exhibiting image characteristics of the second wavelength band is input.

In S230, the image matching apparatus 1000 may extract feature vectors from the converted first image patch and the converted second image patch. For example, the image matching apparatus 1000 may extract a first feature vector from the converted first image patch using a pre-trained fourth neural network model, and extract a fourth feature vector from the converted second image patch using a pre-trained fifth neural network model. According to an embodiment, the fourth neural network model and the fifth neural network model may be convolution-based neural network models trained in advance to extract feature vectors from image patches belonging to a given domain.

In general, feature extraction algorithms such as SIFT, SURF, and FAST, whose features are based on the similarity of pixel values within an image patch, may be used to extract feature vectors from an image. However, these algorithms are limited by the changes in pixel intensity and the texture inconsistency that occur in multispectral images. To overcome these limitations, approaches such as multispectral SIFT and methods that analyze the statistical image characteristics of RGB and near-infrared pairs have been disclosed, but accurate feature extraction for multispectral image matching remains limited.

However, as described below, the image matching apparatus 1000 according to the present disclosure does not extract feature vectors directly from the image patches. Instead, it converts the domain of each image patch and uses both the feature vectors extracted from the domain-converted image patches and the feature vectors extracted from the unconverted image patches, and can thereby accurately match images of different spectra.

In S240, the image matching apparatus 1000 may identify whether the first image patch and the second image patch match by inputting the feature vectors extracted from the converted first image patch and the converted second image patch into a first neural network model. For example, the image matching apparatus 1000 may determine a difference vector whose elements are the squares of the element-wise distances between the first feature vector extracted from the converted first image patch and the fourth feature vector extracted from the converted second image patch, and may generate a concatenated vector by concatenating the determined difference vector, the first feature vector, and the fourth feature vector. The image matching apparatus 1000 may input the concatenated vector into the pre-trained first neural network model and identify whether the first image patch and the second image patch match based on the output value of the first neural network model.
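By way of a non-limiting illustration, the construction of the concatenated vector in S240 may be sketched as follows. The names `build_match_input` and `match_probability` are hypothetical, and a single logistic unit is used here only as a stand-in for the first neural network model, whose actual architecture is not specified in this passage.

```python
import numpy as np

def build_match_input(f1, f4):
    """Concatenate [squared element-wise difference, f1, f4] as described for S240."""
    f1 = np.asarray(f1, dtype=np.float64)
    f4 = np.asarray(f4, dtype=np.float64)
    diff = (f1 - f4) ** 2                 # difference vector
    return np.concatenate([diff, f1, f4])

def match_probability(x, weights, bias):
    """Assumed stand-in for the first neural network model: one logistic unit."""
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
```

For feature vectors of length D, the concatenated vector has length 3D, so the matching model in this sketch would take 3D inputs.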

FIG. 3 is a diagram for describing a method by which an image matching apparatus according to an embodiment identifies whether images match.

With reference to FIG. 3, the process by which the image matching apparatus 1000 according to an embodiment identifies whether images match is described in more detail. As described above, the image matching apparatus 1000 may obtain a first image patch 302 belonging to the first domain and a second image patch 312 belonging to the second domain. According to an embodiment, the first domain may refer to the spectrum of light in the visible region, and the second domain may refer to the spectrum of light in the near-infrared region. However, the disclosure is not limited thereto; the first domain may represent the spectrum of light in the visible region and the second domain may represent the spectrum of light in the thermal infrared region.

The image matching apparatus 1000 may convert the first image patch 302 and the second image patch 312 so that they exhibit image characteristics of a preset wavelength band. For example, the image matching apparatus 1000 may convert the first image patch, which is generated by light of the first wavelength band, so that it exhibits image characteristics of the second wavelength band, and may convert the second image patch, which is generated by light of the second wavelength band, so that it exhibits image characteristics of the first wavelength band. According to yet another embodiment, however, the image matching apparatus 1000 may convert both the first image patch and the second image patch so that they exhibit image characteristics of the third wavelength band, as described above with reference to FIG. 2.

According to an embodiment, the operation in which the image matching apparatus 1000 converts an image patch so that it exhibits image characteristics of a target wavelength band may correspond to an operation of converting the pixel values of the image patch in order to convert the current domain of the image patch into the target domain.

In addition, according to an embodiment, the image matching apparatus 1000 may convert the domain of an image patch using a neural network model trained in advance to convert the domain to which the image patch belongs into a target domain. For example, the image matching apparatus 1000 may convert the first image patch using a second neural network model 304 trained to output an image patch exhibiting image characteristics of the second wavelength band when a first image patch exhibiting image characteristics of the first wavelength band is input. The image matching apparatus 1000 may also convert the second image patch using a third neural network model 314 trained to output an image patch exhibiting image characteristics of the first wavelength band when a second image patch exhibiting image characteristics of the second wavelength band is input.

According to yet another embodiment, however, the image matching apparatus 1000 may convert the first image patch using a second neural network model 304 trained to output an image patch exhibiting image characteristics of the third wavelength band when a first image patch exhibiting image characteristics of the first wavelength band is input, and may convert the second image patch using a third neural network model 314 trained to output an image patch exhibiting image characteristics of the third wavelength band when a second image patch exhibiting image characteristics of the second wavelength band is input.

That is, the image matching apparatus 1000 according to the present disclosure may convert the domains of image patches belonging to different domains into a common domain, or it may convert the domains of the image patches so that the domains of the image patches belonging to different domains are exchanged with each other.

The image matching apparatus 1000 may obtain a converted first image patch 306 by converting the first image patch using the second neural network model 304, and obtain a converted second image patch 316 by converting the second image patch using the third neural network model 314. The image matching apparatus 1000 may extract a first feature vector 308 from the converted first image patch 306 and extract a second feature vector 318 from the converted second image patch 316. For example, although not shown in FIG. 3, the image matching apparatus 1000 may use at least one pre-trained convolution-based neural network model to extract the first feature vector 308 from the converted first image patch 306 and the second feature vector 318 from the converted second image patch 316.

According to an embodiment, the image matching apparatus 1000 may determine a difference vector whose elements are the squares of the element-wise differences between the first feature vector and the second feature vector, and, based on the determined difference vector, train the first neural network model to output the similarity of the image patches. For example, the process in which the image matching apparatus 1000 trains the first neural network model, which outputs the degree of matching of the image patches as a probability value, may correspond to a process of applying metric learning 320 to the neural network model based on the distance between the first feature vector and the second feature vector.
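By way of a non-limiting illustration, training a model to output a matching probability from the squared difference vector may be sketched as follows. The sketch assumes a logistic-regression head trained by gradient descent on a binary cross-entropy loss with labels 1 (match) and 0 (no match); the disclosure does not specify the loss function, optimizer, or architecture of the first neural network model, and the names `train_metric_head` and `similarity` are hypothetical.

```python
import numpy as np

def squared_difference(a, b):
    """Element-wise squared difference between feature vectors (or batches of them)."""
    return (np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)) ** 2

def train_metric_head(feats_a, feats_b, labels, lr=0.1, epochs=500):
    """Fit a logistic head on squared differences; labels are 1 for matching pairs."""
    x = squared_difference(feats_a, feats_b)       # shape (N, D)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))     # predicted match probability
        grad = p - y                               # d(BCE)/d(logit) per sample
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def similarity(a, b, w, bias):
    """Matching probability for one pair of feature vectors."""
    return 1.0 / (1.0 + np.exp(-(squared_difference(a, b) @ w + bias)))
```

In this sketch, pairs whose feature vectors lie close together are pushed toward a probability near 1, and distant pairs toward 0, which is the behavior the metric learning step is described as producing.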

FIG. 4 is a flowchart illustrating a method by which an image matching apparatus according to another embodiment identifies whether images match.

Unlike the description given above with reference to FIGS. 2 and 3, the image matching apparatus 1000 may also identify whether the first image patch and the second image patch match using feature vectors extracted from each of the first image patch, the second image patch, the converted first image patch, and the converted second image patch. A method of identifying whether images match according to this other embodiment is described in detail below with reference to FIGS. 4 and 5.

In S410, the image matching apparatus 1000 may obtain a first image patch and a second image patch from light of different wavelength bands. According to an embodiment, the first image patch may be an image of a partial region within a first image generated by light in the first wavelength band, and the second image patch may be an image of a partial region within a second image generated by light in the second wavelength band. The first image patch may be a region image corresponding to the first domain, and the second image patch may be a region image corresponding to the second domain. Because S410 may correspond to S210 of FIG. 2, a detailed description is omitted.

In S420, the image matching apparatus 1000 may convert the first image patch and the second image patch so that they exhibit image characteristics of a preset wavelength band. According to an embodiment, the image matching apparatus 1000 may convert the first image patch using the second neural network model 304 trained to output an image patch exhibiting image characteristics of the second wavelength band when a first image patch exhibiting image characteristics of the first wavelength band is input. The image matching apparatus 1000 may also convert the second image patch using the third neural network model 314 trained to output an image patch exhibiting image characteristics of the first wavelength band when a second image patch exhibiting image characteristics of the second wavelength band is input.

That is, the image matching apparatus 1000 according to the present disclosure may convert the domain of the first image patch from the first domain to the second domain and convert the domain of the second image patch from the second domain to the first domain, so that the first image patch exhibits image characteristics belonging to the second domain and the second image patch exhibits image characteristics belonging to the first domain.

In S430, the image matching apparatus 1000 may extract feature vectors from the converted first image patch and the converted second image patch. For example, the image matching apparatus 1000 may extract a first feature vector from the converted first image patch by inputting the converted first image patch into the fourth neural network model, and extract a fourth feature vector from the converted second image patch by inputting the converted second image patch into the fifth neural network model.

In S440, the image matching apparatus 1000 may extract feature vectors from the obtained first image patch and the obtained second image patch. For example, the image matching apparatus 1000 may extract a second feature vector from the obtained second image patch by inputting the obtained second image patch into a sixth neural network model, and extract a third feature vector from the obtained first image patch by inputting the obtained first image patch into a seventh neural network model.

In S460, the image matching apparatus 1000 may identify whether the first image patch and the second image patch match by inputting, into the first neural network model, the feature vectors extracted from each of the obtained first image patch, the obtained second image patch, the converted first image patch, and the converted second image patch.

For example, the image matching apparatus 1000 may determine a first distance difference between the first feature vector extracted from the converted first image patch and the second feature vector extracted from the second image patch, and determine a second distance difference between the third feature vector extracted from the first image patch and the fourth feature vector extracted from the converted second image patch. The image matching apparatus 1000 may input a difference vector relating to the first distance difference and the second distance difference into the first neural network model, and thereby obtain from the first neural network model a probability value for the degree of matching between the first image patch and the second image patch. The image matching apparatus 1000 may identify whether the first image patch and the second image patch match based on the probability value output from the first neural network model.
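By way of a non-limiting illustration, combining the two distance differences and deciding on a match may be sketched as follows. The sketch assumes that each distance difference is an element-wise squared difference, that the two are joined by concatenation, and that a match is declared when the probability reaches a threshold of 0.5; the joining method and the threshold are assumptions, and the names `dual_difference_vector` and `is_match` are hypothetical.

```python
import numpy as np

def dual_difference_vector(f1, f2, f3, f4):
    """Join the two squared-difference vectors of the dual pipeline.

    f1: from the converted first patch, f2: from the second patch,
    f3: from the first patch, f4: from the converted second patch.
    """
    f1, f2, f3, f4 = (np.asarray(v, dtype=np.float64) for v in (f1, f2, f3, f4))
    d1 = (f1 - f2) ** 2    # first distance difference
    d2 = (f3 - f4) ** 2    # second distance difference
    return np.concatenate([d1, d2])   # assumption: concatenation

def is_match(probability, threshold=0.5):
    """Assumed decision rule: match when the probability reaches the threshold."""
    return probability >= threshold
```

The vector returned by `dual_difference_vector` would be the input to the matching model, and `is_match` would be applied to that model's output probability.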

FIG. 5 is a diagram for describing a method by which an image matching apparatus according to another embodiment identifies whether images match.

With reference to FIG. 5, the process by which the image matching apparatus 1000 according to an embodiment identifies whether images match is described in more detail. As described above, the image matching apparatus 1000 may obtain a first image patch 502 belonging to a first domain 501 and a second image patch 504 belonging to a second domain 503. Unlike the description given above with reference to FIGS. 2 and 3, the image matching apparatus 1000 may identify whether the first image patch and the second image patch match by using, in addition to the converted first image patch 522 and the converted second image patch 524, the feature vectors extracted from the first image patch 502 and the second image patch 504.

For example, the image matching apparatus 1000 may obtain the first image patch 502 belonging to the first domain 501 and obtain the converted first image patch 522 by converting the first image patch 502 using a second neural network model 512, which, when an image patch exhibiting image characteristics of the first domain is input, converts the input image patch so that it exhibits image characteristics of the second domain.

According to an embodiment, the first domain 501 may refer to the spectral region to which light of the first wavelength band belongs. The image matching apparatus 1000 may obtain the first image from light of the first wavelength band and determine, as the first image patch, a partial region of the first image that contains a feature point extracted from the first image based on the pixel values in the first image.

In addition, according to an embodiment, the image matching apparatus 1000 may obtain the second image patch 504 belonging to the second domain 503 and obtain the converted second image patch 524 by converting the second image patch 504 using a third neural network model 514, which, when an image patch exhibiting image characteristics of the second domain is input, converts the input image patch so that it exhibits image characteristics of the first domain.

According to an embodiment, the second domain 503 may refer to the spectral region to which light of the second wavelength band belongs. The image matching apparatus 1000 may obtain the second image from light of the second wavelength band and determine, as the second image patch, a partial region of the second image that contains a feature point extracted from the second image based on the pixel values in the second image.

According to an embodiment, the first domain and the second domain described above may represent the spectrum of light in the visible region and the spectrum of light in the near-infrared region, respectively. According to another embodiment, however, the first domain and the second domain may represent the spectrum of light in the visible region and the spectrum of light in the thermal infrared region, respectively. In addition, according to an embodiment, the first domain and the second domain may represent the spectrum of light in each of the R, G, and B regions and the spectrum of light in the near-infrared region, respectively.

According to another embodiment, the first image patch and the second image patch obtained by the image matching apparatus 1000 may include the image characteristics of images generated by light in the visible region and the characteristics of images generated by light in the near-infrared region, respectively. According to yet another embodiment, however, the first image patch and the second image patch may include the image characteristics of images generated by light in the visible region and the characteristics of images generated by light in the thermal infrared region, respectively. In addition, according to an embodiment, the first image patch and the second image patch may include the characteristics of images generated by light in each of the R, G, and B regions and the characteristics of images generated by light in the near-infrared region, respectively.

In addition, according to an embodiment, the characteristics of images according to the present disclosure may refer to the intensity, the level, and the minimum and maximum range of the pixel values in an image. For example, the intensity, the level, and the minimum and maximum range of the pixel values in a near-infrared image patch captured by a near-infrared camera may differ from those in a visible-light image patch captured by a visible-light camera.
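By way of a non-limiting illustration, the image characteristics named above may be summarized for a single patch as follows. The sketch assumes that "intensity" is summarized by the mean pixel value; the function name `patch_characteristics` and the dictionary keys are hypothetical.

```python
import numpy as np

def patch_characteristics(patch):
    """Summarize the pixel-value statistics of one image patch.

    Assumption: mean pixel value is used as a simple summary of intensity.
    """
    p = np.asarray(patch, dtype=np.float64)
    lo, hi = float(p.min()), float(p.max())
    return {
        "mean_intensity": float(p.mean()),
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }
```

Comparing the output for a visible-light patch with the output for a near-infrared patch of the same scene gives a quick view of how far apart the two domains are in pixel statistics.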

According to an embodiment, the image matching apparatus 1000 may extract a first feature vector 532 from the converted first image patch 522 using a fourth neural network model trained in advance to extract feature vectors from image patches belonging to the second domain. The image matching apparatus 1000 may also extract a second feature vector 534 from the second image patch 504 using a sixth neural network model trained in advance to extract feature vectors from image patches belonging to the second domain. According to an embodiment, the converted first image patch and the second image patch may both be image patches exhibiting image characteristics of the second domain, and the fourth neural network model and the sixth neural network model may be neural network models of a Siamese network structure that share weights with each other.

Also, according to an embodiment, the image matching apparatus 1000 may extract a third feature vector 536 from the first image patch 502 by using a seventh neural network model that is pre-trained to extract feature vectors from image patches belonging to the first domain. In addition, the image matching apparatus 1000 may extract a fourth feature vector 538 from the converted second image patch 524 by using a fifth neural network model that is pre-trained to extract feature vectors from image patches belonging to the first domain. According to an embodiment, the converted second image patch and the first image patch may be image patches exhibiting image characteristics of the first domain, and the seventh neural network model and the fifth neural network model may be neural network models of a Siamese network structure that share weights with each other.

The image matching apparatus 1000 according to an embodiment of the present disclosure may include a pair of neural network models having a Siamese network structure (a dual Siamese network), and the pair may include a first-type Siamese network neural network model and a second-type Siamese network neural network model.

The image matching apparatus 1000 may determine a first difference vector 542 whose elements are the squares of the element-wise differences between the first feature vector 532 and the second feature vector 534. Also, according to an embodiment, the image matching apparatus 1000 may determine a second difference vector 544 whose elements are the squares of the element-wise differences between the third feature vector 536 and the fourth feature vector 538.
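The difference-vector step can be illustrated with a minimal sketch. It assumes feature vectors are plain Python lists of equal length; the function name `squared_difference` and the 4-dimensional sample values are hypothetical and only serve the illustration.

```python
def squared_difference(f_a, f_b):
    """Element-wise squared difference of two equal-length feature vectors."""
    if len(f_a) != len(f_b):
        raise ValueError("feature vectors must have the same length")
    return [(a - b) ** 2 for a, b in zip(f_a, f_b)]


# Hypothetical 4-dimensional feature vectors; real descriptors would be longer.
first_feature = [0.5, 1.0, -2.0, 3.0]
second_feature = [0.5, 0.0, 1.0, 1.0]

# Plays the role of the first difference vector in the text above.
first_difference = squared_difference(first_feature, second_feature)
```

Squaring keeps every element non-negative, so the result does not depend on the order in which the two feature vectors are passed.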

The image matching apparatus 1000 may train the first neural network model so that, when the first difference vector 542 and the second difference vector 544 are input, the model outputs a probability value indicating the degree of matching between the first image patch and the second image patch. For example, the image matching apparatus 1000 may train the first neural network model using a metric learning 546 technique so that the similarity between images can be identified. The image matching apparatus 1000 may identify whether the first image patch and the second image patch match by using the first neural network model trained to identify, based on the first difference vector and the second difference vector, the degree of similarity between the two patches.

The image matching apparatus 1000 according to an embodiment of the present disclosure may identify whether the images to which the image patches belong match by identifying whether image patches having spectra in different wavelength regions match. Also, according to an embodiment, the image matching apparatus 1000 may acquire videos belonging to different spectra by acquiring continuous image sequences over time. The image matching apparatus 1000 may then identify matching between videos belonging to different spectra by identifying whether image patches belonging to those spectra match.

When the image matching apparatus 1000 according to the present disclosure identifies whether multi-spectrum images match and determines that they do, the matching between those multi-spectrum images may be referred to as cross-spectrum image matching.

FIG. 6 is a diagram for describing the structure of an image matching apparatus according to an embodiment.

According to an embodiment, the image matching apparatus 1000 may identify whether a pair of image patches matches by using a plurality of neural network models. Hereinafter, for convenience, it is assumed that the first image patch 602 belongs to the visible domain 601, which corresponds to the spectrum of light in the visible region, and that the second image patch 604 belongs to the near-infrared (NIR) domain 603, which corresponds to the spectrum of light in the near-infrared region.

For example, the image matching apparatus 1000 may acquire the first image patch 602 of the visible domain 601 and the second image patch 604 of the near-infrared domain 603. The image matching apparatus 1000 may be connected to a visible-light camera and a near-infrared camera, and may acquire the first image patch and the second image patch from the visible-light camera and the near-infrared camera, respectively.

The image matching apparatus 1000 may convert the domain of the first image patch from the visible domain to the near-infrared domain by using the second neural network model 612. For example, the operation of converting the domain of the first image patch from the visible domain to the near-infrared domain may correspond to an operation of converting the pixel values of the first image patch so that the first image patch, which belongs to the visible domain, exhibits image characteristics of the near-infrared domain. The second neural network model 612 used by the image matching apparatus 1000 may be a neural network model trained so that, when an image patch belonging to the visible domain is input, it converts the pixel values of the input image patch so that the patch exhibits image characteristics of the near-infrared domain. According to an embodiment, the second neural network model is a U-Net model including 10 blocks, and may include an input layer, encoding blocks, decoding blocks, and an output layer. The structure of the second neural network model is described in more detail with reference to FIG. 7.

By converting the first image patch belonging to the visible domain, the image matching apparatus 1000 may obtain a converted first image patch 622 that exhibits image characteristics of the near-infrared domain. The converted first image patch is not an image patch captured by a camera including a near-infrared image sensor, but may be an image patch converted to exhibit near-infrared image characteristics. In addition, the converted first image patch may retain the low-level feature values of the first image patch input to the second neural network model and may have a similar appearance. Although the converted first image patch may differ slightly from the first image patch in pixel intensity and level, the image matching apparatus 1000 may accurately identify the matching of multi-spectrum images by using the converted first image patch.

Also, according to an embodiment, the image matching apparatus 1000 may convert the domain of the second image patch from the near-infrared domain to the visible domain by using the third neural network model 614. For example, the operation of converting the domain of the second image patch from the near-infrared domain to the visible domain may correspond to an operation of converting the pixel values of the second image patch so that the second image patch, which belongs to the near-infrared domain, exhibits image characteristics of the visible domain. The third neural network model 614 used by the image matching apparatus 1000 may be a neural network model trained so that, when an image patch belonging to the near-infrared domain is input, it converts the pixel values of the input image patch so that the patch exhibits image characteristics of the visible domain. According to an embodiment, the third neural network model is a U-Net model including 10 blocks, and may include an input layer, encoding blocks, decoding blocks, and an output layer. The structure of the third neural network model is described in more detail with reference to FIG. 7.

By converting the second image patch belonging to the near-infrared domain, the image matching apparatus 1000 may obtain a converted second image patch 624 that exhibits image characteristics of the visible domain. The converted second image patch is not an image captured by a camera including a visible-light image sensor, but may be an image patch converted to exhibit visible-light image characteristics.

The image matching apparatus 1000 may extract feature vectors from the first image patch 602, the second image patch 604, the converted first image patch 622, and the converted second image patch 624. For example, the image matching apparatus 1000 may extract a first feature vector 632 and a second feature vector 634 from the converted first image patch 622 and the second image patch 604, respectively, by using a fourth neural network model 626 and a sixth neural network model 628 that are pre-trained to extract feature vectors from image patches belonging to the near-infrared domain. The fourth neural network model and the sixth neural network model may be convolutional neural network models of a Siamese network structure that share weights.

The image matching apparatus 1000 may extract a third feature vector 636 and a fourth feature vector 638 from the first image patch 602 and the converted second image patch 624, respectively, by using a seventh neural network model 629 and a fifth neural network model 631 that are pre-trained to extract feature vectors from image patches belonging to the visible domain. The seventh neural network model and the fifth neural network model may be convolutional neural network models of a Siamese network structure that share weights.
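The routing of the four patches through the two weight-sharing branches can be sketched as follows. This is only an illustration under stated assumptions: the converters and extractors are passed in as ordinary callables, the toy stand-ins at the bottom are hypothetical and are not the U-Net or convolutional models of the disclosure, and weight sharing is modelled by calling the same extractor object on both inputs of a branch.

```python
def extract_four_features(vis_patch, nir_patch,
                          vis_to_nir, nir_to_vis,
                          nir_extractor, vis_extractor):
    """Route patches through a NIR-domain branch and a visible-domain branch."""
    converted_first = nir_to_nir_input = vis_to_nir(vis_patch)  # visible -> NIR
    converted_second = nir_to_vis(nir_patch)                    # NIR -> visible

    # NIR branch: one extractor object serves both inputs (shared weights).
    f1 = nir_extractor(nir_to_nir_input)
    f2 = nir_extractor(nir_patch)

    # Visible branch: again one extractor object for both inputs.
    f3 = vis_extractor(vis_patch)
    f4 = vis_extractor(converted_second)
    return f1, f2, f3, f4


# Toy stand-ins (hypothetical): a visible patch is a list of (R, G, B) tuples,
# a NIR patch is a list of single intensity values.
toy_vis_to_nir = lambda patch: [sum(px) // 3 for px in patch]
toy_nir_to_vis = lambda patch: [(v, v, v) for v in patch]
toy_nir_extractor = lambda patch: [min(patch), max(patch)]
toy_vis_extractor = lambda patch: [min(min(px) for px in patch),
                                   max(max(px) for px in patch)]

features = extract_four_features(
    [(3, 6, 9), (30, 30, 30)], [5, 7],
    toy_vis_to_nir, toy_nir_to_vis, toy_nir_extractor, toy_vis_extractor)
```

Each branch compares two patches that live in the same domain, which is the point of converting one patch of the pair before feature extraction.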

According to an embodiment, the fourth, fifth, sixth, and seventh neural network models described above may be convolution-based neural network models that, when an image patch is input, output in vector form the feature values determined based on the correlation between the pixel information of the image patch and a kernel of a preset size.

The image matching apparatus 1000 may determine a first difference vector 642 whose elements are the element-wise differences between the first feature vector 632 and the second feature vector 634. Also, the image matching apparatus 1000 may determine a second difference vector 644 whose elements are the element-wise differences between the third feature vector 636 and the fourth feature vector 638. The image matching apparatus 1000 may generate a first concatenated vector 646 by concatenating the first difference vector 642, the first feature vector 632, and the second feature vector 634.

According to an embodiment, the image matching apparatus 1000 may generate a second concatenated vector by concatenating the second difference vector 644, the third feature vector 636, and the fourth feature vector 638. The image matching apparatus 1000 may then generate a third concatenated vector 648 by concatenating the first concatenated vector 646 and the second concatenated vector.
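The assembly of the input to the matching network can be sketched as below. The sketch assumes list-based feature vectors of equal length d and uses the squared element-wise difference from the earlier embodiment as the difference vector; the function names and the 2-dimensional sample values are hypothetical.

```python
def squared_difference(f_a, f_b):
    """Element-wise squared difference (one possible choice of difference vector)."""
    return [(a - b) ** 2 for a, b in zip(f_a, f_b)]


def build_matching_input(f1, f2, f3, f4):
    """Concatenate [difference, feature, feature] per branch, then both branches."""
    first_concat = squared_difference(f1, f2) + f1 + f2    # NIR-domain branch
    second_concat = squared_difference(f3, f4) + f3 + f4   # visible-domain branch
    return first_concat + second_concat                    # third concatenated vector


# Hypothetical 2-dimensional feature vectors, so the result has 6 * 2 entries.
third_concat = build_matching_input([1.0, 2.0], [0.0, 4.0], [3.0, 3.0], [1.0, 3.0])
```

With d-dimensional features each branch contributes 3d values, so the vector handed to the fully connected layers has 6d entries under these assumptions.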

The image matching apparatus 1000 may input the generated third concatenated vector 648 to the pre-trained first neural network model and obtain, from the first neural network model, a similarity score 658 indicating the degree of matching between the first image patch and the second image patch. According to an embodiment, the first neural network model may include a plurality of fully connected layers 652, 654, and 656 and an activation function connected to the output of the fully connected layers.

For example, the fully connected layer 652 receives the third concatenated vector 648 generated from the feature vectors, transforms the dimension of the input vector, and passes the transformed vector sequentially to the next fully connected layer 654 and the fully connected layer 656. A non-linear sigmoid activation function may further be connected to the fully connected layer 656 to generate the similarity score from the concatenated vector.
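A minimal sketch of such a scoring head is shown below: three fully connected layers followed by a sigmoid. The layer widths, the weights, and the ReLU between hidden layers are assumptions made for the illustration; the text only specifies the stack of fully connected layers and the final sigmoid.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def similarity_score(x, layers):
    """Apply the layers in order and squash the final logit into (0, 1)."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:
            # Assumption: ReLU between hidden layers (not stated in the text).
            x = [max(0.0, t) for t in x]
    return sigmoid(x[0])


# Hypothetical toy weights for a 2 -> 2 -> 1 -> 1 stack.
toy_layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
    ([[1.0]], [-1.0]),
]
score = similarity_score([1.0, -1.0], toy_layers)
```

Because the sigmoid maps any real logit into the open interval (0, 1), the output can be read as a matching probability and compared against a threshold.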

According to an embodiment, the first through seventh neural network models that the image matching apparatus 1000 uses to identify whether the first image patch and the second image patch match may be implemented as a single neural network model, namely a spectral-invariant image matching model (Spectral-Invariant Matching Network, SPIMNet).

According to an embodiment, in the first through seventh neural network models that the image matching apparatus 1000 uses to identify whether the first image patch and the second image patch match, the layers and the weights representing the connection strengths between layers may be modified and updated so as to minimize a first loss function relating to the degree of matching between the first image patch and the second image patch and a second loss function relating to the degree of conversion of the converted first image patch and the converted second image patch.

According to an embodiment, when the neural network models used by the image matching apparatus 1000 are implemented as a single spectral-invariant image matching model, the weights in the spectral-invariant image matching model may likewise be modified and updated so as to minimize the first loss function, which relates to the degree of matching between the first image patch and the second image patch, and the second loss function, which relates to the degree of conversion of the converted first image patch and the converted second image patch. According to an embodiment, the first loss function and the second loss function may be combined into a single expression and thereby included in a third loss function.

That is, the image matching apparatus 1000 according to an embodiment of the present disclosure may modify and update the weights in the neural network models of the image matching apparatus 1000 so that the third loss function expressed by Equation 1 is minimized.

In Equation 1, L_SPIMNet is the third loss function, which includes the first loss function and the second loss function; L_matching is the first loss function, used to learn the similarity level of the first image patch and the second image patch; and L_conversion is the second loss function, used to train the image matching apparatus 1000 to convert the domains of image patches. The image matching apparatus 1000 may modify and update the weights in the neural network models of the image matching apparatus so that the third loss function according to Equation 1 is minimized.

Here, L_matching is a binary cross-entropy function and is the first loss function used to learn the similarity level of the first image patch and the second image patch. Its prediction term is the output value of the spectral-invariant image matching model used by the image matching apparatus 1000 for one training image, and y denotes the class of the training data. For example, y = 1 may indicate that the first image patch and the second image patch match, and y = 0 may indicate that they do not match.
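The binary cross-entropy named above can be sketched for a single training pair as follows. The standard textbook form is used; the argument names `y_true` and `y_pred` and the clipping constant `eps` are choices made for this illustration and do not come from the disclosure.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for one sample.

    y_true: 1 for a matching pair, 0 for a non-matching pair.
    y_pred: model output in (0, 1), e.g. a sigmoid similarity score.
    """
    # Clip the prediction so that log(0) can never occur.
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


confident_right = binary_cross_entropy(1, 0.9)   # matching pair, high score
confident_wrong = binary_cross_entropy(1, 0.1)   # matching pair, low score
```

The loss grows as the predicted score moves away from the label, so minimizing it pushes scores of matching pairs toward 1 and scores of non-matching pairs toward 0.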

In Equation 3, L_conversion is the second loss function, used to train the image matching apparatus 1000 to convert the domains of image patches, and it comprises an L1 loss term and a perceptual loss term. If the perceptual loss is not used, L_conversion may increase significantly when patch pairs labeled as matched are misaligned. The network used to determine the perceptual loss in Equation 3 may be a VGG19 network pre-trained on the ImageNet data set. Also, in Equation 3, x_vis is the first image patch, which belongs to the visible domain, and x_nir is the second image patch, which belongs to the near-infrared domain. xt_vis is the converted (translated) first image patch output from the second neural network model, which converts an input image patch belonging to the visible domain so that it exhibits image characteristics of the near-infrared domain. xt_nir is the converted (translated) second image patch output from the third neural network model, which converts an input image patch belonging to the near-infrared domain so that it exhibits image characteristics of the visible domain.
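The structure of a conversion loss built from an L1 term and a perceptual term can be sketched as below. Several points are assumptions, since the equation itself is not reproduced in this text: each converted patch is compared with the real patch of its target domain, the perceptual distance is a mean squared difference of features, the weights `w_l1` and `w_perc` are hypothetical, and `features` is any callable standing in for a pre-trained feature network such as VGG19. Patches are flattened lists of numbers.

```python
def l1_loss(a, b):
    """Mean absolute difference between two flattened patches."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def perceptual_loss(a, b, features):
    """Mean squared difference between feature responses of two patches."""
    fa, fb = features(a), features(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)


def conversion_loss(x_vis, x_nir, xt_vis, xt_nir, features,
                    w_l1=1.0, w_perc=1.0):
    """Hypothetical combination: weighted L1 plus weighted perceptual term."""
    # xt_vis lives in the NIR domain, so it is compared with x_nir;
    # xt_nir lives in the visible domain, so it is compared with x_vis.
    l1 = l1_loss(xt_vis, x_nir) + l1_loss(xt_nir, x_vis)
    perc = (perceptual_loss(xt_vis, x_nir, features)
            + perceptual_loss(xt_nir, x_vis, features))
    return w_l1 * l1 + w_perc * perc


# Toy feature function (hypothetical stand-in for a pre-trained network).
toy_features = lambda patch: [sum(patch)]
perfect = conversion_loss([1.0, 2.0], [3.0], [3.0], [1.0, 2.0], toy_features)
off_by_one = conversion_loss([1.0, 2.0], [3.0], [4.0], [1.0, 2.0], toy_features)
```

A feature-space term is less sensitive to small pixel shifts than a pure pixel-wise L1 term, which is consistent with the remark about misaligned matched pairs.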

According to an embodiment, the two parameters of the loss function may be set to 0.1 and 30, respectively, but are not limited thereto and may be set differently during training in order to minimize the third loss function. For example, the image matching apparatus 1000 may modify and update the weights in the neural network models so that L_conversion is minimized while training the neural network models in the image matching apparatus on an image data set. L_conversion may become smaller as the converted first image patch and the converted second image patch, output from the second neural network model and the third neural network model respectively, become more similar to the first image patch and the second image patch.

FIG. 7 is a diagram for describing the structure of a neural network model that the image matching apparatus uses to convert an image patch, according to an embodiment.

With reference to FIG. 7, the structures of the second neural network model and the third neural network model, which the image matching apparatus 1000 uses to convert an image patch so that it exhibits image characteristics of a target domain, are described in detail. Hereinafter, for convenience, it is assumed that the first image patch 602 belongs to the visible domain 601, which corresponds to the spectrum of light in the visible region, and that the second image patch 604 belongs to the near-infrared (NIR) domain 603, which corresponds to the spectrum of light in the near-infrared region.

The image matching apparatus 1000 may convert the first image patch by using the second neural network model 712 so that the first image patch, which belongs to the visible domain, exhibits image characteristics of the near-infrared domain. According to an embodiment, the second neural network model 712 may be a neural network model that, when an image patch belonging to the visible domain is input, converts the input image patch so that it exhibits image characteristics of the near-infrared domain. According to an embodiment, the second neural network model 712 may be a U-Net including 10 blocks.

According to an embodiment, the second neural network model 712 may include a plurality of convolution layers 722, 724, and 728 and convolution transpose layers 726. More specifically, the second neural network model 712 may include an input layer, encoding blocks, decoding blocks, and an output layer. According to an embodiment, each encoding block may include a convolution layer, a batch norm layer 732, an instance norm layer 734, and a ReLU activation layer 736, and each decoding block may include a convolution transpose layer 726, a batch norm layer 732, an instance norm layer 734, and a ReLU activation layer 736.

According to an embodiment, the batch norm layer may normalize the mean and variance of the feature values in the neural network model per batch. The instance norm layer may normalize the mean and variance of the feature values per channel rather than per batch. The ReLU activation layer may output 0 when an output value of a block is less than 0 and may output the value unchanged when it is greater than 0. The Tanh activation layer, which is obtained by transforming the sigmoid activation, takes each block's output value as the input to the tanh function and uses the output of the tanh function as the output of the activation layer.
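The difference between the two normalizations and the relation between tanh and the sigmoid can be sketched in plain Python. The sketch assumes activations stored as `batch[n][c]`, a list of values for channel c of sample n, and omits the learnable scale and shift that practical layers usually carry; all function names are hypothetical.

```python
import math


def relu(t):
    return max(0.0, t)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def tanh_from_sigmoid(t):
    """tanh written as a rescaled, shifted sigmoid: 2 * sigmoid(2t) - 1."""
    return 2.0 * sigmoid(2.0 * t) - 1.0


def standardize(values, eps=1e-5):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def instance_norm(batch):
    """Statistics computed separately for every sample and every channel."""
    return [[standardize(channel) for channel in sample] for sample in batch]


def batch_norm(batch):
    """Statistics computed per channel, pooled over all samples in the batch."""
    out = [[None] * len(sample) for sample in batch]
    for c in range(len(batch[0])):
        pooled = standardize([v for sample in batch for v in sample[c]])
        pos = 0
        for n, sample in enumerate(batch):
            out[n][c] = pooled[pos:pos + len(sample[c])]
            pos += len(sample[c])
    return out


# Two samples, one channel each; the second sample is much brighter.
toy_batch = [[[0.0, 2.0]], [[10.0, 14.0]]]
bn = batch_norm(toy_batch)
inn = instance_norm(toy_batch)
```

With instance normalization each sample is centred on its own mean, so a global brightness offset between samples disappears; with batch normalization the darker sample stays below zero and the brighter one above.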

According to an embodiment, the second neural network model 712 may include five encoding blocks, and the number of filters of the convolution layers in the encoding blocks may increase gradually, for example 64, 128, 256, 256, and 256. In the four decoding blocks of the second neural network model 712, the number of filters of the convolution transpose layers may decrease gradually, for example 256, 256, 128, and 64. According to an embodiment, the filter (kernel) size of the convolution layers in the encoding and decoding blocks is 4*4, the stride (the interval by which the filters move) is 2, and the padding value, which prevents the output data of the convolution layers from shrinking, may be 2.
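How kernel size, stride, and padding determine the spatial size of a layer's output can be worked through with the standard convolution arithmetic. The sketch assumes no dilation and no output padding, and the 64-pixel input in the usage line is a hypothetical patch size that the text does not state.

```python
def conv_out(n, kernel=4, stride=2, padding=2):
    """Output length of a strided convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel=4, stride=2, padding=2):
    """Output length of a transposed convolution along one spatial axis."""
    return (n - 1) * stride - 2 * padding + kernel


# Hypothetical 64-pixel patch side passed through one encoding convolution
# and then through one decoding transposed convolution.
encoded = conv_out(64)
decoded = conv_transpose_out(encoded)
```

A stride of 2 roughly halves the side length in an encoding block, and the transposed convolution with the same settings roughly doubles it again in a decoding block.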

According to an embodiment, the image matching apparatus 1000 may convert the second image patch by using the third neural network model 714 so that the second image patch, which belongs to the near-infrared domain, exhibits image characteristics of the visible domain. According to an embodiment, the third neural network model 714 may be a U-Net including 10 blocks. The third neural network model 714 may have the same structure as the second neural network model 712 except for the last block. The structure of the last block in the second and third neural network models may vary depending on the target domain of each model.

For example, the third neural network model 714 receives an image patch of the near-infrared domain and converts it so that it exhibits image characteristics of the visible-light domain, so its target domain may be set to the visible-light domain. Because the visible-light domain requires a total of three channels, the number of filters in the convolutional layer of the last block of the third neural network model may be 3. Conversely, the second neural network model 712 receives an image patch of the visible-light domain and converts it so that it exhibits image characteristics of the near-infrared domain, so its target domain may be set to the near-infrared domain. Accordingly, the number of filters in the convolutional layer of the last block of the second neural network model may be 1.
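The block layout of the two conversion models can be written down as plain data. The following is a minimal sketch, not the patented implementation: it assumes the tenth block of each U-net is the output block that follows the five encoding and four decoding blocks, and the names `unet_block_specs`, `ENCODER_FILTERS`, `DECODER_FILTERS`, and `OUTPUT_CHANNELS` are hypothetical. Only the filter counts, kernel size, stride, and padding come from the description.

```python
# Filter counts, kernel size, stride, and padding are taken from the description;
# the data layout and all names are illustrative assumptions.
ENCODER_FILTERS = [64, 128, 256, 256, 256]   # five encoding blocks
DECODER_FILTERS = [256, 256, 128, 64]        # four decoding blocks
OUTPUT_CHANNELS = {"visible": 3, "near_infrared": 1}  # channels of the target domain


def unet_block_specs(target_domain):
    """Return the 10 block specs of a conversion U-net for the given target domain."""
    common = {"kernel": (4, 4), "stride": 2, "padding": 2}
    specs = [{"type": "encode", "filters": f, **common} for f in ENCODER_FILTERS]
    specs += [{"type": "decode", "filters": f, **common} for f in DECODER_FILTERS]
    # Only the last block differs between the two models; its kernel and stride
    # are not stated in the description, so they are left out here.
    specs.append({"type": "output", "filters": OUTPUT_CHANNELS[target_domain]})
    return specs


second_model = unet_block_specs("near_infrared")  # visible -> near-infrared (712)
third_model = unet_block_specs("visible")         # near-infrared -> visible (714)
```

Under these assumptions the two specifications are identical in their first nine blocks and differ only in the filter count of the last block.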

In addition, a Tanh activation layer 738 may be connected to one end of each of the second neural network model 712 and the third neural network model 714. As described above, the Tanh activation layer 738 may normalize the image patches output from the second and third neural network models so that their values fall within the range [-1, 1].

FIG. 8 is a diagram for explaining the structure of a neural network model that an image matching apparatus according to an embodiment uses to extract a feature vector from an image patch.

According to an embodiment, the image matching apparatus 1000 may extract feature vectors from image patches using a convolutional neural network model with a dual-Siamese network structure. For example, as described above with reference to FIG. 6, the image matching apparatus 1000 may extract a first feature vector from the converted first image patch using a fourth neural network model, a second feature vector from the second image patch using a sixth neural network model, a third feature vector from the first image patch using a seventh neural network model, and a fourth feature vector from the converted second image patch using a fifth neural network model. According to an embodiment, each of the fourth, sixth, seventh, and fifth neural network models may include the neural network model structure 802 shown in FIG. 8.

The fourth and sixth neural network models used by the image matching apparatus 1000 may be convolutional neural network models of a first type that share weights with each other in a Siamese arrangement, and the seventh and fifth neural network models may be convolutional neural network models of a second type that share weights with each other in a Siamese arrangement; together the two pairs form the dual-Siamese network structure. That is, the image matching apparatus 1000 may use the convolutional neural network model with the dual-Siamese network structure to extract a feature vector from each of the first image patch, the second image patch, the converted first image patch, and the converted second image patch.

According to an embodiment, each of the fourth, sixth, fifth, and seventh neural network models that the image matching apparatus 1000 uses to extract feature vectors from image patches or converted image patches may include a total of eight blocks, and each block may include a convolutional layer, a batch norm layer, an instance norm layer, and a ReLU activation layer. The configuration of each layer may correspond to that described above with reference to FIG. 7, so a detailed description is omitted. In addition, according to an embodiment, the number of filters, the filter size, and the stride of the convolutional layer in each block may be set, block by block, to (32, 3*3, 1), (64, 3*3, 1), (128, 3*3, 1), (128, 5*5, 2), (256, 3*3, 1), (256, 5*5, 2), (256, 3*3, 1), and (256, 5*5, 2).
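To see how the eight blocks listed above shrink a patch, the feature map size can be traced block by block. This is a rough sketch under two assumptions that the description does not state: the input patch is 64*64 pixels, and each convolution uses "same"-style padding so that only the stride reduces the spatial size. The names `FEATURE_BLOCKS` and `trace_shapes` are hypothetical.

```python
import math

# (filters, kernel size, stride) for the eight blocks, as listed in the description.
FEATURE_BLOCKS = [
    (32, 3, 1), (64, 3, 1), (128, 3, 1), (128, 5, 2),
    (256, 3, 1), (256, 5, 2), (256, 3, 1), (256, 5, 2),
]


def trace_shapes(patch_size, blocks=FEATURE_BLOCKS):
    """Return (height/width, channels) after each block, assuming 'same' padding."""
    shapes = []
    size = patch_size
    for filters, _kernel, stride in blocks:
        # With 'same' padding (an assumption), only the stride shrinks the map.
        size = math.ceil(size / stride)
        shapes.append((size, filters))
    return shapes


shapes = trace_shapes(64)  # 64x64 is an assumed patch size, not from the source
```

With these assumptions the three stride-2 blocks halve the map three times, so a 64*64 patch would end as an 8*8 map with 256 channels before being flattened into a feature vector.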

The image matching apparatus 1000 may determine a first difference vector based on the square of the absolute value of the element-wise difference between the first feature vector, extracted from the converted first image patch, and the second feature vector, extracted from the second image patch. Likewise, the image matching apparatus 1000 may determine a second difference vector based on the square of the absolute value of the element-wise difference between the third feature vector, extracted from the first image patch, and the fourth feature vector, extracted from the converted second image patch. The image matching apparatus 1000 may then obtain a similarity score indicating the degree of matching between the pair of image patches by inputting the first and second difference vectors into the first neural network model, which is trained to determine the similarity between images.
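The difference vectors described above reduce to a one-line element-wise operation. The sketch below uses plain Python lists and made-up three-dimensional feature vectors purely for illustration; the function name `difference_vector` is hypothetical.

```python
def difference_vector(u, v):
    """Element-wise squared absolute difference of two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return [abs(a - b) ** 2 for a, b in zip(u, v)]


# Toy 3-dimensional feature vectors (made-up values, for illustration only).
f1, f2 = [1.0, 2.0, 3.0], [1.0, 0.0, 5.0]   # converted patch 1 vs. patch 2
f3, f4 = [0.5, 0.5, 0.5], [0.5, 1.5, 0.5]   # patch 1 vs. converted patch 2
d1 = difference_vector(f1, f2)  # [0.0, 4.0, 4.0]
d2 = difference_vector(f3, f4)  # [0.0, 1.0, 0.0]
```

Both vectors are non-negative by construction, and an all-zero vector means the two feature vectors coincide.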

FIG. 9 is a diagram for explaining the result of converting image patches with a neural network model in an image matching apparatus according to an embodiment.

FIG. 9 shows two kinds of output. The first is the image patch output by the second neural network model, which, given an image patch of the visible-light domain, converts it so that it exhibits image characteristics of the near-infrared domain. The second is the image patch output by the third neural network model, which, given an image patch of the near-infrared domain, converts it so that it exhibits image characteristics of the visible-light domain.

More specifically, the image patches shown in FIG. 9 may be the output image patches of the second and third neural network models trained to minimize a loss function (e.g., the first loss function) obtained by removing the second loss function (e.g., the Lconversion loss according to Equation 3) from the third loss function.

For example, when image patches 912, 916, and 922 are input to second and third neural network models trained to minimize only the first loss function, which is obtained by removing the second loss function from the third loss function, the models may output converted image patches 913, 918, and 924. As shown in FIG. 9, the converted image patches 913, 918, and 924 do not contain objects or object parts similar to those in the original image patches, so they may differ greatly in appearance from the originals. In addition, the converted image patches 913, 918, and 924 output from models trained to minimize only the first loss function may contain almost none of the image information needed to identify whether a pair of image patches match.

However, unlike the example shown in FIG. 9, the image matching apparatus 1000 according to an embodiment of the present disclosure modifies and updates the weights of its neural network models so as to minimize the third loss function, which includes both the first loss function for determining the degree of matching between patches and the second loss function governing the conversion between an image patch and its converted image patch. As a result, it can accurately identify whether image patches match even for pairs of image patches from different spectra.

FIG. 10 is a block diagram of an image matching apparatus according to an embodiment.

As shown in FIG. 10, the image matching apparatus 1000 may include a processor 1400 and a memory 1402. However, not all of the illustrated components are essential; the image matching apparatus 1000 may be implemented with more components than those illustrated, or with fewer. According to an embodiment, the image matching apparatus 1000 may further include a communication unit (not shown) in addition to the processor 1400 and the memory 1402.

The processor 1400 typically controls the overall operation of the image matching apparatus 1000.

According to an embodiment, the processor 1400 according to the present disclosure may perform the functions of the image matching apparatus 1000 described with reference to FIGS. 1 to 9 by executing programs stored in the memory 1402. The processor 1400 may consist of one or more processors, each of which may be a general-purpose processor such as a CPU, an AP, or a digital signal processor (DSP), a graphics-dedicated processor such as a GPU, or an artificial intelligence (AI) dedicated processor. According to an embodiment, when the processor 1400 includes a general-purpose processor, an AI processor, and a graphics-dedicated processor, the AI processor may be implemented as a chip separate from the general-purpose processor or the graphics-dedicated processor.

According to an embodiment, when the processor 1400 is implemented as a plurality of processors, a graphics-dedicated processor, or an AI-dedicated processor, at least some of these processors may be mounted in the image matching apparatus 1000 and in any electronic device or server connected to the image matching apparatus 1000.

For example, by executing programs stored in the memory 1402, the processor 1400 may acquire a first image patch and a second image patch from light of different wavelength bands, convert the first and second image patches so that they exhibit image characteristics of a preset wavelength band, extract feature vectors from the converted first image patch and the converted second image patch, and input the extracted feature vectors into the first neural network model to identify whether the first image patch and the second image patch match.

According to an embodiment, the processor 1400 may also extract feature vectors from the acquired first image patch and the acquired second image patch, and may additionally input these feature vectors into the first neural network model to identify whether the first image patch and the second image patch match.

According to an embodiment, the processor 1400 may acquire a first image from light of a first wavelength band and a second image from light of a second wavelength band, extract feature points from each of the first and second images based on the pixel values in each image, and acquire partial regions of the images containing the extracted feature points as the first image patch and the second image patch.

According to an embodiment, the processor 1400 may convert the first image patch using the second neural network model, which outputs an image patch exhibiting image characteristics of the second wavelength band when it receives a first image patch exhibiting image characteristics of the first wavelength band. The processor 1400 may convert the second image patch using the third neural network model, which outputs an image patch exhibiting image characteristics of the first wavelength band when it receives a second image patch exhibiting image characteristics of the second wavelength band.

According to an embodiment, the processor 1400 may extract the first feature vector from the converted first image patch by inputting the converted first image patch into the fourth neural network model, and may extract the fourth feature vector from the converted second image patch by inputting the converted second image patch into the fifth neural network model.

According to an embodiment, the processor 1400 may extract the second feature vector from the acquired second image patch by inputting the acquired second image patch into the sixth neural network model, and may extract the third feature vector from the acquired first image patch by inputting the acquired first image patch into the seventh neural network model.

According to an embodiment, the processor 1400 may determine a first difference vector whose elements are the squares of the element-wise distances between the first feature vector and the second feature vector, and a second difference vector whose elements are the squares of the element-wise distances between the third feature vector and the fourth feature vector. As described above, the first neural network model may output a probability that the first image patch and the second image patch match, based on the first difference vector and the second difference vector.
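The processor operations described above can be tied together in a short sketch. This is an illustration of the data flow only, under stated assumptions: every model is passed in as a plain callable, the two difference vectors are concatenated before being handed to the first neural network model (the description does not say how they are combined), and the stand-in functions at the bottom are toys rather than trained networks. All function and parameter names are hypothetical.

```python
def match_probability(patch1, patch2, convert_1to2, convert_2to1,
                      extract_4, extract_5, extract_6, extract_7, classify):
    """Run the dual-Siamese matching data flow with caller-supplied models."""
    converted1 = convert_1to2(patch1)   # second model: band 1 -> band 2
    converted2 = convert_2to1(patch2)   # third model: band 2 -> band 1
    f1 = extract_4(converted1)          # first feature vector
    f2 = extract_6(patch2)              # second feature vector
    f3 = extract_7(patch1)              # third feature vector
    f4 = extract_5(converted2)          # fourth feature vector
    d1 = [(a - b) ** 2 for a, b in zip(f1, f2)]   # first difference vector
    d2 = [(a - b) ** 2 for a, b in zip(f3, f4)]   # second difference vector
    # Concatenation is an assumption; the source only says both are input.
    return classify(d1 + d2)


# Toy stand-ins so the sketch runs end to end (not trained networks).
def identity(patch):
    return list(patch)


def toy_classifier(diff):
    return 1.0 / (1.0 + sum(diff))


same = match_probability([1.0, 2.0], [1.0, 2.0], identity, identity,
                         identity, identity, identity, identity, toy_classifier)
different = match_probability([1.0, 2.0], [3.0, 2.0], identity, identity,
                              identity, identity, identity, identity, toy_classifier)
```

With the toy stand-ins, identical patches score 1.0 and differing patches score lower, which mirrors the intended behavior of the trained first neural network model without claiming anything about its actual architecture.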

The communication unit (not shown) may include one or more components that enable the image matching apparatus 1000 to communicate with another device (not shown) and the server 2000. The other device (not shown) may be a computing device such as the image matching apparatus 1000 or a sensing device, but is not limited thereto. For example, the communication unit (not shown) may include a short-range communication unit and a mobile communication unit.

The short-range wireless communication unit may include, but is not limited to, a Bluetooth communication unit, a Bluetooth Low Energy (BLE) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an Infrared Data Association (IrDA) communication unit, a Wi-Fi Direct (WFD) communication unit, and an ultra-wideband (UWB) communication unit. The mobile communication unit transmits and receives radio signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network.

According to an embodiment, the communication unit (not shown) may, under the control of the processor, transmit to the server the first image patch and the second image patch generated from light of different wavelength bands. According to an embodiment, the communication unit may receive a visible-light image patch, a thermal image patch, and a near-infrared image patch from a visible-light camera, a thermal imaging camera, and a near-infrared camera, respectively. In addition, according to an embodiment, the communication unit (not shown) may receive from the server information about the image matching result that the server determined for the image patch pairs.

The memory 1402 may store programs for the processing and control of the processor 1400, and may store data input to or output from the image matching apparatus 1000. The memory 1402 may further store information about the first, second, third, fourth, fifth, sixth, and seventh neural network models that the image matching apparatus uses to identify whether a pair of image patches match. According to an embodiment, the memory 1402 may further store the layers of the artificial neural network models in the image matching apparatus, which are trained in advance to minimize the third loss function, together with weight values representing the connection strengths between those layers.

The memory 1402 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.

FIG. 11 is a block diagram of a server connected to an image matching apparatus according to an embodiment.

According to an embodiment, the server 2000 may include a communication unit 2100, a database 2200, and a processor 2300.

The communication unit 2100 may correspond to the communication unit (not shown) of the image matching apparatus 1000 described above. For example, the communication unit 2100 may receive, from the image matching apparatus 1000, information about the layers of a neural network model and the nodes included in those layers, or weight values representing the connection strengths of the layers in the neural network.

The database 2200 may correspond to the memory 1402 of the image matching apparatus shown in FIG. 10. For example, the database 2200 may store programs for the processing and control of the processor 2300, and may further store data input to or output from the image matching apparatus 1000. The database 2200 may also store information about the layers constituting the neural network models used to identify whether images match, the nodes included in those layers, and the weights representing the connection strengths of the layers. In addition, when the weights in the artificial neural network models used to identify whether images match are modified and updated, the database 2200 may further store information about the modified and updated weights.

The processor 2300 typically controls the overall operation of the server 2000. For example, the processor 2300 may control the DB 2200, the communication unit 2100, and other components overall by executing programs stored in the DB 2200 of the server 2000. The processor 2300 may also perform some of the operations of the image matching apparatus 1000 described with reference to FIGS. 1 to 10 by executing programs stored in the DB 2200.

For example, the processor 2300 may acquire a first image patch and a second image patch generated from light of different wavelength bands, convert the first and second image patches so that they exhibit image characteristics of a preset wavelength band, extract feature vectors from the converted first image patch and the converted second image patch, and input the extracted feature vectors into the first neural network model to identify whether the first image patch and the second image patch match.

FIG. 12 is a diagram for comparing the performance of neural network models trained on pairs of visible-light image patches and near-infrared image patches, according to an embodiment.

FIG. 12 compares, for each category (904, 906, 908, 910, 912, 914, 916, 918), the false positive rate at a recall (or sensitivity) of 95% for several types of artificial neural network models 902 trained on pairs of visible-light and near-infrared image patches to identify whether images match. The suffix DA attached to a model name denotes data augmentation. Among the neural network models shown in FIG. 12, the spectrum-invariant image matching model used by the image matching apparatus 1000 according to the present disclosure (Ours DA in FIG. 12) shows the best performance.
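The metric reported in these comparisons is read here as the false positive rate at a fixed recall, often abbreviated FPR95 for 95% recall; that reading is an interpretation of the source wording. A minimal sketch of how such a value could be computed from similarity scores follows. The function name `fpr_at_recall` and the toy scores are hypothetical and are not the data behind the figures.

```python
import math


def fpr_at_recall(scores, labels, recall=0.95):
    """False positive rate at the threshold that accepts `recall` of true matches."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one matching and one non-matching pair")
    # Lowest score that must still be accepted to reach the requested recall.
    k = math.ceil(recall * len(positives))
    threshold = positives[k - 1]
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)


# Toy similarity scores (made-up values): label 1 = matching pair, 0 = non-matching.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
fpr95 = fpr_at_recall(toy_scores, toy_labels, recall=0.95)
```

A lower value is better: it means fewer non-matching pairs are accepted when the threshold is loose enough to keep 95% of the true matches.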

FIG. 13 is a diagram for comparing the performance of neural network models trained on pairs of visible-light image patches and thermal image patches, according to an embodiment.

FIG. 13 compares the false positive rate at 95% recall (or sensitivity) and at 99% recall as performance indicators for several neural network models trained on pairs of visible-light and thermal image patches. Referring to FIG. 13, among these models, the spectrum-invariant image matching model used by the image matching apparatus 1000 according to the present disclosure (Ours DA in FIG. 13) shows the best performance on both indicators.

That is, the spectrum-invariant image matching model that the image matching apparatus 1000 according to the present disclosure uses to identify whether images match shows the best performance not only for pairs of visible-light and near-infrared image patches but also for pairs of visible-light and thermal image patches. Accordingly, the image matching apparatus 1000 according to the present disclosure may be used in various fields, such as stereo matching in night-time images, face and object recognition using multispectral images, and pedestrian detection.

FIG. 14 is a diagram for comparing the performance of neural network models trained on pairs of RGB image patches and near-infrared image patches, according to an embodiment.

FIG. 14 shows a performance comparison of several types of neural network models trained on pairs of RGB and near-infrared image patches to identify whether images match. The spectrum-invariant image matching model used by the image matching apparatus 1000 according to the present disclosure (Ours DA in FIG. 14) shows the best performance in the false positive rate at 95%, 97%, and 99% recall (or sensitivity).

FIG. 15 is a diagram for comparing the performance of a neural network model with a general Siamese network structure and neural network models according to embodiments of the present disclosure.

Referring to FIG. 15, the following are shown: similarity scores 1502 output from a neural network model having a general Siamese network structure to identify whether images match, similarity scores 1504 output from the neural network model used by the image matching apparatus 1000 described above with reference to FIGS. 3 to 4, and similarity scores 1506 output from the neural network model used by the image matching apparatus 1000 described above with reference to FIGS. 4 to 5. FIG. 15 confirms that the spectrum-invariant image matching model, a neural network model with a dual Siamese network structure that includes both the second neural network model and the third neural network model, yields the highest similarity scores.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present disclosure, or may be known and available to those skilled in the art of computer software.

In addition, a computer program apparatus including a recording medium storing a program for performing the method according to the embodiment may be provided. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although embodiments of the present disclosure have been described in detail above, the scope of the present disclosure is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present disclosure, as defined in the following claims, also fall within the scope of the present disclosure.

Claims

A method of identifying whether images match, the method comprising:
obtaining a first image patch and a second image patch from light of different wavelength bands;
converting the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band;
extracting a feature vector from the obtained first image patch and the obtained second image patch;
extracting a feature vector from the converted first image patch and the converted second image patch; and
identifying whether the first image patch and the second image patch match by inputting, to a first neural network model, the feature vectors extracted from each of the obtained first image patch, the obtained second image patch, the converted first image patch, and the converted second image patch.

delete

The method of claim 1, wherein the obtaining comprises:
obtaining a first image from light of a first wavelength band and a second image from light of a second wavelength band;
extracting a feature point from each of the first image and the second image based on the respective pixel values in the first image and the second image; and
obtaining, as the first image patch and the second image patch, partial regions of the images that include the extracted feature points.
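
As an illustration only, the steps recited above can be sketched as follows. The claim does not fix a particular feature point detector; the strict 3x3 local-maximum rule below is a hypothetical stand-in, and the names `find_feature_points`, `crop_patch`, and `extract_patches` are assumptions made for this example.

```python
def find_feature_points(image, threshold):
    """Toy detector (assumption): interior pixels that are strict 3x3 local maxima
    and at least `threshold` are treated as feature points."""
    height, width = len(image), len(image[0])
    points = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            value = image[r][c]
            neighbours = [image[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if value >= threshold and all(value > n for n in neighbours):
                points.append((r, c))
    return points


def crop_patch(image, center, size):
    """Return the size x size region centred on `center`, or None if it leaves the image."""
    half = size // 2
    top, left = center[0] - half, center[1] - half
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        return None
    return [row[left:left + size] for row in image[top:top + size]]


def extract_patches(image, threshold, size):
    """Pair every detected feature point with the image patch around it."""
    patches = []
    for point in find_feature_points(image, threshold):
        patch = crop_patch(image, point, size)
        if patch is not None:
            patches.append((point, patch))
    return patches


if __name__ == "__main__":
    image = [[0] * 5 for _ in range(5)]
    image[2][2] = 9
    print(extract_patches(image, threshold=5, size=3))
```

The same routine would be run once on the image of the first wavelength band and once on the image of the second wavelength band to obtain the two sets of patches.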

The method of claim 1, wherein the converting comprises:
converting the first image patch by using a second neural network model that, when a first image patch representing image characteristics of a first wavelength band is input, outputs an image patch representing image characteristics of a second wavelength band; and
converting the second image patch by using a third neural network model that, when a second image patch representing image characteristics of the second wavelength band is input, outputs an image patch representing image characteristics of the first wavelength band.

The method of claim 4, wherein the extracting of a feature vector from the converted first image patch and the converted second image patch comprises:
extracting a first feature vector from the converted first image patch by inputting the converted first image patch into a fourth neural network model; and
extracting a fourth feature vector from the converted second image patch by inputting the converted second image patch into a fifth neural network model.

The method of claim 5, wherein the extracting of a feature vector from the obtained first image patch and the obtained second image patch comprises:
extracting a second feature vector from the obtained second image patch by inputting the obtained second image patch into a sixth neural network model; and
extracting a third feature vector from the obtained first image patch by inputting the obtained first image patch into a seventh neural network model.

The method of claim 6, wherein
the fourth neural network model and the sixth neural network model share the layers within the fourth neural network model and the sixth neural network model and the weights relating to connection strength between the layers, and
the fifth neural network model and the seventh neural network model are neural network models of a Siamese network structure that share the layers within the fifth neural network model and the seventh neural network model and the weights relating to connection strength between the layers.

The method of claim 6, wherein the identifying of whether the image patches match further comprises:
determining a first difference vector whose elements are the squares of the element-wise distances between the first feature vector and the second feature vector; and
determining a second difference vector whose elements are the squares of the element-wise distances between the third feature vector and the fourth feature vector,
wherein the first neural network model outputs, based on the first difference vector and the second difference vector, a probability value that the first image patch and the second image patch match.
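
The difference-vector step can be illustrated with a short sketch. The claim does not specify how the first neural network model combines the two difference vectors; concatenating them and applying a single logistic unit is an assumption made here purely for illustration, and `weights` and `bias` are hypothetical parameters.

```python
import math


def squared_difference(u, v):
    """Element-wise squared distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return [(a - b) ** 2 for a, b in zip(u, v)]


def match_probability(f1, f2, f3, f4, weights, bias):
    """Stand-in for the first neural network model (assumption: one logistic unit).

    f1, f2: first and second feature vectors (both in the second wavelength band).
    f3, f4: third and fourth feature vectors (both in the first wavelength band).
    """
    d1 = squared_difference(f1, f2)  # first difference vector
    d2 = squared_difference(f3, f4)  # second difference vector
    x = d1 + d2                      # concatenation (assumption)
    if len(weights) != len(x):
        raise ValueError("one weight is required per difference element")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    weights = [-1.0, -1.0, -1.0, -1.0]
    print(match_probability([1, 2], [1, 2], [0, 0], [0, 0], weights, 0.0))
    print(match_probability([1, 2], [3, 2], [0, 0], [0, 0], weights, 0.0))
```

With negative weights, larger squared differences push the output probability toward zero, which is the qualitative behaviour one would expect of a match classifier.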

The method of claim 6, wherein,
in the first neural network model, the second neural network model, the third neural network model, the fourth neural network model, the fifth neural network model, the sixth neural network model, and the seventh neural network model,
the layers and the weights relating to connection strength between the layers are updated such that a first loss function relating to the degree of matching between the first image patch and the second image patch and a second loss function relating to the degree of conversion of the converted first image patch and the converted second image patch are minimized.
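
One way to picture the two loss terms is sketched below. The claim does not state the form of either loss function; binary cross-entropy for the matching term, mean absolute error for the conversion term, and the scalar `weight` that balances them are all assumptions chosen for this example, as is applying the conversion term only to matching pairs.

```python
import math


def matching_loss(probability, label, eps=1e-12):
    """First loss (assumed form): binary cross-entropy on the match probability."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def conversion_loss(converted, target):
    """Second loss (assumed form): mean absolute pixel error between two patches."""
    flat_c = [v for row in converted for v in row]
    flat_t = [v for row in target for v in row]
    return sum(abs(a - b) for a, b in zip(flat_c, flat_t)) / len(flat_c)


def total_loss(probability, label, converted_first, second_patch,
               converted_second, first_patch, weight=1.0):
    """Sum of both terms; minimising it would update all seven models jointly."""
    loss = matching_loss(probability, label)
    if label == 1:
        # For a matching pair, each converted patch should resemble the real
        # patch captured in the target wavelength band (assumption).
        loss += weight * (conversion_loss(converted_first, second_patch)
                          + conversion_loss(converted_second, first_patch))
    return loss


if __name__ == "__main__":
    print(total_loss(0.5, 1, [[1, 2]], [[1, 2]], [[0, 0]], [[0, 2]], weight=2.0))
```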

The method of claim 6, wherein
the fourth neural network model, the fifth neural network model, the sixth neural network model, and the seventh neural network model are convolutional neural network models that, when an image patch is input, output in vector form feature values determined based on the correlation between pixel information of the image patch and a kernel of a preset size.
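
The correlation between a patch and a kernel of a preset size can be sketched in a few lines. This is a deliberately minimal, single-layer illustration; the ReLU and global average pooling used to turn each response map into one feature value are assumptions, and the kernels shown are arbitrary rather than learned.

```python
def correlate2d_valid(patch, kernel):
    """Slide the kernel over the patch (no padding) and sum the element-wise products."""
    ph, pw = len(patch), len(patch[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(ph - kh + 1):
        row = []
        for c in range(pw - kw + 1):
            row.append(sum(patch[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        output.append(row)
    return output


def feature_vector(patch, kernels):
    """One feature value per kernel: ReLU, then global average pooling (assumption)."""
    features = []
    for kernel in kernels:
        response = correlate2d_valid(patch, kernel)
        values = [max(v, 0.0) for row in response for v in row]
        features.append(sum(values) / len(values))
    return features


if __name__ == "__main__":
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernels = [[[1, 0], [0, 1]], [[-1, 0], [0, 0]]]
    print(correlate2d_valid(patch, kernels[0]))
    print(feature_vector(patch, kernels))
```

In a Siamese arrangement, the same `kernels` would be applied to both patches of a pair, which is what sharing weights between two feature extractors amounts to.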

The method of claim 4, wherein the second neural network model and the third neural network model each comprise
an encoder including a plurality of convolutional layers for encoding an image patch, and a decoder that decodes the encoded image patch to output an image patch representing image characteristics of a predetermined wavelength band.

An image matching device comprising:
a memory storing one or more instructions; and
at least one processor configured to execute the one or more instructions,
wherein the at least one processor, by executing the one or more instructions, is configured to:
obtain a first image patch and a second image patch from light of different wavelength bands;
convert the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band;
extract a feature vector from the obtained first image patch and the obtained second image patch;
extract a feature vector from the converted first image patch and the converted second image patch; and
identify whether the first image patch and the second image patch match by inputting, to a first neural network model, the feature vectors extracted from each of the obtained first image patch, the obtained second image patch, the converted first image patch, and the converted second image patch.

delete

The image matching device of claim 12, wherein the at least one processor is configured to:
obtain a first image from light of a first wavelength band and a second image from light of a second wavelength band;
extract a feature point from each of the first image and the second image based on the respective pixel values in the first image and the second image; and
obtain, as the first image patch and the second image patch, partial regions of the images that include the extracted feature points.

The image matching device of claim 12, wherein the at least one processor is configured to:
convert the first image patch by using a second neural network model that, when a first image patch representing image characteristics of a first wavelength band is input, outputs an image patch representing image characteristics of a second wavelength band; and
convert the second image patch by using a third neural network model that, when a second image patch representing image characteristics of the second wavelength band is input, outputs an image patch representing image characteristics of the first wavelength band.

The image matching device of claim 15, wherein the at least one processor is configured to:
extract a first feature vector from the converted first image patch by inputting the converted first image patch into a fourth neural network model; and
extract a fourth feature vector from the converted second image patch by inputting the converted second image patch into a fifth neural network model.

The image matching device of claim 16, wherein the at least one processor is configured to:
extract a second feature vector from the obtained second image patch by inputting the obtained second image patch into a sixth neural network model; and
extract a third feature vector from the obtained first image patch by inputting the obtained first image patch into a seventh neural network model.

The image matching device of claim 17, wherein
the fourth neural network model and the sixth neural network model share the layers within the fourth neural network model and the sixth neural network model and the weights relating to connection strength between the layers, and
the fifth neural network model and the seventh neural network model are neural network models of a Siamese network structure that share the layers within the fifth neural network model and the seventh neural network model and the weights relating to connection strength between the layers.

The image matching device of claim 17, wherein the at least one processor is configured to:
determine a first difference vector whose elements are the squares of the element-wise distances between the first feature vector and the second feature vector; and
determine a second difference vector whose elements are the squares of the element-wise distances between the third feature vector and the fourth feature vector,
wherein the first neural network model outputs, based on the first difference vector and the second difference vector, a probability value that the first image patch and the second image patch match.

A computer-readable recording medium having recorded thereon a program for causing a computer to execute a method of identifying whether images match, the method comprising:
obtaining a first image patch and a second image patch from light of different wavelength bands;
converting the first image patch and the second image patch to exhibit image characteristics of a preset wavelength band;
extracting a feature vector from the obtained first image patch and the obtained second image patch;
extracting a feature vector from the converted first image patch and the converted second image patch; and
identifying whether the first image patch and the second image patch match by inputting, to a first neural network model, the feature vectors extracted from each of the obtained first image patch, the obtained second image patch, the converted first image patch, and the converted second image patch.