KR102242334B1

KR102242334B1 - Method and Device for High Resolution Video Frame Rate Conversion with Data Augmentation

Info

Publication number: KR102242334B1
Application number: KR1020190167488A
Authority: KR
Inventors: 정진우; 안하은; 김제우
Original assignee: 한국전자기술연구원
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2021-04-20
Anticipated expiration: 2039-12-16
Also published as: WO2021125369A1

Abstract

A deep-learning-based method and apparatus for high-speed frame rate conversion are provided, which perform high-quality, high-speed frame interpolation on high-resolution video. A data augmentation method according to an embodiment of the present invention includes: receiving consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation; a first generating step of generating transformed images by changing image values of the input images; and a second generating step of generating an intermediate frame of the transformed images by changing image values of the intermediate frame of the input images.
As a result, high-resolution video such as 4K frames can be interpolated at high speed. Because the training data is augmented to be robust to fade-in/out, lighting changes, and zoom, accurate optical flow maps can be generated for such video as well, enabling fast and robust frame interpolation for high-resolution video.

Description

Method and Device for High Resolution Video Frame Rate Conversion with Data Augmentation

The present invention relates to frame rate conversion of high-resolution video, and more particularly to a method of predicting an optical flow map at low resolution using deep learning and restoring it to high resolution, so that frames of high-resolution video can be interpolated at high speed.

1) Overview of video frame rate conversion

A video consists of a sequence of still images. Each still image is called a frame, and the number of frames per unit time is the frame rate of the video. For example, a video made of 24 frames per second has a frame rate of 24 fps (frames per second). The frame rate is determined by the intent of the videographer, the video format, the limits of the camera, and so on. A viewer needs a certain minimum frame rate to perceive the video as continuous; below it, motion does not look smooth. This threshold varies with display size, lighting, viewing distance, and similar factors. Raising the frame rate of a video by post-processing to address this is called video frame rate conversion.

2) Prior art

The simplest way to raise the frame rate is to repeat frames. For example, to convert a 30 fps video to 60 fps, each frame is output twice. With this approach, however, the amount of information in the video is unchanged and the continuity of motion does not improve, so the viewer's discomfort remains. To solve this, techniques were developed that generate virtual frames from consecutive frames. That is, the frames at times t and t+1 are used to synthesize a new intermediate frame at time t+0.5; this is called frame interpolation.
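The difference between repeating frames and interpolating them can be made concrete with a small timeline sketch. It assumes a 2x conversion in which one synthesized frame is placed halfway between each pair of neighbouring input frames; the function name `doubled_timeline` and the tuple layout are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def doubled_timeline(num_frames, fps):
    """Return (time_in_seconds, source) pairs for a 2x frame rate conversion.

    source is ("original", i) for input frame i, or ("interpolated", i, i + 1)
    for a frame synthesized halfway between input frames i and i + 1.
    """
    step = Fraction(1, fps)  # spacing between input frames
    timeline = []
    for i in range(num_frames):
        timeline.append((i * step, ("original", i)))
        if i + 1 < num_frames:
            # The virtual frame sits at the temporal midpoint of its neighbours.
            timeline.append((i * step + step / 2, ("interpolated", i, i + 1)))
    return timeline


timeline = doubled_timeline(3, 30)
```

With frame repetition, every `"interpolated"` slot would simply hold a copy of frame `i`, so no new motion information would be added.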

Many frame interpolation methods have been developed, and they generally follow two steps. The first step obtains a motion or optical flow map, and the second step synthesizes the intermediate frame from that motion information. For object motion to look smooth, each object in the intermediate frame must sit midway along its motion between the two frames. Accurately estimating the optical flow map, which carries the object motion information, is therefore critical, and various techniques based on it have been proposed.

Deep learning algorithms have recently been adopted in fields such as computer vision and speech recognition, where they outperform conventional methods. In step with this, various deep-learning-based frame interpolation techniques have appeared. They predict high-quality optical flow maps with deep learning and produce better interpolation results than conventional methods, and they remain an active research topic.

3) Problems with the prior art

Because of their network structure, existing methods either cannot run at large resolutions such as 4K (3840x2160) for lack of GPU memory, or run very slowly. This makes it difficult to apply deep-learning-based frame interpolation to commercial applications that require real-time operation. In addition, high-resolution video generally contains larger motion than low-resolution video. Existing methods tend to produce low-quality optical flow maps for such large motion, which degrades the quality of the interpolated frames.

Existing methods also do not account for fade-in/out, lighting changes, or zoom, so optical flow maps are not estimated accurately for such video, and frame interpolation performance drops sharply.

The present invention has been devised to solve the above problems, and its object is to provide a deep-learning-based method and apparatus for high-speed frame rate conversion that perform high-quality, high-speed frame interpolation on high-resolution video.

To achieve the above object, a data augmentation method according to an embodiment of the present invention includes: receiving consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation; a first generating step of generating transformed images by changing image values of the input images; and a second generating step of generating an intermediate frame of the transformed images by changing image values of the intermediate frame of the input images.

In the first generating step, the first input image may be transformed by subtracting an adjustment parameter from it and the second input image may be transformed by adding the adjustment parameter to it; in the second generating step, the intermediate frame may be transformed by adding an adjustment parameter that varies with the time of the intermediate frame.

Alternatively, in the first generating step the adjustment parameter may be added to both the first input image and the second input image, and in the second generating step the adjustment parameter may be added to the intermediate frame.

The adjustment parameter may be generated as a random value.

The adjustment parameter may be a parameter for adjusting any one of brightness, gamma, contrast, hue, and saturation.

The data augmentation method according to the present invention may further include: determining a reference patch in the input images and in the intermediate frame of the input images; shrinking the reference patch of the first input image; enlarging the reference patch of the second input image; a first adjustment step of resizing the shrunken reference patch of the first input image to the original size; and a second adjustment step of resizing the enlarged reference patch of the second input image to the original size.

The determining step may determine the position and size of the reference patch.

The data augmentation method according to the present invention may further include randomly generating a zoom parameter, where the shrinking step shrinks the reference patch of the first input image using the zoom parameter and the enlarging step enlarges the reference patch of the second input image using the zoom parameter.

The first and second adjustment steps may be performed when the enlarged reference patch of the second input image is determined not to extend outside the second input image.
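The zoom steps above can be sketched as a computation of crop rectangles. The sketch assumes that both patches are scaled about the centre of the reference patch and that a zoom parameter `zoom` in (0, 1) scales the patch by `1 - zoom` for the first image and `1 + zoom` for the second; the text fixes neither choice. The function name `zoom_patches` and its argument layout are hypothetical.

```python
def zoom_patches(image_w, image_h, cx, cy, patch_w, patch_h, zoom):
    """Return (small_rect, large_rect) as (left, top, width, height).

    small_rect is the shrunken reference patch for the first input image and
    large_rect is the enlarged reference patch for the second input image.
    Returns None when the enlarged patch would extend outside the image, in
    which case the sample is not zoom-augmented.
    """
    def rect(scale):
        # Scale the reference patch about its centre (an assumption).
        w, h = patch_w * scale, patch_h * scale
        return (cx - w / 2, cy - h / 2, w, h)

    small = rect(1.0 - zoom)
    large = rect(1.0 + zoom)
    left, top, w, h = large
    if left < 0 or top < 0 or left + w > image_w or top + h > image_h:
        return None
    return small, large


inside = zoom_patches(100, 100, 50, 50, 40, 40, 0.25)
outside = zoom_patches(100, 100, 10, 10, 40, 40, 0.25)
```

After the bounds check passes, both crops would be resized back to `patch_w` x `patch_h`, so that the network sees the same patch size with an apparent zoom between the two inputs.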

According to another aspect of the present invention, there is provided a data augmentation system including: an input unit that receives consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation; and a processor that generates transformed images by changing image values of the input images and generates an intermediate frame of the transformed images by changing image values of the intermediate frame of the input images.

According to yet another aspect of the present invention, there is provided an artificial intelligence model training method including: receiving consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation; a first generating step of generating transformed images by changing image values of the input images; a second generating step of generating an intermediate frame of the transformed images by changing image values of the intermediate frame of the input images; and training the artificial intelligence model with the transformed images generated in the first generating step and the intermediate frame of the transformed images generated in the second generating step.

According to yet another aspect of the present invention, there is provided an artificial intelligence model training system including: a data augmentation system that generates transformed images by changing image values of consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation, and generates an intermediate frame of the transformed images by changing image values of the intermediate frame of the input images; and a high-speed frame interpolation system that trains the artificial intelligence model with the transformed images generated by the processor and the intermediate frame of the transformed images.

As described above, according to embodiments of the present invention, high-resolution video such as 4K frames can be interpolated at high speed. Because the training data is augmented to be robust to fade-in/out, lighting changes, and zoom, accurate optical flow maps can be generated for such video as well, enabling fast and robust frame interpolation for high-resolution video.

FIG. 1 illustrates the frame interpolation technique;
FIG. 2 is a block diagram of the high-speed frame interpolation system;
FIG. 3 illustrates the pyramid image representation;
FIG. 4 is a detailed block diagram of the low-resolution optical flow prediction unit;
FIG. 5 is a detailed block diagram of the optical flow resolution enhancement unit;
FIG. 6 illustrates the intermediate frame time interval;
FIG. 7 is a detailed block diagram of the intermediate frame generation unit;
FIG. 8 is a detailed block diagram of the intermediate frame resolution enhancement unit;
FIG. 9 illustrates the composition of the training data;
FIG. 10 illustrates the composition of the augmented training data;
FIG. 11 illustrates how the brightness adjustment parameter is determined;
FIG. 12 illustrates the per-sample data augmentation process for training;
FIG. 13 is a conceptual diagram of zoom augmentation patch generation;
FIG. 14 illustrates the zoom augmentation patch generation process;
FIG. 15 is a flowchart of zoom augmentation patch generation; and
FIG. 16 is a block diagram of the data augmentation system.

Hereinafter, the present invention is described in more detail with reference to the drawings.

Embodiments of the present invention present, first, a method of performing high-speed frame interpolation on high-resolution video using a deep learning network and, second, a method of augmenting the training data so that frame interpolation is robust to lighting changes, fade-in/out, and zoom.

1. High-speed frame interpolation

As shown in FIG. 2, a high-speed frame interpolation system according to an embodiment of the present invention includes a low-resolution optical flow prediction unit (110), an optical flow resolution enhancement unit (120), an intermediate frame generation unit (130), and an intermediate frame resolution enhancement unit (140).

The low-resolution optical flow prediction unit (110) predicts, at low resolution, the bidirectional optical flow between the temporally consecutive frames given as input. Existing methods predict optical flow at the original resolution, so a high-resolution input demands a large amount of memory; on systems with limited hardware, frame interpolation either cannot run at all or runs very slowly. Moreover, object motion in high-resolution video tends to be very large, and existing methods produce blur or ghosting artifacts in that case. Embodiments of the present invention address these problems by predicting the optical flow at low resolution.

FIG. 3 shows the pyramid image representation used in embodiments of the present invention. Level 1 has the same resolution as the original input image, and Level 2 images are Level 1 images downsampled to half the width and height. Level 3 has one quarter of the Level 1 width and height. That is, if the original Level 1 input resolution is W (width) x H (height), Level 2 is W/2 x H/2 and Level 3 is W/4 x H/4.
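As a quick check of the pyramid sizes, the sketch below computes the resolution of each level for a 4K input. The helper name `pyramid_resolutions` is hypothetical, and integer division is an assumption that holds exactly for 3840x2160.

```python
def pyramid_resolutions(width, height, levels=3):
    """Return [(W, H), (W/2, H/2), (W/4, H/4), ...] for the requested levels."""
    sizes = []
    for level in range(levels):
        scale = 2 ** level  # Level 1 -> 1, Level 2 -> 2, Level 3 -> 4
        sizes.append((width // scale, height // scale))
    return sizes


sizes = pyramid_resolutions(3840, 2160)
# Ratio of Level 1 pixels to Level 3 pixels.
ratio = (sizes[0][0] * sizes[0][1]) // (sizes[2][0] * sizes[2][1])
```

For a 4K frame this gives 3840x2160, 1920x1080, and 960x540, so a flow map predicted at Level 3 covers one sixteenth of the pixels of the original frame.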

The low-resolution optical flow prediction unit (110) generates an optical flow map at Level 3 resolution, and the optical flow resolution enhancement unit (120) restores the Level 3 optical flow map to Level 2 resolution. The intermediate frame generation unit (130) uses the Level 2 optical flow and the Level 2 input images to generate an intermediate frame at Level 2 resolution. The intermediate frame resolution enhancement unit (140) restores the Level 2 intermediate frame to Level 1, the original resolution.

The low-resolution optical flow prediction unit (110) computes the optical flow map between the two input frames. In general, a convolutional neural network needs a sufficiently large receptive field to produce a high-quality optical flow map, but its computational complexity grows with the receptive field. Existing methods generate an optical flow map at the same resolution as the original image and are therefore slow on high-resolution video such as 4K frames.

In an embodiment of the present invention, as shown in FIG. 4, the resolution reduction unit (111) first reduces the original input images to Level 3 resolution, and the optical flow prediction unit (112) then predicts the optical flow.

The resolution reduction unit (111) reduces the Level 1 original input images to Level 3 resolution. Bilinear or bicubic filtering, among others, may be used for the reduction; the method is not limited to a particular one. In FIG. 3, the superscript L1 in $I_0^{L1}$, $I_1^{L1}$ denotes Level 1 resolution, and the superscript L3 in $I_0^{L3}$, $I_1^{L3}$ denotes Level 3 resolution.

The optical flow prediction unit (112) computes the optical flow map between the two input frames from the Level 3 low-resolution images. This lets a relatively small receptive field cover the motion of objects in the image and, at the same time, greatly reduces the memory and run time required for the computation.

The optical flow prediction unit (112) generates optical flow maps $F_{0\to1}^{L3}$ and $F_{1\to0}^{L3}$ whose width and height are one quarter of the original input resolution (that is, Level 3 size). Various deep learning networks and existing methods may be used to generate the optical flow; the method is not limited to a particular one.

The optical flow resolution enhancement unit (120) raises the resolution of the Level 3 optical flow map to Level 2 and is configured as shown in FIG. 5.

The resolution increasing unit 1 (121) upsamples the Level 3 optical flow map to a Level 2 optical flow map. Bilinear or bicubic filtering, among others, may be used for the upsampling; the method is not limited to a particular one.
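A minimal sketch of 2x flow upsampling follows. It makes two assumptions that the text does not state: nearest-neighbour replication is used in place of bilinear or bicubic filtering to keep the example short, and the flow vectors are multiplied by 2 so that they stay in pixel units of the larger grid. The helper name `upsample_flow_2x` is hypothetical.

```python
def upsample_flow_2x(flow):
    """Upsample an H x W flow map to 2H x 2W.

    flow is a nested list of (dx, dy) tuples in pixels. Each vector is
    replicated into a 2 x 2 block and doubled, because a displacement of one
    pixel at the low resolution spans two pixels at the high resolution.
    """
    out = []
    for row in flow:
        wide = []
        for dx, dy in row:
            wide.extend([(2.0 * dx, 2.0 * dy)] * 2)
        out.append(wide)
        out.append(list(wide))  # duplicate the row for the second output line
    return out


up = upsample_flow_2x([[(1.0, -0.5)]])
```

A production implementation would more likely resize each flow channel with a bilinear or bicubic filter, as the text suggests, and then apply the same scale factor.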

The optical flow re-adjustment unit (123) rescales the optical flow maps for the input time t. As shown in FIG. 6, t is the time of the intermediate frame to be generated between $I_0$ and $I_1$. $F_{0\to1}$ denotes the optical flow map from $I_0$ to $I_1$, and $F_{1\to0}$ denotes the reverse. $F_{t\to0}$ in FIG. 5 can be regarded as the optical flow map from $I_t$, the frame interpolated at an arbitrary t, to $I_0$, and likewise for $F_{t\to1}$. $F_{t\to0}$ and $F_{t\to1}$ can be computed as follows.

(1)
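The sketch below shows one way the flow maps toward time t could be derived from the bidirectional maps. It is not a reproduction of Equation (1); it assumes the linear-motion approximation popularized by Super SloMo-style interpolators, $F_{t\to0} = -(1-t)\,t\,F_{0\to1} + t^2 F_{1\to0}$ and $F_{t\to1} = (1-t)^2 F_{0\to1} - t\,(1-t)\,F_{1\to0}$. The function name `rescale_flows` and the nested-list layout are hypothetical.

```python
def rescale_flows(flow_01, flow_10, t):
    """Approximate the flows from time t to frame 0 and to frame 1.

    flow_01 and flow_10 are H x W nested lists of (dx, dy) tuples holding the
    flow from frame 0 to frame 1 and from frame 1 to frame 0. The weights
    follow the assumed linear-motion approximation described in the lead-in.
    """
    flow_t0, flow_t1 = [], []
    for row_01, row_10 in zip(flow_01, flow_10):
        out_t0, out_t1 = [], []
        for (u01, v01), (u10, v10) in zip(row_01, row_10):
            out_t0.append((-(1 - t) * t * u01 + t * t * u10,
                           -(1 - t) * t * v01 + t * t * v10))
            out_t1.append(((1 - t) ** 2 * u01 - t * (1 - t) * u10,
                           (1 - t) ** 2 * v01 - t * (1 - t) * v10))
        flow_t0.append(out_t0)
        flow_t1.append(out_t1)
    return flow_t0, flow_t1


# An object moving 4 pixels to the right between the two frames.
f_t0, f_t1 = rescale_flows([[(4.0, 0.0)]], [[(-4.0, 0.0)]], 0.5)
```

At t = 0.5 the midpoint frame is two pixels away from each input, pointing left toward frame 0 and right toward frame 1, which matches the intuition that the object sits halfway along its path.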

The resolution reduction unit 2 (112) reduces the Level 1 original input images to Level 2 resolution, producing $I_0^{L2}$, $I_1^{L2}$. Bilinear or bicubic filtering, among others, may be used for the reduction; the method is not limited to a particular one.

The warping unit 1 (124) uses the optical flow maps $F_{t\to0}^{L2}$ and $F_{t\to1}^{L2}$ together with the Level 2 input images $I_0^{L2}$, $I_1^{L2}$ to generate temporary intermediate frames $I_{0\to t}^{L2}$ and $I_{1\to t}^{L2}$. $I_{0\to t}^{L2}$ and $I_{1\to t}^{L2}$ are obtained as in Equation (2), where the warping operator denotes back-warping.

(2)
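Back-warping can be sketched as follows. The sketch assumes single-channel images stored as nested lists, flow vectors in pixels given as (dx, dy), bilinear sampling, and clamping at the image border; the text specifies none of these details, and `back_warp` is a hypothetical helper rather than the operator used in Equation (2).

```python
import math


def back_warp(image, flow):
    """Warp image toward the target time by sampling it through a flow map.

    image: H x W nested list of floats (the source frame).
    flow:  H x W nested list of (dx, dy); for each output pixel (x, y) the
           value is read from the source frame at (x + dx, y + dy).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the image border.
            sx = min(max(x + dx, 0.0), width - 1.0)
            sy = min(max(y + dy, 0.0), height - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            wx, wy = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


row = [[0.0, 1.0, 2.0, 3.0]]
shifted = back_warp(row, [[(1.0, 0.0)] * 4])
half = back_warp(row, [[(0.5, 0.0)] * 4])
```

Because every output pixel reads from the source frame, back-warping leaves no holes in the result, unlike forward splatting of source pixels.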

The optical flow enhancement unit (125) refines the optical flow using the outputs of warping unit 1 (124), $I_{0\to t}^{L2}$, $I_{1\to t}^{L2}$; the outputs of the optical flow re-adjustment unit (123), $F_{t\to0}^{L2}$, $F_{t\to1}^{L2}$; and the outputs of resolution reduction unit 2 (112), $I_0^{L2}$, $I_1^{L2}$. These are bundled together into a single input to the optical flow enhancement unit (125). The optical flow enhancement unit (125) is a convolutional neural network and generates the refined optical flow maps $\hat{F}_{t\to0}^{L2}$, $\hat{F}_{t\to1}^{L2}$.

The intermediate frame generation unit (130) uses $\hat{F}_{t\to0}^{L2}$, $\hat{F}_{t\to1}^{L2}$, the outputs of the optical flow resolution enhancement unit (120), to generate an intermediate frame at Level 2 resolution. FIG. 7 shows its detailed configuration. As shown, the intermediate frame generation unit (130) consists of warping unit 2 (131) and an intermediate frame synthesis unit (132).

The warping unit 2 (131) uses the optical flow maps $\hat{F}_{t\to0}^{L2}$, $\hat{F}_{t\to1}^{L2}$ and the Level 2 input images $I_0^{L2}$, $I_1^{L2}$ to generate temporary intermediate frames $\hat{I}_{0\to t}^{L2}$ and $\hat{I}_{1\to t}^{L2}$. These are obtained as in Equation (3).

(3)

The intermediate frame synthesis unit (132) uses the generated temporary intermediate frames to produce the final intermediate frame at Level 2 resolution as follows.

Here ⊙ denotes element-wise multiplication and B is the blending parameter of the input images. For example, if a pixel p is present in both input images $I_0$ and $I_1$, B is set to 0.5; if p is present in $I_0$ but not in $I_1$, B is set to 1. B may be learned during training or fixed to a constant value.
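The role of B can be illustrated with a per-pixel blend. The sketch assumes the common form in which the final frame is B ⊙ (frame warped from $I_0$) + (1 - B) ⊙ (frame warped from $I_1$); this is consistent with the two examples in the text (B = 0.5 for pixels visible in both inputs, B = 1 for pixels visible only in $I_0$) but is an assumption about the exact formula. The name `blend` is hypothetical.

```python
def blend(warped_0, warped_1, blend_map):
    """Blend two warped frames pixel by pixel.

    warped_0, warped_1: H x W nested lists of floats (temporary intermediate
    frames obtained from the first and second input image).
    blend_map: H x W nested list of B values in [0, 1]; B weights warped_0
    and (1 - B) weights warped_1.
    """
    return [[b * p0 + (1.0 - b) * p1
             for p0, p1, b in zip(row_0, row_1, row_b)]
            for row_0, row_1, row_b in zip(warped_0, warped_1, blend_map)]


# First pixel is visible in both inputs, second pixel only in the first one.
result = blend([[0.25, 0.75]], [[0.75, 0.0]], [[0.5, 1.0]])
```

Setting B to 1 for occluded pixels keeps the invalid values warped from the other input out of the result.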

The intermediate frame resolution enhancement unit (140) restores the final Level 2 intermediate frame $I_t^{L2}$ to Level 1, the original image resolution. As shown in FIG. 8, the intermediate frame resolution enhancement unit (140) consists of resolution increasing units 2 (141 and 142), warping unit 3 (143), and a resolution improvement unit (144).

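The resolution increasing units 2 (141 and 142) upsample the intermediate frame $I_t^{L2}$ and the optical flow maps $\hat{F}_{t\to0}^{L2}$, $\hat{F}_{t\to1}^{L2}$ from Level 2 resolution to Level 1 resolution, producing $\tilde{I}_t^{L1}$ and $F_{t\to0}^{L1}$, $F_{t\to1}^{L1}$. Bilinear or bicubic filtering, among others, may be used for the upsampling; the method is not limited to a particular one.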

The resolution improvement unit (144) takes the original image information as input so that high-resolution content is available to it. If the original images $I_0^{L1}$, $I_1^{L1}$ were fed in directly, the time offset between these inputs and the intermediate frame would lower the efficiency of the improvement. To prevent this, $I_0^{L1}$ and $I_1^{L1}$ are each passed through warping unit 3 (143), which uses the optical flow maps to generate a frame at time t from each of them. This can be expressed as Equation (4).

(4)

Finally, the resolution improvement unit (144) is a convolutional neural network that takes these frames, bundled together, as its input. Its output, $I_t^{L1}$ at Level 1 resolution, is the final output of the whole network.

Generating the intermediate frame in this way makes it possible to find fast motion in high-resolution video quickly and accurately, which enables accurate frame interpolation.

2. Data augmentation

As noted above, ordinary video frequently contains flashes, lighting changes, and fade-in/out, and optical flow maps are hard to compute accurately for such video. The main reason is that the training data contains few samples with lighting changes. Including many such samples would require collecting a large amount of that kind of data, which takes considerable cost and effort. Embodiments of the present invention solve this through data augmentation during training.

In general, a training sample for deep-learning-based frame interpolation consists of two input images $I_0$, $I_1$ and one ground truth (GT) image $I_t^{GT}$, as in FIG. 9. The two input images are the inputs to the deep learning network of FIG. 2, and the output image is $\hat{I}_t$. During training, the network is updated so that the difference between $\hat{I}_t$ and $I_t^{GT}$ is minimized.

도 9에서 보듯이 대부분의 영상에서는 학습 데이터 간 즉,

,

간에 조명 변화가 일어나지 않는다. 조명 변화가 있는 학습 데이터 샘플이 부족하므로 조명 변화가 발생할 경우 영상이 정확히 보간되지 않게 된다.As shown in Fig. 9, in most of the images, between training data, that is,

,

There is no change in lighting between them. Since the training data samples with lighting change are insufficient, the image is not accurately interpolated when lighting change occurs.

To solve this problem, training data with lighting changes is generated artificially through data transformation (augmentation), as shown in FIG. 10. Brightness and gamma functions, among others, may be used to adjust the brightness. The strength of the brightness adjustment is determined by the brightness adjustment parameter b. In this case, the data is transformed as follows.

(5)

The GT image is transformed according to its temporal position t, and the brightness is assumed to change linearly. This can be represented by the graph in FIG. 11, where the X-axis is the time at which the GT image is located and the Y-axis is the value of the brightness adjustment parameter. Because t is confined to the interval between the two input images, it takes a value strictly between 0 and 1. For example, if t = 0.5, the brightness adjustment parameter of the GT image is 0. The brightness adjustment parameter of the GT image can be generalized as in Equation (6). The parameter need not always be computed linearly; it can be calculated in various ways.

(6)
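The lighting-change transformation described above can be sketched in a few lines. This is a minimal sketch, not the patent's Equations (5) and (6), whose bodies did not survive in this text. It assumes the first input is shifted by -b and the second by +b (as the claims describe), and the GT image by (2t - 1)·b, a linear form chosen only because it gives 0 at t = 0.5. The function names, the list-of-rows image representation, and the clipping to [0, 1] are illustrative assumptions.

```python
def clip01(x):
    """Keep a normalized pixel value inside [0, 1] (assumed post-processing)."""
    return min(1.0, max(0.0, x))


def gt_brightness_offset(b, t):
    """Assumed linear offset for the GT frame: -b at t=0, 0 at t=0.5, +b at t=1."""
    return (2.0 * t - 1.0) * b


def shift_brightness(img, delta):
    """Add delta to every pixel of an image stored as a list of rows."""
    return [[clip01(p + delta) for p in row] for row in img]


def apply_lighting_change(i0, i1, gt, b, t):
    """Darken one input, brighten the other, and shift the GT frame in between."""
    return (
        shift_brightness(i0, -b),
        shift_brightness(i1, +b),
        shift_brightness(gt, gt_brightness_offset(b, t)),
    )
```

With b = 0.2 and t = 0.5, a mid-gray pixel of 0.5 becomes roughly 0.3 in the first input and 0.7 in the second, and stays at 0.5 in the GT frame.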

Assuming the image signal range is normalized to between 0 and 1, b is selected randomly from between -1 and 1 for every batch. Data transformation is not applied to every training sample; it is applied randomly with a certain probability P. In other words, both how often and how strongly the data is transformed are chosen randomly.

The same data transformation can also be applied identically to the input images. This can compensate for the drop in performance that occurs when samples of a particular brightness are scarce. In this case, the data is transformed as in Equation (7).

(7)

FIG. 12 shows the data transformation process performed for each training sample. A random variable with a value between 0 and 1 is generated (S210). Data transformation is performed (the dotted box) only when this random variable is larger than a threshold (S220-Yes). The threshold is a value between 0 and 1 and determines how often data transformation is applied. If data transformation is to be performed, an adjustment parameter b is generated (S230). The adjustment parameter b determines the strength of the data transformation. The range a of the adjustment parameter may be set in various ways depending on the signal range of the image, the type of data transformation, and so on. Next, a random variable is used to decide whether to apply the lighting-change transformation (S240). If so (S250-Yes), data transformation is performed using Equations (5) and (6) (S260); otherwise (S250-No), it is performed using Equation (7) (S270).
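The per-sample procedure of FIG. 12 might be sketched as follows. The thresholds `p_skip` and `p_uniform`, the default range `a`, and all function names are hypothetical, since the corresponding symbols are missing from this text. The linear GT offset (2t - 1)·b is an assumed form consistent with the statement that the GT parameter is 0 at t = 0.5, and the uniform branch assumes that Equation (7) adds the same b to all three images, as the claims suggest.

```python
import random


def _shift(img, delta):
    """Add delta to every pixel (list of rows) and clip to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def augment_sample(i0, i1, gt, t, p_skip=0.5, p_uniform=0.5, a=1.0, rng=random):
    """Randomly transform one training triple (i0, i1, gt) with GT at time t."""
    # S210/S220: transform only when the random variable exceeds the threshold.
    if rng.random() <= p_skip:
        return i0, i1, gt
    # S230: draw the adjustment parameter b from the range [-a, a].
    b = rng.uniform(-a, a)
    # S240/S250: choose between the lighting-change and the uniform transformation.
    if rng.random() > p_uniform:
        # S260: opposite shifts on the inputs, time-dependent shift on the GT frame.
        return _shift(i0, -b), _shift(i1, +b), _shift(gt, (2.0 * t - 1.0) * b)
    # S270: the same shift on all three images.
    return _shift(i0, b), _shift(i1, b), _shift(gt, b)
```

Passing `rng` explicitly makes the procedure reproducible, for example with `random.Random(0)`.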

The above method is applicable to various data transformation functions, such as the gamma function, contrast, hue, and saturation.

As another embodiment, a data transformation method for handling zoom video effectively is presented. In zoom video, the size of objects changes as time progresses. To interpolate such video efficiently, data transformation patches that take zoom into account are generated as shown in FIG. 13.

In general, a portion of the image is used as a patch during training. For example, consider a given image size and a given patch size. The patch position is the same in the input images and in the GT image. As shown in FIG. 13, a patch corresponding to zoom video can be generated by shrinking one input and enlarging the other.

FIGS. 14 and 15 show the zoom data patch generation process in detail. As in the data transformation process above, a random variable with a value between 0 and 1 is generated (S310). The zoom data transformation is performed only when this random variable is larger than a threshold (S320-Yes).

Once it is decided to perform the zoom data transformation, the zoom parameter s is set randomly (S330). The zoom parameter s is the strength of the zoom magnification; when s is 1, the patch has its original size. If the zoom parameter is too large, the patch often extends beyond the image, so a value between 1 and 1.2 is recommended.

Next, the position of the reference patch is determined as shown in FIG. 14 (S340). The patch size is as defined above. Using the position of the reference patch, the patch size, and the zoom parameter, the positions to be used in the two input images are determined as in Equations (8) to (11). The rounding operator in Equations (8) and (10) denotes the largest even number not exceeding its argument.

The size of the patch used in one input image is calculated as in Equation (8). Based on this recalculated patch size, the patch area in that image is calculated as in Equation (9); this corresponds to the red box in that image in FIG. 14. The size of the patch used in the other input image is calculated as in Equation (10). Based on this recalculated patch size, the patch area in that image is calculated as in Equation (11); this corresponds to the red box in that image in FIG. 14. The positions of the two may be swapped randomly.

(8)

(9)

(10)

(11)
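Equations (8) to (11) are not reproduced in this text, so the following is only one plausible reading of the zoom patch geometry, not the patent's formulas. It assumes that one crop region is the reference patch scaled by s and the other is scaled by 1/s, that both sizes are rounded down to an even number, and that both regions stay centered on the reference patch. The names `even_floor`, `zoom_regions`, and `inside` are hypothetical.

```python
def even_floor(v):
    """Largest even integer not exceeding v (for non-negative v)."""
    return 2 * int(v // 2)


def zoom_regions(x, y, pw, ph, s):
    """Return two (x, y, width, height) crop regions around a reference patch.

    (x, y) is the top-left corner of the reference patch, (pw, ph) its size,
    and s >= 1 the zoom parameter. The first region is larger than the
    reference patch and the second is smaller; both share its center.
    """
    w_big, h_big = even_floor(pw * s), even_floor(ph * s)
    w_small, h_small = even_floor(pw / s), even_floor(ph / s)
    big = (x - (w_big - pw) // 2, y - (h_big - ph) // 2, w_big, h_big)
    small = (x + (pw - w_small) // 2, y + (ph - h_small) // 2, w_small, h_small)
    return big, small


def inside(region, width, height):
    """Boundary check: does the region lie entirely within the image?"""
    x, y, w, h = region
    return x >= 0 and y >= 0 and x + w <= width and y + h <= height
```

For a 256×256 reference patch at (100, 100) with s = 1.1, this sketch yields a 280×280 region at (88, 88) and a 232×232 region at (112, 112). A zoom parameter in the recommended range could be drawn with `random.uniform(1.0, 1.2)`.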

If the area calculated in Equation (9) falls outside the image, the zoom data transformation is not performed. If the boundary condition is satisfied (S360-Yes), a patch of the recalculated size is extracted using the coordinates calculated in Equation (9) (S370). Its resolution is then adjusted to the original patch size to produce a zoom-transformed patch. Likewise, a patch of the recalculated size is extracted using the coordinates calculated in Equation (10), and its resolution is adjusted to the original patch size to produce the other zoom-transformed patch (S340). Finally, training is performed using the two zoom-transformed patches, both of which have the original patch size.
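The crop-and-resize step can be illustrated with a small sketch. The source does not say which interpolation is used when the resolution is adjusted, so nearest-neighbor sampling is used here purely as an assumption to keep the example dependency-free. Images are lists of rows, and `crop`, `resize_nearest`, and `zoom_patch` are hypothetical names.

```python
def crop(img, x, y, w, h):
    """Extract the w-by-h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def resize_nearest(img, out_w, out_h):
    """Resize a list-of-rows image with nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(j * in_h) // out_h][(i * in_w) // out_w] for i in range(out_w)]
        for j in range(out_h)
    ]


def zoom_patch(img, region, pw, ph):
    """Crop region from img and rescale it to the original patch size (pw, ph).

    Returns None when the region leaves the image, mirroring the rule that
    the zoom transformation is skipped if the boundary condition fails.
    """
    x, y, w, h = region
    if x < 0 or y < 0 or x + w > len(img[0]) or y + h > len(img):
        return None
    return resize_nearest(crop(img, x, y, w, h), pw, ph)
```

Cropping a region larger than the patch and scaling it down shrinks the content, while cropping a smaller region and scaling it up enlarges it, so the two resulting patches imitate a zoom between the input frames.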

A system capable of performing the data transformation described so far is described in detail below with reference to FIG. 16. FIG. 16 is a block diagram of a data transformation system according to another embodiment of the present invention.

As shown in FIG. 16, the data transformation system according to an embodiment of the present invention can be implemented as a computing system including an input unit 210, a processor 220, an output unit 230, and a storage unit 240.

The input unit 210 is a means for receiving images through an external storage medium, an external device, a communication network, or the like, and the processor 220 transforms the received image data to increase the number of training images.

In transforming the image data, the processor 220 uses the methods presented in the embodiments described above. The storage unit 240 is an internal storage medium that provides the storage space the processor 220 needs to transform the image data.

The output unit 230 outputs the training images added by the processor 220 to an external storage medium, an external device, a communication network, or the like.

3. Modifications

So far, preferred embodiments have been described in detail for a method of performing high-speed frame interpolation on high-resolution video using a deep learning network, and for a method of transforming data during training as a frame interpolation technique that is robust to lighting changes, fade-ins/outs, and zoom video.

Conventional methods, given a high-resolution input, generate an optical flow map at the same resolution as the original and interpolate frames from it, which requires a large amount of memory and yields a very slow interpolation speed. In contrast, embodiments of the present invention convert the input high-resolution image to a low-resolution image, generate the optical flow map at high speed, and restore it to the original high resolution, making it possible to interpolate high-resolution video such as 4K frames at high speed.

In addition, embodiments of the present invention apply training data transformations that are robust to fade-ins/outs, lighting changes, and zoom video, so that accurate optical flow maps can be generated for such varied video as well.

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical idea according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data. For example, it may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, computer-readable code or a program stored on a computer-readable recording medium may be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can be made by those of ordinary skill in the art to which the invention belongs without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or prospect of the present invention.

110: optical flow prediction unit
120: low-resolution optical flow resolution enhancement unit
130: intermediate frame generation unit
140: intermediate frame resolution enhancement unit

Claims

A data transformation method comprising:
receiving consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation;
a first generation step of generating transformed images by changing image values of the received input images;
a second generation step of generating an intermediate frame of the transformed images by changing image values of an intermediate frame of the received input images;
determining a reference patch of the input images and of the intermediate frame of the input images;
reducing the reference patch of a first input image among the input images;
enlarging the reference patch of a second input image among the input images;
a first adjustment step of adjusting the reduced reference patch of the first input image to its original size; and
a second adjustment step of adjusting the enlarged reference patch of the second input image to its original size.

The method according to claim 1, wherein
the first generation step modifies the first input image among the input images by subtracting an adjustment parameter from it, and modifies the second input image among the input images by adding the adjustment parameter to it, and
the second generation step modifies the intermediate frame by adding an adjustment parameter that varies according to the time of the intermediate frame.

The method according to claim 2, wherein
the first generation step modifies the first input image among the input images by adding an adjustment parameter to it, and modifies the second input image among the input images by adding the adjustment parameter to it, and
the second generation step modifies the intermediate frame by adding the adjustment parameter to it.

The method according to claim 2 or 3, wherein the adjustment parameter is generated as a random value.

The method according to claim 4, wherein the adjustment parameter is a parameter for adjusting any one of brightness, gamma function, contrast, hue, and saturation.

delete

The method according to claim 1, wherein the determining step determines the position and size of the reference patch.

The method according to claim 1, further comprising randomly generating a zoom parameter, wherein
the reducing step reduces the reference patch of the first input image using the zoom parameter, and
the enlarging step enlarges the reference patch of the second input image using the zoom parameter.

The method according to claim 1, wherein the first adjustment step and the second adjustment step are performed when it is determined that the enlarged reference patch of the second input image does not extend beyond the second input image.

A data transformation system comprising:
an input unit that receives consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation; and
a processor that generates transformed images by changing image values of the received input images, and generates an intermediate frame of the transformed images by changing image values of an intermediate frame of the received input images,
wherein the processor
determines a reference patch of the input images and of the intermediate frame of the input images,
reduces the reference patch of a first input image among the input images,
enlarges the reference patch of a second input image among the input images,
adjusts the reduced reference patch of the first input image to its original size, and
adjusts the enlarged reference patch of the second input image to its original size.

An artificial intelligence model training method comprising:
receiving consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation;
a first generation step of generating transformed images by changing image values of the received input images;
a second generation step of generating an intermediate frame of the transformed images by changing image values of an intermediate frame of the received input images;
training the artificial intelligence model with the transformed images generated in the first generation step and the intermediate frame of the transformed images generated in the second generation step;
determining a reference patch of the input images and of the intermediate frame of the input images;
reducing the reference patch of a first input image among the input images;
enlarging the reference patch of a second input image among the input images;
a first adjustment step of adjusting the reduced reference patch of the first input image to its original size;
a second adjustment step of adjusting the enlarged reference patch of the second input image to its original size; and
training the artificial intelligence model using the reference patch of the first input image adjusted in the first adjustment step, the reference patch of the second input image adjusted in the second adjustment step, and the reference patch of the intermediate frame determined in the determining step.

An artificial intelligence model training system comprising:
a data transformation system that generates transformed images by changing image values of consecutive input images to be used for training an artificial intelligence model that generates an intermediate frame for frame interpolation, and generates an intermediate frame of the transformed images by changing image values of an intermediate frame of the input images; and
a high-speed frame interpolation system that trains the artificial intelligence model with the transformed images generated by the processor and the intermediate frame of the transformed images,
wherein the data transformation system
determines a reference patch of the input images and of the intermediate frame of the input images,
reduces the reference patch of a first input image among the input images,
enlarges the reference patch of a second input image among the input images,
adjusts the reduced reference patch of the first input image to its original size, and
adjusts the enlarged reference patch of the second input image to its original size, and
wherein the high-speed frame interpolation system trains the artificial intelligence model using the adjusted reference patch of the first input image, the adjusted reference patch of the second input image, and the determined reference patch of the intermediate frame.