KR20200002036A - Optical Flow Estimation for Motion Compensated Prediction in Video Coding

Info

Publication number: KR20200002036A
Application number: KR1020197035698A
Authority: KR
Inventors: Yaowu Xu; Bohan Li; Jingning Han
Original assignee: Google LLC
Priority date: 2017-08-22
Filing date: 2018-05-10
Publication date: 2020-01-07
Anticipated expiration: 2038-05-10
Also published as: CN118055253A; KR20210109049A; CN110741640B; KR102295520B1; KR102400078B1; EP3673655A1; CN110741640A; WO2019040134A1; JP6905093B2; JP2020522200A

Abstract

An optical flow reference frame portion (e.g., a block or an entire frame) is generated that can be used for inter prediction of blocks of a current frame in a video sequence. A forward reference frame and a backward reference frame are used in an optical flow estimation that produces a respective motion field for pixels of the current frame. The motion fields are used to warp some or all pixels of the reference frames to the pixels of the current frame. The warped reference frame pixels are blended to form the optical flow reference frame portion. The inter prediction may be performed as part of encoding or decoding portions of the current frame.

Description

Optical Flow Estimation for Motion Compensated Prediction in Video Coding

[0001] Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.

[0002] One technique for compression uses a reference frame to generate a prediction block corresponding to a current block to be encoded. Differences between the prediction block and the current block can be encoded, instead of the values of the current block themselves, to reduce the amount of data encoded.
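By way of a non-limiting illustration, the difference-based coding described above can be sketched as follows. The sketch assumes blocks are small lists of integer pixel values; the function names `residual` and `reconstruct` are hypothetical and do not come from this disclosure, and transform, quantization, and entropy coding are omitted.

```python
def residual(current, prediction):
    """Per-pixel difference between the current block and its prediction block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


def reconstruct(prediction, resid):
    """Decoder side: add the decoded differences back onto the prediction block."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, resid)]


# Hypothetical 2x2 blocks of pixel values.
current = [[52, 55], [61, 59]]
prediction = [[50, 54], [60, 60]]

resid = residual(current, prediction)    # small values, cheaper to encode
restored = reconstruct(prediction, resid)
```

The closer the prediction block is to the current block, the smaller the differences, which is why a better reference can reduce the amount of encoded data.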

[0003] The present invention relates generally to encoding and decoding video data, and more particularly to using block-based optical flow estimation for motion compensated prediction in video compression. Frame-level optical flow estimation, which can interpolate a collocated reference frame for motion compensated prediction in video compression, is also described.

[0004] The present invention describes encoding and decoding methods and apparatuses. A method according to one implementation of the present invention includes determining a first frame portion of a first frame to be predicted, the first frame being in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame portion using the optical flow reference frame portion. The first frame portion and the optical flow reference frame portion can be, for example, a block or an entire frame.

[0005] An apparatus according to one implementation of the present invention includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes determining a first frame to be predicted in a video sequence, and determining an availability of a first reference frame for forward inter prediction of the first frame and an availability of a second reference frame for backward inter prediction of the first frame. The method also includes, responsive to determining the availability of both the first reference frame and the second reference frame: generating a respective motion field for pixels of a first frame portion using the first reference frame and the second reference frame as input to an optical flow estimation process; warping a first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion, the first reference frame portion comprising pixels of the first reference frame collocated with the pixels of the first frame portion; warping a second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion, the second reference frame portion comprising pixels of the second reference frame collocated with the pixels of the first frame portion; and blending the first warped reference frame portion and the second warped reference frame portion to form an optical flow reference frame portion for inter prediction of at least one block of the first frame.
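By way of a non-limiting illustration, the warping and blending steps summarized above can be sketched as follows. The sketch rests on several assumptions that are not taken from this disclosure: motion fields hold integer (row, column) displacements per pixel, warping uses nearest-pixel fetches clamped at the frame border, and blending is a fixed weighted average. The names `warp` and `blend` are hypothetical.

```python
def warp(ref, motion):
    """For each pixel (r, c), fetch the reference pixel at (r + dr, c + dc),
    where (dr, dc) = motion[r][c]. Coordinates are clamped to the frame."""
    h, w = len(ref), len(ref[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            dr, dc = motion[r][c]
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            row.append(ref[rr][cc])
        out.append(row)
    return out


def blend(a, b, weight_a=0.5):
    """Weighted average of two warped portions (equal weights are an assumption)."""
    return [[weight_a * x + (1 - weight_a) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# A bright pixel moves one column to the right per frame. The current frame
# lies between the first (earlier) and second (later) reference frames.
ref1 = [[0, 100, 0, 0, 0]]            # object at column 1
ref2 = [[0, 0, 0, 100, 0]]            # object at column 3
motion_to_ref1 = [[(0, -1)] * 5]      # look one column to the left in ref1
motion_to_ref2 = [[(0, 1)] * 5]       # look one column to the right in ref2

warped1 = warp(ref1, motion_to_ref1)
warped2 = warp(ref2, motion_to_ref2)
flow_ref = blend(warped1, warped2)    # object lands at column 2
```

In this toy case both warped portions place the object at column 2, so the blended portion agrees with where the object would be in the current frame.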

[0006] Another apparatus according to one implementation of the present invention also includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes generating an optical flow reference frame portion for inter prediction of a block of a first frame of a video sequence, using a first reference frame portion from the video sequence and a second reference frame portion of the video sequence, by initializing motion fields for pixels of a first frame portion at a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame portion and being one of multiple levels, and, for each level of the multiple levels: warping the first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion; warping the second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion; estimating motion fields between the first warped reference frame portion and the second warped reference frame portion using optical flow estimation; and updating the motion fields for the pixels of the first frame portion using the motion fields between the first warped reference frame portion and the second warped reference frame portion. The method also includes, for a final level of the multiple levels: warping the first reference frame portion to the first frame portion using the updated motion fields to form a final first warped reference frame portion; warping the second reference frame portion to the first frame portion using the updated motion fields to form a final second warped reference frame portion; and blending the final first warped reference frame portion and the final second warped reference frame portion to form the optical flow reference frame portion.
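By way of a non-limiting illustration, the order of the per-level steps summarized above can be sketched as a control-flow skeleton. Every callable passed in (`init`, `warp`, `estimate`, `update`, `blend`) is a hypothetical stand-in supplied by the caller, the representation of levels as a list ordered from the first processing level to the final level is an assumption, and the sketch says nothing about how the optical flow estimation or the motion field update is actually computed.

```python
def build_optical_flow_reference(ref1, ref2, levels,
                                 init, warp, estimate, update, blend):
    """Multi-level loop: warp both references, estimate motion between the
    warped portions, update the motion fields, then warp and blend once more."""
    # Initialize motion fields at the first processing level.
    motion = init(levels[0])
    for level in levels:
        warped1 = warp(ref1, motion, level)
        warped2 = warp(ref2, motion, level)
        delta = estimate(warped1, warped2, level)
        motion = update(motion, delta, level)
    # Final level: warp with the updated motion fields and blend.
    final1 = warp(ref1, motion, levels[-1])
    final2 = warp(ref2, motion, levels[-1])
    return blend(final1, final2)


# Toy stand-ins that only record the order of operations.
trace = []
result = build_optical_flow_reference(
    "ref1", "ref2", levels=[2, 1, 0],
    init=lambda level: 0,
    warp=lambda ref, motion, level: (ref, motion),
    estimate=lambda w1, w2, level: 1,
    update=lambda motion, delta, level: trace.append(level) or motion + delta,
    blend=lambda a, b: (a, b),
)
```

The stand-ins here merely count updates; a real estimator would replace them while leaving the loop structure unchanged.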

[0007] Another apparatus according to one implementation of the present invention also includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes determining a first frame portion of a first frame to be predicted, the first frame being in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame portion using the optical flow reference frame portion.

[0008] These and other aspects of the present invention are disclosed in the following detailed description of the embodiments, the appended claims, and the accompanying figures.

[0009] The description herein refers to the accompanying drawings described below, wherein like reference numerals refer to like parts throughout the several views unless otherwise noted.
[0010] FIG. 1 is a schematic diagram of a video encoding and decoding system.
[0011] FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
[0012] FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded.
[0013] FIG. 4 is a block diagram of an encoder according to implementations of the present invention.
[0014] FIG. 5 is a block diagram of a decoder according to implementations of the present invention.
[0015] FIG. 6 is a block diagram of an example of a reference frame buffer.
[0016] FIG. 7 is a diagram of a group of frames in a display order of a video sequence.
[0017] FIG. 8 is a diagram of an example of a coding order for the group of frames of FIG. 7.
[0018] FIG. 9 is a diagram used to explain the linear projection of a motion field according to the teachings herein.
[0019] FIG. 10 is a flowchart of a process for motion compensated prediction of a video frame using at least a portion of a reference frame generated using optical flow estimation.
[0020] FIG. 11 is a flowchart of a process for generating an optical flow reference frame portion.
[0021] FIG. 12 is a flowchart of another process for generating an optical flow reference frame portion.
[0022] FIG. 13 is a diagram illustrating the processes of FIGS. 11 and 12.
[0023] FIG. 14 is a diagram illustrating object occlusion.
[0024] FIG. 15 is a diagram illustrating a technique for optimizing a decoder.

[0025] A video stream can be compressed by a variety of techniques to reduce the bandwidth required to transmit or store the video stream. A video stream can be encoded into a bitstream, which involves compression, and the bitstream is then transmitted to a decoder that can decode or decompress the video stream to prepare it for viewing or further processing. Compression of the video stream often exploits the spatial and temporal correlation of video signals through spatial and/or motion compensated prediction. Inter prediction, for example, uses one or more motion vectors to generate a block (also called a prediction block) that resembles a current block to be encoded, using previously encoded and decoded pixels. By encoding the motion vector(s) and the difference between the two blocks, a decoder receiving the encoded signal can re-create the current block. Inter prediction may also be referred to as motion compensated prediction.
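By way of a non-limiting illustration, generating a prediction block from a single motion vector can be sketched as follows. The sketch assumes an integer-pixel motion vector and clamps coordinates at the frame border; sub-pixel interpolation is omitted, and the function name `predict_block` is hypothetical.

```python
def predict_block(ref, top, left, mv_row, mv_col, size):
    """Copy a size x size block from the reference frame, displaced from the
    current block position (top, left) by the motion vector (mv_row, mv_col)."""
    h, w = len(ref), len(ref[0])
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            rr = min(max(top + r + mv_row, 0), h - 1)   # clamp to frame rows
            cc = min(max(left + c + mv_col, 0), w - 1)  # clamp to frame columns
            row.append(ref[rr][cc])
        block.append(row)
    return block


# Hypothetical 6x6 reconstructed reference frame; pixel value encodes position.
ref = [[r * 10 + c for c in range(6)] for r in range(6)]

# Current 2x2 block at (2, 2); the motion vector points one row up, one column right.
pred = predict_block(ref, top=2, left=2, mv_row=-1, mv_col=1, size=2)
```

The encoder would then encode the motion vector together with the difference between this prediction block and the current block.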

[0026] Each motion vector used to generate a prediction block in the inter-prediction process may refer to a frame other than the current frame, i.e., a reference frame. Reference frames can be located before or after the current frame in the sequence of the video stream, and may be frames that are reconstructed before being used as reference frames. In some cases, there may be three reference frames used to encode or decode blocks of the current frame of the video sequence. One is a frame that may be referred to as a golden frame. Another is the most recently encoded or decoded frame. The last is an alternate reference frame that is encoded or decoded before one or more frames in the sequence but displayed after those frames in output display order. In this way, the alternate reference frame is a reference frame usable for backward prediction. One or more forward and/or backward reference frames can be used to encode or decode a block. The efficacy of a reference frame when used to encode or decode a block within the current frame can be measured based on the resulting signal-to-noise ratio or other measures of rate-distortion.

[0027] In this technique, the pixels that form prediction blocks are obtained directly from one or more of the available reference frames. The reference pixel blocks, or linear combinations of them, are used for prediction of a given coding block in the current frame. This direct, block-based prediction does not capture the true motion activity available from the reference frames. For this reason, motion compensated prediction accuracy can suffer.

[0028] To more fully utilize motion information from available bi-directional reference frames (e.g., one or more forward and one or more backward reference frames), implementations of the teachings herein describe reference frame portions, collocated with current coding frame portions, that use a per-pixel motion field calculated by optical flow to estimate the true motion activities in the video signal. Reference frame portions are interpolated that allow tracking of complicated non-translational motion activity, which is beyond the capability of conventional block-based motion compensated prediction determined directly from reference frames. Use of such reference frame portions can improve prediction quality. As used herein, a frame portion refers to some or all of a frame, such as a block, a slice, or an entire frame. A frame portion in one frame is collocated with a frame portion in another frame if the two have the same dimensions and are at the same pixel locations within the dimensions of their respective frames.

[0029] Further details of using optical flow estimation to interpolate reference frame portions for use in video compression and reconstruction are described herein, beginning with a system in which the teachings herein can be implemented.

[0030] FIG. 1 is a schematic diagram of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.

[0031] A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), a cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.

[0032] The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.

[0033] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or to any other device having a non-transitory storage medium or memory. In one implementation, the receiving station 106 receives the encoded video stream (e.g., via the network 104, a computer bus, and/or some communication pathway) and stores the video stream for later decoding. In an example implementation, the Real-time Transport Protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, e.g., a video streaming protocol based on the Hypertext Transfer Protocol (HTTP).

[0034] When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 could be a video conference participant that receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view, and that further encodes and transmits its own video bitstream to the video conference server for decoding and viewing by other participants.

[0035] FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of one computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.

[0036] A CPU 202 in the computing device 200 can be a central processing unit. Alternatively, the CPU 202 can be any other type of device, or multiple devices, capable of manipulating or processing information, whether now existing or hereafter developed. Although the disclosed implementations can be practiced with one processor as shown, e.g., the CPU 202, advantages in speed and efficiency can be achieved using more than one processor.

[0037] A memory 204 in the computing device 200 can be a read-only memory (ROM) device or a random access memory (RAM) device in an implementation. Any other suitable type of storage device or non-transitory storage medium can be used as the memory 204. The memory 204 can include code and data 206 that are accessed by the CPU 202 using a bus 212. The memory 204 can further include an operating system (OS) 208 and application programs 210, the application programs 210 including at least one program that permits the CPU 202 to perform the methods described herein. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the methods described herein. The computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device. Because video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.

[0038] The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. The display 218 can be coupled to the CPU 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including as a liquid crystal display (LCD), a cathode-ray tube (CRT) display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.

[0039] The computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220, now existing or hereafter developed, that can sense an image such as the image of a user operating the computing device 200. The image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In one example, the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.

[0040] The computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device, now existing or hereafter developed, that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200, and can be configured to receive sounds, for example speech or other utterances, made by the user while the user operates the computing device 200.

[0041] Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized. The operations of the CPU 202 can be distributed across multiple machines (each of which can have one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines, such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network, and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.

[0042] FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. Although three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example a frame 306. At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be, for example, subsets of frames that permit parallel processing. The segments 308 can also be subsets of frames that separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. The segments 308 can be sampled at different resolutions.

[0043] Whether or not the frame 306 is divided into segments 308, the frame 306 can be further subdivided into blocks 310, which can contain data corresponding to, for example, 16x16 pixels in the frame 306. The blocks 310 can also be arranged to include data from one or more segments 308 of pixel data. The blocks 310 can also be of any other suitable size, such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger. Unless otherwise noted, the terms block and macroblock are used interchangeably herein.

[0044] FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of the present disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, for example by providing a computer software program stored in memory, for example the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. In one particularly desirable implementation, the encoder 400 is a hardware encoder.

[0045] The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 can also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.

[0046] When the video stream 300 is presented for encoding, respective frames 304, such as the frame 306, can be processed in units of blocks. At the intra/inter prediction stage 402, respective blocks can be encoded using intra-frame prediction (also called intra prediction) or inter-frame prediction (also called inter prediction). In either case, a prediction block can be formed. In the case of intra prediction, a prediction block can be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter prediction, a prediction block can be formed from samples in one or more previously constructed reference frames. The designation of reference frames for groups of blocks is discussed in further detail below.

[0047] Next, still referring to FIG. 4, the prediction block can be subtracted from the current block at the intra/inter prediction stage 402 to produce a residual block (also called a residual). The transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients can be divided by the quantizer value and truncated. The quantized transform coefficients are then entropy encoded by the entropy encoding stage 408. The entropy-encoded coefficients, together with other information used to decode the block (which can include, for example, the type of prediction used, the transform type, motion vectors, and the quantizer value), are then output to the compressed bitstream 420. The compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms are used interchangeably herein.

[0048] The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions similar to those that take place during the decoding process, which is discussed in more detail below, including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative residual block (also called a derivative residual). At the reconstruction stage 414, the prediction block that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed block. The loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.

[0049] Other variations of the encoder 400 can be used to encode the compressed bitstream 420. For example, a non-transform-based encoder can quantize the residual signal directly, without the transform stage 404, for certain blocks or frames. In another implementation, an encoder can combine the quantization stage 406 and the dequantization stage 410 into a common stage.

[0050] FIG. 5 is a block diagram of a decoder 500 in accordance with implementations of the present disclosure. The decoder 500 can be implemented in the receiving station 106, for example by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.

[0051] The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a deblocking filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.

[0052] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients. The dequantization stage 504 dequantizes the quantized transform coefficients (for example, by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 can use the intra/inter prediction stage 508 to create the same prediction block as was created in the encoder 400, for example at the intra/inter prediction stage 402. At the reconstruction stage 510, the prediction block can be added to the derivative residual to create a reconstructed block. The loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts.
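The residual, quantization, dequantization, and reconstruction steps described for the encoder and decoder can be illustrated with a minimal sketch. It assumes the transform and entropy coding stages are skipped (the values are treated directly as coefficients) and that quantization is division by the quantizer value followed by truncation; the function names and sample values are hypothetical and not taken from this disclosure.

```python
import numpy as np

def quantize(coeffs, quantizer):
    # Divide by the quantizer value and truncate toward zero.
    return np.trunc(coeffs / quantizer).astype(np.int64)

def dequantize(levels, quantizer):
    # Inverse step: multiply the quantized levels by the quantizer value.
    return levels * quantizer

# Hypothetical 2x2 current block and its prediction block.
current = np.array([[52, 55], [61, 59]], dtype=np.int64)
prediction = np.array([[50, 50], [60, 60]], dtype=np.int64)
quantizer = 4

residual = current - prediction             # prediction subtracted from current block
levels = quantize(residual, quantizer)      # what would be entropy encoded
derivative_residual = dequantize(levels, quantizer)
reconstructed = prediction + derivative_residual

print(levels.tolist())         # [[0, 1], [0, 0]]
print(reconstructed.tolist())  # [[50, 54], [60, 60]]
```

Because the encoder's reconstruction path and the decoder both start from the same quantized levels and the same prediction block, they arrive at the same reconstructed block even though quantization has discarded part of the residual.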

[0053] Other filtering can be applied to the reconstructed block. In this example, the deblocking filtering stage 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 can also be referred to as a decoded video stream, and the terms are used interchangeably herein. Other variations of the decoder 500 can be used to decode the compressed bitstream 420. For example, the decoder 500 can produce the output video stream 516 without the deblocking filtering stage 514.

[0054] FIG. 6 is a block diagram of an example of a reference frame buffer 600. The reference frame buffer 600 stores reference frames used to encode or decode blocks of frames of a video sequence. In this example, the reference frame buffer 600 includes reference frames identified as a last frame LAST_FRAME 602, a golden frame GOLDEN_FRAME 604, and an alternate reference frame ALTREF_FRAME 606. A frame header of a reference frame can include a virtual index to the location within the reference frame buffer at which the reference frame is stored. A reference frame mapping can map the virtual index of a reference frame to the physical index of the memory at which the reference frame is stored. Where two reference frames are the same frame, those reference frames have the same physical index even if they have different virtual indexes. The number, types, and names of the reference positions within the reference frame buffer 600 are examples only.
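The virtual-to-physical mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the class, its methods, and the numeric values chosen for the virtual indexes are hypothetical, "the same frame" is modeled as Python object identity, and eviction of old frames is not modeled.

```python
# Hypothetical virtual indexes for the three named reference positions.
LAST_FRAME, GOLDEN_FRAME, ALTREF_FRAME = 0, 1, 2

class ReferenceFrameBuffer:
    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots  # physical storage locations
        self.ref_map = {}                # virtual index -> physical index

    def store(self, virtual_index, frame):
        # Reuse the physical slot if this exact frame is already stored.
        for physical_index, stored in enumerate(self.slots):
            if stored is frame:
                break
        else:
            # Otherwise take the first free slot (ValueError if the buffer is full).
            physical_index = self.slots.index(None)
            self.slots[physical_index] = frame
        self.ref_map[virtual_index] = physical_index
        return physical_index

    def get(self, virtual_index):
        return self.slots[self.ref_map[virtual_index]]

buf = ReferenceFrameBuffer()
key_frame = bytearray(16)    # stand-in for reconstructed pixel data
other_frame = bytearray(16)
buf.store(GOLDEN_FRAME, key_frame)
buf.store(LAST_FRAME, key_frame)       # same frame, different virtual index
buf.store(ALTREF_FRAME, other_frame)
print(buf.ref_map)  # {1: 0, 0: 0, 2: 1}
```

Here the golden frame and the last frame share physical index 0 because they are the same frame, while the alternate reference frame occupies a separate physical slot.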

[0055] The reference frames stored in the reference frame buffer 600 can be used to identify motion vectors for predicting blocks of frames to be encoded or decoded. Different reference frames can be used depending on the type of prediction used to predict a current block of a current frame. For example, in bi-prediction, blocks of the current frame can be forward predicted using a frame stored as LAST_FRAME 602 or GOLDEN_FRAME 604, and backward predicted using a frame stored as ALTREF_FRAME 606.

[0056] There can be a finite number of reference frames that can be stored within the reference frame buffer 600. As shown in FIG. 6, the reference frame buffer 600 can store up to eight reference frames, and each stored reference frame can be associated with a different virtual index of the reference frame buffer. Although three of the eight spaces in the reference frame buffer 600 are used by the frames designated LAST_FRAME 602, GOLDEN_FRAME 604, and ALTREF_FRAME 606, five spaces remain available to store other reference frames. For example, one or more available spaces in the reference frame buffer 600 can be used to store further reference frames, in particular some or all of the interpolated reference frame described herein. Although the reference frame buffer 600 is shown as being able to store up to eight reference frames, other implementations of the reference frame buffer 600 can store more or fewer reference frames.

[0057] In some implementations, the alternate reference frame designated ALTREF_FRAME 606 can be a frame of the video sequence that is distant from the current frame in display order but is encoded or decoded earlier than it is displayed. For example, the alternate reference frame can be ten, twelve, or more (or fewer) frames after the current frame in display order. Further alternate reference frames can be frames located nearer to the current frame in display order.

[0058] An alternate reference frame need not correspond directly to a frame in the sequence. Instead, the alternate reference frame can be generated using one or more frames that have had filtering applied, have been combined together, or have been both combined together and filtered. An alternate reference frame need not be displayed. Instead, it can be a frame or a portion of a frame that is generated and transmitted for use only in the prediction process (that is, it is omitted when the decoded sequence is displayed).

[0059] FIG. 7 is a diagram of a group of frames in the display order of a video sequence. In this example, the group of frames is preceded by a frame 700, which can be referred to as a key frame or, in some cases, an overlay frame, and comprises eight frames 702-716. No block within the frame 700 is inter predicted using reference frames of the group of frames. The frame 700 is a key frame (also referred to as an intra-predicted frame) in this example, which refers to its status in which predicted blocks within the frame are predicted using only intra prediction. However, the frame 700 can instead be an overlay frame, which is an inter-predicted frame that can be a reconstructed frame of a previous group of frames. In an inter-predicted frame, at least some of the predicted blocks are predicted using inter prediction. The number of frames forming each group of frames can vary according to the spatial and temporal characteristics of the video and other encoding configurations, such as the key frame interval selected for random access or error resilience.

[0060] The coding order for each group of frames can differ from the display order. This allows a frame located after a current frame in the video sequence to be used as a reference frame for encoding the current frame. A decoder, such as the decoder 500, can share a common group coding structure with an encoder, such as the encoder 400. The group coding structure assigns the different roles that respective frames within the group can play in the reference buffer (for example, a last frame, an alternate reference frame, and so on) and defines or indicates the coding order for the frames within the group.

[0061] FIG. 8 is a diagram of an example of a coding order for the group of frames of FIG. 7. The coding order of FIG. 8 is associated with a first group coding structure in which a single backward reference frame is available for each frame of the group. Because the encoding and decoding order is the same, the order shown in FIG. 8 is generally referred to herein as a coding order. The key or overlay frame 700 is designated the golden frame in a reference frame buffer, such as GOLDEN_FRAME 604 in the reference frame buffer 600. The frame 700 is intra predicted in this example, so it does not require a reference frame; an overlay frame used as the frame 700, being a frame reconstructed from a previous group, likewise does not use a reference frame of the current group of frames. The final frame 716 in the group is designated an alternate reference frame in a reference frame buffer, such as ALTREF_FRAME 606 in the reference frame buffer 600. In this coding order, the frame 716 is coded out of display order, after the frame 700, to provide a backward reference frame for each of the remaining frames 702-714. In coding blocks of the frame 716, the frame 700 serves as an available reference frame for the blocks of the frame 716. FIG. 8 is only one example of a coding order for a group of frames. Other group coding structures can designate one or more different or additional frames for forward and/or backward prediction.
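As a small illustration of how a coding order can differ from the display order, the sketch below moves the final frame of the group directly after the leading key or overlay frame. The function name is hypothetical, and the assumption that the remaining frames 702-714 then follow in display order is made only for this sketch; the text above does not state their relative order.

```python
def coding_order(display_order):
    # Code the leading frame first, then the final frame (the backward
    # reference), then the remaining frames. The order of the remaining
    # frames is an assumption of this sketch.
    first, *middle, last = display_order
    return [first, last, *middle]

display_order = [700, 702, 704, 706, 708, 710, 712, 714, 716]
print(coding_order(display_order))
# [700, 716, 702, 704, 706, 708, 710, 712, 714]
```

With this ordering, the frame 716 is already reconstructed by the time any of the frames 702-714 is coded, so it is available to them as a backward reference.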

[0062] As mentioned briefly above, an available reference frame portion can be a reference frame portion that is interpolated using optical flow estimation. A reference frame portion can be, for example, a block, a slice, or an entire frame. When frame-level optical flow estimation is performed as described herein, the resulting reference frame is referred to herein as a co-located reference frame because its dimensions are the same as those of the current frame. This interpolated reference frame can also be referred to herein as an optical flow reference frame.

[0063] FIG. 9 is a diagram used to explain the linear projection of a motion field according to the teachings herein. Within a hierarchical coding framework, the optical flow (also called a motion field) of the current frame can be estimated using the nearest available reconstructed (for example, reference) frames before and after the current frame. In FIG. 9, reference frame 1 is a reference frame that can be used for forward prediction of the current frame 900, while reference frame 2 is a reference frame that can be used for backward prediction of the current frame 900. Using the example of FIGS. 6 to 8 for illustration, if the current frame 900 is the frame 706, the immediately preceding, or last, frame 704 (for example, the reconstructed frame stored in the reference frame buffer 600 as LAST_FRAME 602) can be used as reference frame 1, while the frame 716 (for example, the reconstructed frame stored in the reference frame buffer 600 as ALTREF_FRAME 606) can be used as reference frame 2.

[0064] Knowing the display indexes of the current and reference frames, and assuming that the motion field is linear in time, motion vectors can be projected between the pixels of reference frame 1 and reference frame 2 onto the pixels of the current frame 900. In the simple example described with regard to FIGS. 6 to 8, the index for the current frame 900 is 3, the index for reference frame 1 is 0, and the index for reference frame 2 is 716. FIG. 9 shows a projected motion vector 904 for a pixel 902 of the current frame 900. Continuing the previous example, the display indexes of the group of frames of FIG. 7 show that the frame 704 is temporally closer to the frame 706 than the frame 716 is. Accordingly, the single motion vector 904 shown in FIG. 9 represents a different amount of motion between reference frame 1 and the current frame 900 than between reference frame 2 and the current frame 900. Nevertheless, the projected motion field 906 is linear between reference frame 1, the current frame 900, and reference frame 2.
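The linear projection described above can be sketched as splitting a motion vector in proportion to the temporal distances between the frames. This is a minimal illustration, not the disclosed procedure: the function name, the sample motion vector, and the display indexes used in the example call (2, 3, and 8) are hypothetical values chosen for the sketch.

```python
from fractions import Fraction

def project_motion_vector(mv_ref1_to_ref2, idx_ref1, idx_cur, idx_ref2):
    """Split a ref1-to-ref2 motion vector at the current frame.

    Returns the displacement from a current-frame pixel to its match in
    reference frame 1 and to its match in reference frame 2, assuming
    motion that is linear in time.
    """
    if not idx_ref1 < idx_cur < idx_ref2:
        raise ValueError("current frame must lie between the references")
    span = idx_ref2 - idx_ref1
    w1 = Fraction(idx_cur - idx_ref1, span)  # share of motion before current
    w2 = Fraction(idx_ref2 - idx_cur, span)  # share of motion after current
    dx, dy = mv_ref1_to_ref2
    to_ref1 = (-w1 * dx, -w1 * dy)  # points backward in time
    to_ref2 = (w2 * dx, w2 * dy)    # points forward in time
    return to_ref1, to_ref2

to_ref1, to_ref2 = project_motion_vector((12, -6), idx_ref1=2, idx_cur=3, idx_ref2=8)
print(to_ref1)  # (Fraction(-2, 1), Fraction(1, 1))
print(to_ref2)  # (Fraction(10, 1), Fraction(-5, 1))
```

The two displacements differ in magnitude because the current frame is closer in time to reference frame 1, yet both lie on the same straight line, which matches the observation that the projected motion field is linear across the three frames.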

[0065] Selecting the nearest available reconstructed forward and backward reference frames, and assuming a motion field for respective pixels of the current frame that is linear in time, allows generation of the interpolated reference frame using optical flow estimation to be performed at both the encoder and the decoder (for example, at the intra/inter prediction stage 402 and the intra/inter prediction stage 508) without transmitting additional information. Instead of the nearest available reconstructed reference frames, it is also possible to use different frames designated a priori between the encoder and the decoder. In some implementations, identification of the frames used for the optical flow estimation can be transmitted. Generation of the interpolated frame is discussed in more detail below.

[0066] FIG. 10 is a flowchart of a method or process 1000 for motion compensated prediction of a frame of a video sequence using at least a portion of a reference frame generated using optical flow estimation. The reference frame portion can be, for example, a block, a slice, or an entire reference frame. An optical flow reference frame portion can also be referred to herein as a co-located reference frame portion. The process 1000 can be implemented, for example, as a software program that can be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that can be stored in a memory, such as the memory 204 or the secondary storage 214, and that, when executed by a processor such as the CPU 202, cause the computing device to perform the process 1000. The process 1000 can also be implemented using specialized hardware or firmware. Some computing devices can have multiple memories or processors, and the operations described in the process 1000 can be distributed using multiple processors, memories, or both.

[0067] At 1002, a current frame to be predicted is determined. Frames can be coded, and hence predicted, in any order, such as the coding order shown in FIG. 8. The frames to be predicted can also be referred to as a first, second, third, and so on, frame. The labels first, second, and so on do not necessarily indicate an order of the frames. Instead, unless otherwise stated, the labels are used herein to distinguish one current frame from another. At an encoder, the frame can be processed in units of blocks in a block coding order, such as a raster scan order. At a decoder, the frame can also be processed in units of blocks according to receipt of their encoded residuals within an encoded bitstream.

[0068] At 1004, forward and backward reference frames are determined. In the examples described herein, the forward and backward reference frames are the nearest reconstructed frames before and after (for example, in display order) the current frame, such as the current frame 900. Although not expressly shown in FIG. 10, if either a forward or a backward reference frame does not exist, the process 1000 ends. The current frame is then processed without considering optical flow.
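The selection at 1004, including the early exit when one of the two references is missing, can be sketched as below. The function name and the representation of frames by their display indexes are assumptions of this sketch.

```python
def nearest_references(current_index, reconstructed_indexes):
    """Return (forward, backward) display indexes of the nearest reconstructed
    frames before and after the current frame, or None if either is missing."""
    before = [i for i in reconstructed_indexes if i < current_index]
    after = [i for i in reconstructed_indexes if i > current_index]
    if not before or not after:
        return None  # caller falls back to prediction without optical flow
    return max(before), min(after)

print(nearest_references(3, [0, 2, 8]))  # (2, 8)
print(nearest_references(1, [0]))        # None
```

Because both the encoder and the decoder know which frames have already been reconstructed, each can run the same selection and reach the same pair without any extra signaling.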

[0069] Provided that the forward and backward reference frames exist at 1004, an optical flow reference frame portion may be generated using the reference frames at 1006. Generating the optical flow reference frame portion is described in more detail with reference to FIGS. 11 to 14. In some implementations, the optical flow reference frame portion may be stored at a defined position within the reference frame buffer 600. Optical flow estimation according to the teachings herein is described first.

[0070] Optical flow estimation may be performed for respective pixels of a current frame portion by minimizing the following Lagrangian function (1):

[0071]

$$J = J_{data} + \lambda J_{spatial} \tag{1}$$

[0072] In the function (1), $J_{data}$ is the data penalty based on the brightness constancy assumption (i.e., the assumption that the intensity value of a small portion of an image remains unchanged over time despite a change in position). $J_{spatial}$ is the spatial penalty based on the smoothness of the motion field (i.e., the characteristic that neighboring pixels likely belong to the same object in an image and therefore exhibit substantially the same image motion). The Lagrangian parameter λ controls the importance of the smoothness of the motion field. A large value of λ yields a smoother motion field and can better account for motion at a larger scale. In contrast, a smaller value of λ adapts more effectively to object edges and to the movement of small objects.

[0073] According to an implementation of the teachings herein, the data penalty may be represented by the data penalty function:

[0074]

$$J_{data} = (E_x u + E_y v + E_t)^2 \tag{2}$$

[0075] The horizontal component of the motion field for a current pixel is represented by u, while the vertical component of the motion field is represented by v. Broadly stated, $E_x$, $E_y$, and $E_t$ are derivatives of pixel values of reference frame portions with respect to the horizontal axis x, the vertical axis y, and time t (e.g., as represented by frame indexes). The horizontal and vertical axes are defined relative to the array of pixels forming the current frame, such as the current frame 900, and the reference frames, such as the reference frames 1 and 2.

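[0076] In the data penalty function, the derivatives $E_x$, $E_y$, and $E_t$ may be calculated according to the following functions (3), (4), and (5):

[0077]-[0078]

$$E_x = \frac{index_{r2} - index_{cur}}{index_{r2} - index_{r1}} \cdot E_x^{(r1)} + \frac{index_{cur} - index_{r1}}{index_{r2} - index_{r1}} \cdot E_x^{(r2)} \tag{3}$$

[0079]-[0080]

$$E_y = \frac{index_{r2} - index_{cur}}{index_{r2} - index_{r1}} \cdot E_y^{(r1)} + \frac{index_{cur} - index_{r1}}{index_{r2} - index_{r1}} \cdot E_y^{(r2)} \tag{4}$$

[0081]

$$E_t = E^{(r2)} - E^{(r1)} \tag{5}$$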

[0082] The variable $E^{(r1)}$ is the pixel value at a projected position in the reference frame 1 based on the motion field of the current pixel location in the current frame being encoded. Similarly, the variable $E^{(r2)}$ is the pixel value at a projected position in the reference frame 2 based on the motion field of the current pixel location in the current frame being encoded.

[0083] The variable $index_{r1}$ is the display index of the reference frame 1, where the display index of a frame is its index in the display order of the video sequence. Similarly, the variable $index_{r2}$ is the display index of the reference frame 2, and the variable $index_{cur}$ is the display index of the current frame 900.

[0084] The variable $E_x^{(r1)}$ is the horizontal derivative calculated at the reference frame 1 using a linear filter. The variable $E_x^{(r2)}$ is the horizontal derivative calculated at the reference frame 2 using a linear filter. The variable $E_y^{(r1)}$ is the vertical derivative calculated at the reference frame 1 using a linear filter. The variable $E_y^{(r2)}$ is the vertical derivative calculated at the reference frame 2 using a linear filter.
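As an illustration only, the per-pixel combination of the derivatives and the resulting data penalty might be sketched as follows. The sketch assumes a temporally weighted average of the per-frame spatial derivatives and a plain frame difference for the temporal derivative; the function and argument names (`temporal_weights`, `combined_derivatives`, `data_penalty`, `index_r1`, and so on) are hypothetical and are not taken from the specification.

```python
def temporal_weights(index_r1, index_r2, index_cur):
    """Weights for reference frames 1 and 2 (assumed form): the frame that is
    closer to the current frame in display order receives the larger weight."""
    span = index_r2 - index_r1
    return (index_r2 - index_cur) / span, (index_cur - index_r1) / span


def combined_derivatives(e_r1, e_r2, ex_r1, ex_r2, ey_r1, ey_r2,
                         index_r1, index_r2, index_cur):
    """Return (E_x, E_y, E_t) for one pixel from per-frame values."""
    w1, w2 = temporal_weights(index_r1, index_r2, index_cur)
    e_x = w1 * ex_r1 + w2 * ex_r2   # weighted horizontal derivative
    e_y = w1 * ey_r1 + w2 * ey_r2   # weighted vertical derivative
    e_t = e_r2 - e_r1               # temporal derivative as a frame difference
    return e_x, e_y, e_t


def data_penalty(e_x, e_y, e_t, u, v):
    """Squared brightness-constancy residual for motion (u, v)."""
    return (e_x * u + e_y * v + e_t) ** 2
```

A motion (u, v) that exactly explains the intensity change between the two reference frames gives a data penalty of zero.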

[0085] In an implementation of the teachings herein, the linear filter used for calculating the horizontal derivative is a 7-tap filter with filter coefficients [-1/60, 9/60, -45/60, 0, 45/60, -9/60, 1/60]. The filter can have a different frequency profile, a different number of taps, or both. The linear filter used for calculating the vertical derivatives may be the same as or different from the linear filter used for calculating the horizontal derivatives.
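A minimal sketch of applying the 7-tap coefficients to one row of pixel values is shown below. The tap orientation (offsets -3 to +3, left to right) and the border handling (replicating the nearest pixel) are assumptions made for this example, and `horizontal_derivative` is a hypothetical helper name.

```python
# 7-tap derivative filter coefficients listed in the text.
TAPS = [-1 / 60, 9 / 60, -45 / 60, 0.0, 45 / 60, -9 / 60, 1 / 60]


def horizontal_derivative(row, x):
    """Estimate the horizontal derivative of `row` at column `x`.

    Taps are applied at offsets -3..+3 around x. Out-of-range columns are
    clamped to the nearest valid column (border replication, assumed here).
    """
    last = len(row) - 1
    total = 0.0
    for k, coeff in enumerate(TAPS):
        xi = min(max(x + k - 3, 0), last)
        total += coeff * row[xi]
    return total
```

On a linear ramp the filter returns the slope of the ramp, and on a constant row it returns zero (up to floating-point rounding), which is the behavior expected of a derivative filter.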

[0086] The spatial penalty may be represented by the spatial penalty function:

[0087]

$$J_{spatial} = (\Delta u)^2 + (\Delta v)^2 \tag{6}$$

[0088] In the spatial penalty function (6), Δu is the Laplacian of the horizontal component u of the motion field, and Δv is the Laplacian of the vertical component v of the motion field.
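The spatial penalty and its combination with the data penalty can be sketched as follows. The text does not give the 2D filter used to approximate the Laplacian, so this example assumes the common 5-point stencil with border replication; `laplacian`, `spatial_penalty`, and `lagrangian` are hypothetical helper names.

```python
def laplacian(field, y, x):
    """5-point discrete Laplacian of a 2D list at (y, x).

    The stencil and the clamped border handling are assumptions made for
    this sketch, not values taken from the specification.
    """
    h, w = len(field), len(field[0])

    def at(yy, xx):
        return field[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    return (at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1)
            - 4 * at(y, x))


def spatial_penalty(u, v, y, x):
    """Squared Laplacians of the horizontal and vertical motion components."""
    return laplacian(u, y, x) ** 2 + laplacian(v, y, x) ** 2


def lagrangian(j_data, j_spatial, lam):
    """Per-pixel cost: data penalty plus lambda times spatial penalty."""
    return j_data + lam * j_spatial
```

A perfectly smooth (constant) motion field has zero spatial penalty, so a larger λ pushes the minimizer toward smoother fields.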

[0089] FIG. 11 is a flowchart of a method or process 1100 for generating an optical flow reference frame portion. In this example, the optical flow reference frame portion is an entire reference frame. The process 1100 can implement step 1006 of the process 1000. The process 1100 may be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214 and that, when executed by a processor such as the CPU 202, cause the computing device to perform the process 1100. The process 1100 may also be implemented using specialized hardware or firmware. As described above, multiple processors, memories, or both may be used.

[0090] Because the forward and backward reference frames can be relatively far apart from each other, there may be dramatic motion between them, which reduces the accuracy of the brightness constancy assumption. To reduce the potential errors in the motion of a pixel that result from this problem, the estimated motion vectors from the current frame to the reference frames may be used to initialize the optical flow estimation for the current frame. At 1102, all pixels within the current frame may be assigned an initialized motion vector. These define initial motion fields that may be used to warp the reference frames to the current frame for a first processing level, thereby shortening the motion lengths between the reference frames.

[0091] The motion field $mv_{cur}$ of a current pixel may be initialized using a motion vector that represents the difference between the estimated motion vector $mv_{r2}$ pointing from the current pixel to the backward reference frame, in this example the reference frame 2, and the estimated motion vector $mv_{r1}$ pointing from the current pixel to the forward reference frame, in this example the reference frame 1, according to the following function:

[0092]

$$mv_{cur} = -mv_{r1} + mv_{r2}$$

[0093] If one of the motion vectors is unavailable, it is possible to extrapolate the initial motion using the available motion vector according to one of the following functions:

[0094]

$$mv_{cur} = -mv_{r1} \cdot \frac{index_{r2} - index_{r1}}{index_{cur} - index_{r1}}$$, or

[0095]

$$mv_{cur} = mv_{r2} \cdot \frac{index_{r2} - index_{r1}}{index_{r2} - index_{cur}}$$.

[0096] Where a current pixel has no available motion vector reference, one or more spatial neighbors having an initialized motion vector may be used. For example, an average of the available neighboring initialized motion vectors may be used.
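The initialization rules for a single pixel can be sketched as below. This is an illustration under stated assumptions: motion vectors are `(x, y)` tuples, `None` marks an unavailable vector, the extrapolation scales the available vector linearly by display-index distance, and the zero-vector fallback when no neighbor is available is an assumption of this example rather than something stated in the text. The name `init_motion_vector` is hypothetical.

```python
def init_motion_vector(mv_r1, mv_r2, index_r1, index_r2, index_cur,
                       neighbors=()):
    """Initialize the motion field of one pixel.

    mv_r1 / mv_r2: estimated vectors from the current pixel to reference
    frames 1 and 2, or None when unavailable.
    neighbors: already-initialized vectors of spatial neighbors.
    """
    span = index_r2 - index_r1
    if mv_r1 is not None and mv_r2 is not None:
        # Difference between the two estimated vectors.
        return (mv_r2[0] - mv_r1[0], mv_r2[1] - mv_r1[1])
    if mv_r1 is not None:
        # Extrapolate from reference frame 1 only (linear motion assumed).
        s = -span / (index_cur - index_r1)
        return (mv_r1[0] * s, mv_r1[1] * s)
    if mv_r2 is not None:
        # Extrapolate from reference frame 2 only (linear motion assumed).
        s = span / (index_r2 - index_cur)
        return (mv_r2[0] * s, mv_r2[1] * s)
    if neighbors:
        # Average of the available neighboring initialized vectors.
        n = len(neighbors)
        return (sum(m[0] for m in neighbors) / n,
                sum(m[1] for m in neighbors) / n)
    return (0.0, 0.0)  # assumed fallback, not specified in the text
```

With display indexes 0, 1, and 4 for the reference frame 1, the current frame, and the reference frame 2, a vector available from either single reference extrapolates to the same full-span motion as the two-vector difference, provided the motion is linear.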

[0097] In an example of initializing the motion field for the first processing level at 1102, the reference frame 2 may be used to predict a pixel of the reference frame 1, where the reference frame 1 is the last frame before the current frame being coded. That motion vector, projected onto the current frame using linear projection in a manner similar to that shown in FIG. 9, results in a motion field $mv_{cur}$ at the intersecting pixel location, such as the motion field 906 at the pixel location 902.

[0098] FIG. 11 refers to initializing the motion field for a first processing level because the process 1100 desirably has multiple processing levels. This can be seen by reference to FIG. 13, which is a diagram that illustrates the process 1100 of FIG. 11 (and the process 1200 of FIG. 12 discussed below). The following description uses the phrase motion field. Unless otherwise clear from the context, this phrase is intended to refer collectively to the motion fields for respective pixels. Accordingly, the phrases "motion fields" and "motion field" may be used interchangeably when referring to more than one motion field. Further, the phrase optical flow may be used interchangeably with the phrase motion field when referring to the movement of pixels.

[0099] To estimate the motion field/optical flow for the pixels of a frame, a pyramid, or multi-layered, structure may be used. In one pyramid structure, for example, the reference frames are scaled down to one or more different scales. Then, the optical flow is first estimated to obtain a motion field at the highest level (the first processing level) of the pyramid, i.e., using the reference frames that are scaled down the most. Thereafter, the motion field is upscaled and used to initialize the optical flow estimation at the next level. This process of upscaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached (i.e., until the optical flow estimation is completed for the reference frame portions at full scale).

[00100] The reasoning for this process is that it is easier to capture large motion when an image is scaled down. However, using simple rescale filters to scale the reference frames themselves can degrade the reference frame quality. To avoid losing detailed information due to rescaling, a pyramid structure is used that scales the derivatives, instead of the pixels of the reference frames, to estimate the optical flow. This pyramid scheme represents a regressive analysis for the optical flow estimation. The scheme is shown in FIG. 13 and is implemented by the process 1100 of FIG. 11 and the process 1200 of FIG. 12.

[00101] After initialization, the Lagrangian parameter λ is set at 1104 for solving the Lagrangian function (1). Desirably, the process 1100 uses multiple values for the Lagrangian parameter λ. The first value to which the Lagrangian parameter λ is set at 1104 may be a relatively large value, such as 100. While it is desirable that the process 1100 use multiple values for the Lagrangian parameter λ within the Lagrangian function (1), it is possible that only one value is used, as in the process 1200 described below.

[00102] At 1106, the reference frames are warped to the current frame according to the motion field for the current processing level. Warping the reference frames to the current frame may be performed using subpixel location rounding. It is worth noting that the motion field $mv_{cur}$ used at the first processing level is downscaled from its full-resolution value to the resolution of the level before the warping is performed. Downscaling of the motion field is discussed in more detail below.

[00103] Knowing the optical flow $mv_{cur}$, the motion field to warp the reference frame 1 is inferred by the linear projection assumption (e.g., that the motion projects linearly over time) as follows:

[00104]

$$mv_{r1} = \frac{index_{r1} - index_{cur}}{index_{r2} - index_{r1}} \cdot mv_{cur}$$

[00105] To perform the warping, the horizontal component $u_{r1}$ and the vertical component $v_{r1}$ of the motion field $mv_{r1}$ may be rounded to 1/8 pixel precision for the Y component and to 1/16 pixel precision for the U and V components. Other values for the subpixel location rounding may be used. After rounding, each pixel of the warped image $E_{warped}^{(r1)}$ is calculated as the referenced pixel given by the motion vector $mv_{r1}$. Subpixel interpolation may be performed using a conventional subpixel interpolation filter.

[00106] The same warping approach is performed for the reference frame 2 to obtain a warped image $E_{warped}^{(r2)}$, where the motion field is calculated by:

[00107]

$$mv_{r2} = \frac{index_{r2} - index_{cur}}{index_{r2} - index_{r1}} \cdot mv_{cur}$$
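The linear projection and the subpixel rounding can be sketched as follows. The sign convention is an assumption of this example: `mv_cur` is taken to be the motion from the reference frame 1 to the reference frame 2, so that the two projected vectors differ by exactly `mv_cur`. The helper names `project_to_refs` and `round_to_subpel` are hypothetical, and Python's built-in `round` (round half to even) stands in for whatever rounding rule a codec would actually use.

```python
def project_to_refs(mv_cur, index_r1, index_r2, index_cur):
    """Split mv_cur into the vectors used to warp reference frames 1 and 2,
    assuming motion that is linear in display time."""
    span = index_r2 - index_r1
    s1 = (index_r1 - index_cur) / span   # negative: frame 1 precedes current
    s2 = (index_r2 - index_cur) / span   # positive: frame 2 follows current
    mv_r1 = (mv_cur[0] * s1, mv_cur[1] * s1)
    mv_r2 = (mv_cur[0] * s2, mv_cur[1] * s2)
    return mv_r1, mv_r2


def round_to_subpel(value, denom=8):
    """Round one vector component to 1/denom pixel precision
    (denom=8 for Y and denom=16 for U and V in the example in the text)."""
    return round(value * denom) / denom
```

For display indexes 0, 1, and 4, a flow of (4.0, 2.0) projects to (-1.0, -0.5) toward the reference frame 1 and (3.0, 1.5) toward the reference frame 2.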

[00108] At the end of the calculations at 1106, two warped reference frames exist. The two warped reference frames are used to estimate the motion field between them at 1108. Estimating the motion field at 1108 can include multiple steps.

[00109] First, the derivatives $E_x$, $E_y$, and $E_t$ are calculated using the functions (3), (4), and (5). When calculating the derivatives, the frame boundaries of a warped reference frame may be extended by copying the nearest available pixel. In this way, pixel values (i.e., $E^{(r1)}$ and/or $E^{(r2)}$) may be obtained when projected positions are outside of the warped reference frame. Then, if there are multiple layers, the derivatives are downscaled to the current level. As shown in FIG. 13, the reference frames are used to calculate the derivatives at the original scale in order to capture details. Downscaling the derivatives at each level l may be calculated by averaging within a $2^l \times 2^l$ block. Because calculating the derivatives and downscaling the derivatives by averaging are both linear operations, the two operations may be combined into a single linear filter that calculates the derivatives at each level l. This can lower the complexity of the calculations.
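Downscaling a derivative plane by block averaging can be sketched as follows. The example assumes the plane is a list of equal-length rows whose dimensions are multiples of the block size; `downscale_by_averaging` is a hypothetical helper name.

```python
def downscale_by_averaging(deriv, level):
    """Downscale a 2D derivative plane for pyramid level `level` by
    averaging each (2**level) x (2**level) block into one value."""
    b = 1 << level                       # block edge length, 2**level
    out_h, out_w = len(deriv) // b, len(deriv[0]) // b
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = sum(deriv[by * b + dy][bx * b + dx]
                        for dy in range(b) for dx in range(b))
            row.append(total / (b * b))
        out.append(row)
    return out
```

Level 0 leaves the resolution unchanged, level 1 halves each dimension, and level 2 quarters it.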

[00110] Once the derivatives are downscaled to the current processing level, as applicable, optical flow estimation can be performed according to the Lagrangian function (1). More specifically, by setting the derivatives of the Lagrangian function (1) with respect to the horizontal component u of the motion field and the vertical component v of the motion field to zero (i.e., $\partial J / \partial u = 0$ and $\partial J / \partial v = 0$), the components u and v may be solved for all N pixels of the frame with $2N$ linear equations. The equations are linear because the Laplacians are approximated by two-dimensional (2D) filters. Instead of directly solving the linear equations, which is accurate but highly complex, iterative approaches may be used to minimize the Lagrangian function (1), yielding faster but less accurate results.

[00111] At 1108, the motion field for the pixels of the current frame is updated or refined using the estimated motion field between the warped reference frames. For example, the current motion field for a pixel may be updated by adding the estimated motion field for the respective pixel on a pixel-by-pixel basis.

[00112] Once the motion field is estimated at 1108, a query is made at 1110 to determine whether there are additional values available for the Lagrangian parameter λ. Smaller values of the Lagrangian parameter λ can address smaller scales of motion. If there are additional values, the process 1100 can return to 1104 to set the next value for the Lagrangian parameter λ. For example, the process 1100 can repeat while reducing the Lagrangian parameter λ by half in each iteration. The motion field updated at 1108 is the current motion field for warping the reference frames at 1106 in this next iteration. Then, the motion field is again estimated at 1108. The processing at 1104, 1106, and 1108 continues until all of the possible Lagrangian parameters have been processed at 1110. In one example, there are three levels in the pyramid as shown in FIG. 13, and the smallest value of the Lagrangian parameter λ in that example is 25 (e.g., 100 halved twice). This iterative processing while modifying the Lagrangian parameter may be referred to as annealing the Lagrangian parameter.

[00113] Once there are no remaining values for the Lagrangian parameter λ at 1110, the process 1100 advances to 1112 to determine whether there are more processing levels to process. If there are additional processing levels at 1112, the process 1100 advances to 1114, where the motion field is upscaled before the next layer is processed using each of the available values for the Lagrangian parameter λ, starting at 1104. Upscaling the motion field may be performed using any known technique, including but not limited to the reverse of the downscaling calculations described above.

[00114] In general, the optical flow is first estimated to obtain a motion field at the highest level of the pyramid. Thereafter, the motion field is upscaled and used to initialize the optical flow estimation at the next level. This process of upscaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached at 1112 (i.e., until the optical flow estimation is completed for the derivatives calculated at full scale).
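The control flow of steps 1104 through 1114 can be sketched as a nested loop over pyramid levels and Lagrangian parameter values. This is a structural sketch only: the warping, refinement, and upscaling operations are passed in as callables, and all names (`lambda_schedule`, `coarse_to_fine`, `warp`, `refine`, `upscale`) are hypothetical. The default schedule (100, 50, 25) follows the example values in the text.

```python
def lambda_schedule(first=100.0, count=3):
    """Lagrangian parameter values, halved at each iteration."""
    return [first / (2 ** i) for i in range(count)]


def coarse_to_fine(mv, num_levels, lambdas, warp, refine, upscale):
    """Run the multi-level estimation loop and return the final motion field.

    Level num_levels - 1 is the most downscaled (first) processing level;
    level 0 is full scale.
    """
    for level in range(num_levels - 1, -1, -1):
        for lam in lambdas:                        # 1104 and 1110: anneal lambda
            warped = warp(mv, level)               # 1106: warp reference frames
            mv = refine(mv, warped, level, lam)    # 1108: estimate and update
        if level > 0:                              # 1112: more levels remain
            mv = upscale(mv)                       # 1114: upscale motion field
    return mv
```

With three levels and three parameter values, the refinement step runs nine times and the motion field is upscaled twice before blending.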

[00115] Once the level at which the reference frames are not downscaled is reached (i.e., the reference frames are at their original resolution), the process 1100 advances to 1116. For example, the number of levels may be three, as in the example of FIG. 13. At 1116, the warped reference frames are blended to form the optical flow reference frame $E^{(cur)}$. Note that the warped reference frames blended at 1116 may be the full-scale reference frames that are warped again, according to the process described at 1106, using the motion field estimated at 1108. In other words, the full-scale reference frames may be warped twice: once using the initial upscaled motion field from the previous processing layer, and again after the motion field is refined at the full-scale level. The blending may be performed using the time linearity assumption (e.g., that frames are spaced apart by equal time periods) as follows:

[00116]

$$E^{(cur)} = E_{warped}^{(r1)} \cdot \frac{index_{r2} - index_{cur}}{index_{r2} - index_{r1}} + E_{warped}^{(r2)} \cdot \frac{index_{cur} - index_{r1}}{index_{r2} - index_{r1}}$$

[00117] In some implementations, it is desirable to prefer the pixel from only one of the warped reference frames rather than the blended value. For example, if the reference pixel in the reference frame 1 (indicated by $mv_{r1}$) is out of bounds (e.g., outside of the dimensions of the frame) while the reference pixel in the reference frame 2 is not, then only the pixel in the warped image resulting from the reference frame 2 is used according to:

[00118]

$$E^{(cur)} = E_{warped}^{(r2)}$$
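A per-pixel sketch of the blending, including the out-of-bounds fallback, is shown below. The blending weights assume the time-linear form in which the warped frame that is closer in display order contributes more; the function name `blend_pixel` and the boolean flags are hypothetical.

```python
def blend_pixel(e_w1, e_w2, index_r1, index_r2, index_cur,
                r1_in_bounds=True, r2_in_bounds=True):
    """Blend two warped pixel values into one optical flow reference pixel.

    If exactly one reference pixel is out of bounds, only the in-bounds
    warped pixel is used; otherwise the time-linear blend is returned.
    """
    if r1_in_bounds and not r2_in_bounds:
        return e_w1
    if r2_in_bounds and not r1_in_bounds:
        return e_w2
    span = index_r2 - index_r1
    return (e_w1 * (index_r2 - index_cur)
            + e_w2 * (index_cur - index_r1)) / span
```

For display indexes 0, 1, and 4, warped values of 100 and 200 blend to 125, since the reference frame 1 is three times closer to the current frame than the reference frame 2.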

[00119] Optional occlusion detection may be performed as part of the blending. Occlusion of objects and background commonly occurs in a video sequence, where parts of an object appear in one reference frame but are hidden in the other. Generally, the optical flow estimation method described above cannot estimate the motion of the object in this situation because the brightness constancy assumption is violated. If the size of the occlusion is relatively small, the smoothness penalty function may estimate the motion quite accurately. That is, if the undefined motion field at the hidden part is smoothed by the neighboring motion vectors, the motion of the whole object can be accurate.

[00120] Even in this case, however, the simple blending method described above may not give satisfactory interpolated results. This can be demonstrated by reference to FIG. 14, which is a diagram that illustrates object occlusion. In this example, the occluded part of object A is visible in the reference frame 1 and is hidden by object B in the reference frame 2. Because the hidden part of object A is not shown in the reference frame 2, the referenced pixel from the reference frame 2 belongs to object B. In this case, using only the warped pixel from the reference frame 1 is desirable. Accordingly, using a technique that detects occlusions, instead of or in addition to the above blending, may provide a better blending result and hence a better reference frame.

[00121] Regarding detection of an occlusion, it can be observed from FIG. 14 that when occlusion occurs and the motion field is fairly accurate, the motion vector of the occluded part of object A points to object B in the reference frame 2. This may result in the following situations. The first situation is that the warped pixel values $E_{warped}^{(r1)}$ and $E_{warped}^{(r2)}$ are very different because they come from two different objects. The second situation is that the pixels of object B are referenced by multiple motion vectors, namely those for object B in the current frame and those for the occluded part of object A in the current frame.

[00122] With these observations, the following conditions may be established to determine an occlusion and the use of only $E_{warped}^{(r1)}$ for $E^{(cur)}$, where similar conditions apply for the use of only $E_{warped}^{(r2)}$ for $E^{(cur)}$:

[00123] $|E_{warped}^{(r1)} - E_{warped}^{(r2)}|$ is greater than a threshold $T_{pixel}$; and

[00124] $N_{ref}^{(r2)} / N_{ref}^{(r1)}$ is greater than a threshold $T_{ref}$.

[00125] $N_{ref}^{(r1)}$ is the total number of times that the referenced pixel in the reference frame 1 is referenced by any pixel in the current co-located frame. Given the existence of the subpixel interpolation described above, $N_{ref}^{(r1)}$ is counted when the reference subpixel location is within one pixel length of the pixel location of interest. Moreover, if $mv_{r1}$ points to a subpixel location, the weighted average of $N_{ref}^{(r1)}$ of the four neighboring pixels is used as the total number of references for the current subpixel location. $N_{ref}^{(r2)}$ may be similarly defined.
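The two occlusion conditions can be sketched as a simple predicate. This is an illustration under assumptions: the second condition is taken to be a ratio test between the reference counts of the two reference frames, written here as a multiplication so that a zero count does not cause a division error, and the threshold values in the usage below are arbitrary. The name `use_only_r1` and its parameters are hypothetical.

```python
def use_only_r1(e_w1, e_w2, n_ref_r1, n_ref_r2, t_pixel, t_ref):
    """Return True when only the warped pixel from reference frame 1
    should be used for the optical flow reference pixel.

    Condition 1: the two warped pixel values differ by more than t_pixel.
    Condition 2: the referenced pixel in frame 2 is referenced more than
    t_ref times as often as the referenced pixel in frame 1 (assumed form).
    """
    differs = abs(e_w1 - e_w2) > t_pixel
    over_referenced = n_ref_r2 > t_ref * n_ref_r1
    return differs and over_referenced
```

A mirrored predicate, with the roles of the two reference frames swapped, would cover the case in which only the warped pixel from the reference frame 2 is used.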

[00126] Accordingly, an occlusion can be detected in the first reference frame using the first warped reference frame and the second warped reference frame. Blending the warped reference frames can then include populating the pixel positions of the optical flow reference frame that correspond to the occlusion with pixel values from the second warped reference frame. Similarly, an occlusion can be detected in the second reference frame using the first warped reference frame and the second warped reference frame. Blending the warped reference frames can then include populating the pixel positions of the optical flow reference frame that correspond to the occlusion with pixel values from the first warped reference frame.

[00127] It has been shown experimentally that the process 1100 provides substantial compression performance gains. These performance gains include 2.5% in PSNR and 3.3% in SSIM for a low-resolution set of frames, and 3.1% in PSNR and 4.0% in SSIM for a mid-resolution set of frames. However, as described above, the optical flow estimation performed according to the Lagrangian function (1) uses $2N$ linear equations to solve for the horizontal component u and the vertical component v of the motion field for all N pixels of a frame. In other words, the computational complexity of the optical flow estimation is a polynomial function of the frame size, which imposes a burden on decoder complexity. Accordingly, sub-frame based (e.g., block-based) optical flow estimation is described next, which can reduce decoder complexity relative to the frame-based optical flow estimation described with regard to FIG. 11.

[00128] FIG. 12 is a flowchart of a method or process 1200 for generating an optical flow reference frame portion. In this example, the optical flow reference frame portion is smaller than an entire reference frame. The co-located frame portions in this example are described with reference to a block, but other frame portions may be processed according to FIG. 12. The process 1200 can implement step 1006 of the process 1000. The process 1200 may be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214 and that, when executed by a processor such as the CPU 202, cause the computing device to perform the process 1200. The process 1200 may also be implemented using specialized hardware or firmware. As described above, multiple processors, memories, or both may be used.

[00129] At 1202, all pixels within the current frame are assigned an initialized motion vector. These vectors define initial motion fields that can be used to warp the reference frames to the current frame for a first processing level, shortening the motion lengths between the reference frames. The initialization at 1202 can be performed using the same processing described for the initialization at 1102, so the description is not repeated here.

[00130] At 1204, the reference frames, such as reference frame 1 and reference frame 2, are warped to the current frame according to the motion field initialized at 1202. The warping at 1204 can be performed using the same processing described for the warping at 1106 except that, desirably, the motion field mv_cur initialized at 1202 is not downscaled from its full-resolution value before the reference frames are warped.

[00131] At the end of the calculation at 1204, there are two warped reference frames at full resolution. Like process 1100, process 1200 can estimate the motion field between the two reference frames using a multi-level process similar to that described with respect to FIG. 13. Broadly stated, for each level until all levels have been considered, process 1200 calculates the derivatives for the level, performs an optical flow estimation using the derivatives, and upscales the resulting motion field for the next level.

[00132] More specifically, the motion field for a block at the current (or first) processing level is initialized at 1206. The block may be a block of the current frame selected in a scan order (e.g., a raster scan order) of the current frame. The motion field for the block comprises the motion field for respective pixels of the block. In other words, at 1206, all pixels within the current block are assigned an initialized motion vector. The initialized motion vectors are used to warp reference blocks to the current block so as to shorten the lengths between the reference blocks in the reference frames.

[00133] At 1206, the motion field for the block is downscaled from its full-resolution value to the resolution of the level. In other words, the initialization at 1206 can comprise downscaling the motion field for respective pixels of the block from the full-resolution values initialized at 1202. The downscaling can be performed using any technique, such as the downscaling described above.

[00134] At 1208, co-located reference blocks corresponding to the motion field in each of the warped reference frames are warped to the current block. The warping of the reference blocks is performed similarly to the warping at 1106 of process 1100. That is, given the optical flow of the pixels of the reference block in reference frame 1, the motion field for warping is inferred under the linear projection assumption (e.g., that motion projects linearly over time) as follows:

[00135]

[00136] To perform the warping, the horizontal and vertical components of the motion field may be rounded to 1/8 pixel precision for the Y component and to 1/16 pixel precision for the U and V components. Other values may be used. After rounding, each pixel of the warped block is calculated as the referenced pixel given by the motion vector mv_r1. Sub-pixel interpolation may be performed using a conventional sub-pixel interpolation filter.
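
The rounding and sub-pixel fetch described above can be sketched as follows. This is a minimal illustration rather than the codec's actual warping routine: the function names are hypothetical, bilinear weighting stands in for the conventional sub-pixel interpolation filter, and motion vectors are assumed to be (vertical, horizontal) offsets added to the pixel position.

```python
import math

def round_to_precision(component, denominator):
    """Round a motion-vector component to the nearest 1/denominator pixel."""
    return math.floor(component * denominator + 0.5) / denominator

def sample_bilinear(plane, y, x):
    """Fetch a sub-pixel sample with bilinear weights, clamping at the borders.

    Bilinear weighting is an assumed stand-in for the interpolation filter.
    """
    h, w = len(plane), len(plane[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
    bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def warp_block(reference, top, left, size, mv_field, denominator=8):
    """Warp a size x size block: each output pixel reads the reference plane
    at its own position plus its rounded motion vector."""
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            dy, dx = mv_field[r][c]
            dy = round_to_precision(dy, denominator)
            dx = round_to_precision(dx, denominator)
            row.append(sample_bilinear(reference, top + r + dy, left + c + dx))
        out.append(row)
    return out
```

Passing `denominator=16` would correspond to the 1/16 pixel precision mentioned for the U and V components.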

[00137] The same warping approach is applied to the reference block of reference frame 2 to obtain a warped block, where the motion field is calculated by:

[00138]

[00139] At the end of the calculation at 1208, two warped reference blocks exist. The two warped reference blocks are used at 1210 to estimate the motion field between them. The processing at 1210 can be similar to that described with respect to the processing at 1108 of FIG. 11.
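
The warping at 1208 depends on projecting one motion field onto each of the two reference frames. The sketch below shows one way such a linear projection could be computed. It rests on assumptions that are not taken from the text: the per-pixel flow is treated as the displacement from reference frame 1 to reference frame 2 passing through the current pixel, frames are identified by display indices, and the function name is hypothetical.

```python
def project_motion(flow, index_cur, index_r1, index_r2):
    """Split a per-pixel flow into the two vectors pointing from the current
    pixel into reference frame 1 and reference frame 2.

    Assumed convention: `flow` is the (vertical, horizontal) displacement
    from reference 1 to reference 2, and motion is linear over time.
    """
    span = index_r2 - index_r1
    if span == 0:
        raise ValueError("reference frames must have distinct indices")
    to_r1 = (index_r1 - index_cur) / span  # negative when reference 1 precedes the current frame
    to_r2 = (index_r2 - index_cur) / span
    dy, dx = flow
    return (dy * to_r1, dx * to_r1), (dy * to_r2, dx * to_r2)
```

Under this convention the two projected vectors always differ by exactly `flow`, so a current frame midway between the references receives half of the motion in each direction.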

[00140] More specifically, the two warped reference blocks may be at full resolution. According to the pyramid structure of FIG. 13, the derivatives E_x, E_y, and E_t are calculated using functions (3), (4), and (5). When calculating the derivatives for frame-level estimation, the frame boundaries can be extended by copying the nearest available pixel to obtain out-of-bound pixel values, as described with respect to process 1100. For other frame portions, however, neighboring pixels are often available in the reference frames warped at 1204. For example, for block-based estimation, the pixels of neighboring blocks are available in the warped reference frames unless the block itself is at a frame boundary. Accordingly, for pixels that are out of bounds relative to a warped reference frame portion, pixels in neighboring portions of the warped reference frame may be used as the pixel values, where applicable. Where the projected pixels are outside the frame boundaries, copying the nearest available (i.e., within-bounds) pixel may still be used. After the derivatives are calculated, they may be downscaled to the current level. The downscaled derivatives at each level l may be calculated by averaging within a 2^l x 2^l block, as discussed previously. The complexity of the calculations may be reduced, although this is not required, by combining the two linear operations of calculating and averaging the derivatives into a single linear filter.
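
The averaging step can be illustrated with a short sketch. It assumes a derivative plane stored as a list of rows at full resolution and uses hypothetical function names; positions past the frame border reuse the nearest available value, mirroring the border extension described above.

```python
def clamped(plane, y, x):
    """Read plane[y][x], replicating the nearest in-bounds value past the frame border."""
    h, w = len(plane), len(plane[0])
    return plane[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def downscale_by_averaging(plane, level):
    """Average each 2^level x 2^level cell of a full-resolution derivative plane."""
    step = 1 << level
    h, w = len(plane), len(plane[0])
    out = []
    for top in range(0, h, step):
        row = []
        for left in range(0, w, step):
            total = 0.0
            for y in range(top, top + step):
                for x in range(left, left + step):
                    total += clamped(plane, y, x)
            row.append(total / (step * step))
        out.append(row)
    return out
```

For a 4x4 plane holding the values 0 to 15 in raster order, level 1 yields a 2x2 plane and level 2 a single averaged value.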

[00141] Continuing the processing at 1210, the downscaled derivatives may be used as inputs to the Lagrangian function (1) to perform the optical flow estimation that estimates the motion field between the warped reference portions. The horizontal component u and the vertical component v of the motion field for all N pixels of a portion, here a block, can be determined by setting the derivatives of the Lagrangian function (1) with respect to u and v to zero and solving the resulting 2*N linear equations. For this, there are two optional ways to handle out-of-bound motion vectors. One way is to assume zero correlation with neighboring blocks and to assume that an out-of-bound motion vector is the same as the motion vector at the boundary position nearest to the out-of-bound pixel position. The other way is to use the initialized motion vector for the current pixel (i.e., the motion field initialized at 1206) as the motion vector for the out-of-bound pixel position corresponding to the current pixel.
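
The two ways of handling out-of-bound motion vectors can be sketched as a single lookup. This is an illustrative reading of the two options, not the patented solver: the function name and arguments are hypothetical, and motion fields are assumed to be lists of rows of (vertical, horizontal) tuples covering one block.

```python
def neighbor_vector(block_field, initial_field, y, x, ny, nx, use_initial=False):
    """Return the motion vector to use for neighbor (ny, nx) of pixel (y, x).

    block_field   : current motion field of the block
    initial_field : motion field the block was initialized with
    use_initial   : False selects replication of the nearest boundary vector,
                    True selects the initialized vector of the current pixel
    """
    h, w = len(block_field), len(block_field[0])
    if 0 <= ny < h and 0 <= nx < w:
        return block_field[ny][nx]  # neighbor lies inside the block
    if use_initial:
        # Second option: the current pixel's initialized vector stands in
        # for the missing neighbor.
        return initial_field[y][x]
    # First option: replicate the vector at the nearest position inside the block.
    return block_field[min(max(ny, 0), h - 1)][min(max(nx, 0), w - 1)]
```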

[00142] After the motion field is estimated, the current motion field for the level is updated or refined using the estimated motion field between the warped reference blocks, which completes the processing at 1210. For example, the current motion field may be updated by adding the estimated motion field to it on a pixel-by-pixel basis.

[00143] In process 1100, an additional loop is included to set decreasing values for the Lagrangian parameter λ such that, at each level, the motion field is estimated and refined using increasingly smaller values for λ. In process 1200, this loop is omitted. That is, in process 1200 as shown, only one value for the Lagrangian parameter λ is used to estimate the motion field at the current processing level. This can be a relatively small value, such as 25. Other values for the Lagrangian parameter λ are possible, e.g., depending upon the smoothness of the motion, the image resolution, or other variables.

[00144] In other implementations, process 1200 can include the additional loop for varying the Lagrangian parameter λ. In an implementation where such a loop is included, the Lagrangian parameter λ may be set before estimating the motion field at 1210, such that warping the reference blocks at 1208 and estimating and updating the motion field at 1210 are repeated until all values for the Lagrangian parameter λ have been used, as described with respect to the processing at 1104 and 1110 in process 1100.

[00145] Process 1200 advances to the query at 1212 after estimating and updating the motion field at 1210. When a single value for the Lagrangian parameter λ is used, this occurs after the first and only motion field estimation and update at 1210 for the level. When multiple values for the Lagrangian parameter λ are used at a processing level, process 1200 advances to the query at 1212 after estimating and updating the motion field at 1210 using the final value for the Lagrangian parameter λ.

[00146] If there are additional processing levels in response to the query at 1212, process 1200 advances to 1214, where the motion field is upscaled before the next level is processed starting at 1206. The upscaling can be performed according to any known technique.

[00147] In general, the optical flow is first estimated to obtain a motion field at the highest level of the pyramid. The motion field is then upscaled and used to initialize the optical flow estimation at the next level. This process of upscaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached at 1212 (i.e., until the optical flow estimation is completed for the derivatives calculated at full scale).
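
The coarse-to-fine loop can be sketched as below. This is a structural illustration under several assumptions: the per-level estimator is supplied as a hypothetical callback `estimate_update`, motion fields are lists of rows of (vertical, horizontal) tuples, the block side is divisible by 2^(levels - 1), and nearest-neighbour replication stands in for "any known technique" of upscaling.

```python
def downscale_field(field, level):
    """Subsample a motion field by 2^level and shrink its vectors to match."""
    step = 1 << level
    return [[(dy / step, dx / step) for dy, dx in row[::step]] for row in field[::step]]

def upscale_field(field, factor=2):
    """Nearest-neighbour upscaling: repeat each vector and scale its length."""
    out = []
    for row in field:
        wide = []
        for dy, dx in row:
            wide.extend([(dy * factor, dx * factor)] * factor)
        for _ in range(factor):
            out.append(list(wide))
    return out

def coarse_to_fine(initial_field, num_levels, estimate_update):
    """Refine a motion field from the coarsest level down to full resolution.

    estimate_update(level, field) is a hypothetical callback returning a
    per-pixel correction with the same shape as `field`.
    """
    field = downscale_field(initial_field, num_levels - 1)
    for level in range(num_levels - 1, -1, -1):
        update = estimate_update(level, field)
        # Pixel-by-pixel addition of the estimated correction.
        field = [[(a[0] + b[0], a[1] + b[1]) for a, b in zip(frow, urow)]
                 for frow, urow in zip(field, update)]
        if level > 0:
            field = upscale_field(field)  # prepare the next, finer level
    return field
```

With three levels and a callback that always returns a correction of one pixel, the corrections from coarser levels are doubled at each upscaling, so the coarsest level contributes most to the final vector.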

[00148] Once the level reached is the one where the reference frames are not downscaled (i.e., they are at their original resolution), process 1200 advances to 1216. For example, the number of levels can be three, as in the example of FIG. 13. At 1216, the warped reference blocks are blended to form an optical flow reference block, as described above. Note that the warped reference blocks blended at 1216 may be the full-scale reference blocks warped again according to the process described at 1208 using the motion field estimated at 1210. In other words, the full-scale reference blocks may be warped twice: once using the initial upscaled motion field from the previous processing level, and again after the motion field has been refined at the full-scale level. The blending may be performed using the time linearity assumption, similarly to the processing described at 1116. Optional occlusion detection, as described at 1116 and shown by example in FIG. 14, may be incorporated as part of the blending at 1216.
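
A blend under a time linearity assumption might look like the following sketch. The weighting by temporal distance is an assumption about what that linearity implies, the frame indices and function name are hypothetical, and the optional occlusion detection is deliberately left out.

```python
def blend_blocks(warped_r1, warped_r2, index_cur, index_r1, index_r2):
    """Blend two warped blocks, weighting each by its temporal closeness
    to the current frame (the closer reference gets the larger weight)."""
    span = index_r2 - index_r1
    w1 = (index_r2 - index_cur) / span
    w2 = (index_cur - index_r1) / span
    return [[p1 * w1 + p2 * w2 for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(warped_r1, warped_r2)]
```

For a current frame one step after reference 1 and three steps before reference 2, the weights are 0.75 and 0.25, respectively.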

[00149] After the co-located reference block is generated at 1216, process 1200 advances to 1218 to determine whether there are further frame portions (here, blocks) for prediction. If so, process 1200 repeats starting at 1206 for the next block. The blocks may be processed in the scan order. Once there are no further blocks to consider in response to the query at 1218, process 1200 ends.

[00150] Referring again to FIG. 10, process 1200 can implement 1006 in process 1000. At the end of the processing at 1006, whether performed according to process 1100, process 1200, or a variation of either as described herein, one or more warped reference frame portions exist.

[00151] At 1008, a prediction process is performed using the optical flow reference frame portion generated at 1006. Performing the prediction process at an encoder can include generating a prediction block from an optical flow reference frame for a current block of the frame. The optical flow reference frame can be the optical flow reference frame output by process 1100 and stored in a reference frame buffer, such as the reference frame buffer 600. Alternatively, the optical flow reference frame can be generated by combining the optical flow reference portions output by process 1200. Combining the optical flow reference portions may include arranging the optical flow reference portions (e.g., co-located reference blocks) according to the pixel positions of the respective current frame portions used in the generation of each of the optical flow reference portions. The resulting optical flow reference frame can be stored for use in a reference frame buffer of the encoder, such as the reference frame buffer 600 of the encoder 400.

[00152] Generating the prediction block at the encoder can include selecting the co-located block in the optical flow reference frame as the prediction block. Generating the prediction block at the encoder can instead include performing a motion search within the optical flow reference frame to select the best-matching prediction block for the current block. Regardless of how the prediction block is generated at the encoder, the resulting residual can be further processed, such as by using the lossy encoding process described with regard to the encoder 400 of FIG. 4.

[00153] At the encoder, process 1000 may form part of a rate-distortion loop for the current block that uses various prediction modes, including one or more intra prediction modes and both single and compound inter prediction modes using the prediction frames available for the current frame. A single inter prediction mode uses only a single forward or backward reference frame for inter prediction. A compound inter prediction mode uses both a forward and a backward reference frame for inter prediction. In the rate-distortion loop, the rate (e.g., the number of bits) used to encode the current block using the respective prediction modes is compared to the distortion resulting from the encoding. The distortion may be calculated as the differences between the pixel values of the block before encoding and after decoding. The differences can be a sum of absolute differences or some other measure that captures the accumulated error for blocks of the frames.
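
The comparison of rate against distortion can be illustrated with a small sketch. The sum of absolute differences follows the text, while the combined cost (distortion plus a multiplier times the bit count), the mode names, and the function names are assumptions introduced only for this example.

```python
def sad(original, reconstructed):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_o, row_r in zip(original, reconstructed)
               for a, b in zip(row_o, row_r))

def pick_mode(original, candidates, multiplier):
    """Choose the prediction mode with the lowest combined cost.

    candidates : {mode_name: (reconstructed_block, bits)}
    cost       : distortion + multiplier * bits (an assumed cost form)
    """
    best = None
    for name, (reconstructed, bits) in candidates.items():
        cost = sad(original, reconstructed) + multiplier * bits
        if best is None or cost < best[1]:
            best = (name, cost)
    return best
```

A mode that spends more bits can still win when its reconstruction is sufficiently closer to the original block.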

[00154] In some implementations, it may be desirable to limit the use of the optical flow reference frame to the single inter prediction mode. That is, the optical flow reference frame may be excluded as a reference frame in any compound reference mode. This can simplify the rate-distortion loop, and little additional impact on the encoding of a block is expected because the optical flow reference frame already considers both a forward and a backward reference frame. According to an implementation described herein, a flag may be encoded into the bitstream to indicate whether or not an optical flow reference frame is available for use in encoding the current frame. In one example, the flag may be encoded when any single block within the current frame is encoded using an optical flow reference frame block. Where the optical flow reference frame is available for a current frame, an additional flag or other indicator (e.g., at the block level) may be included to indicate whether or not the current block was encoded by inter prediction using the optical flow reference frame.

[00155] The prediction process at 1008 may be repeated for all blocks of the current frame until the current frame is encoded.

[00156] In a decoder, performing the prediction process using the optical flow reference frame portion at 1008 may result from a determination that an optical flow reference frame is available for decoding the current frame. In some implementations, the determination is made by inspecting a flag that indicates that at least one block of the current frame was encoded using an optical flow reference frame portion. Performing the prediction process at 1008 in the decoder can include generating a prediction block. Generating the prediction block can include using an inter prediction mode decoded from the encoded bitstream, such as from a block header. A flag or indicator can be decoded to determine the inter prediction mode. When the inter prediction mode is an optical flow reference frame mode (i.e., the block was inter predicted using the optical flow reference frame portion), the prediction block for the current block to be decoded is generated using pixels of the optical flow reference frame portion and a motion vector mode and/or a motion vector.

[00157] The same processing used at the encoder to generate an optical flow reference frame for the prediction process may be performed at a decoder, such as the decoder 500, as part of decoding. For example, when the flag indicates that at least one block of the current frame was encoded using an optical flow reference frame portion, an entire optical flow reference frame can be generated and stored for use in the prediction process. However, additional savings in computational power at the decoder are possible by modifying process 1200 so that it is performed only where coding blocks are identified as using the co-located/optical flow reference frame as an inter prediction reference frame. This may be described with reference to FIG. 15, which is a diagram illustrating one technique for optimizing a decoder.

[00158] In FIG. 15, pixels are shown along a grid 1500, with w representing a pixel location along a first axis of the grid 1500 and y representing a pixel location along a second axis of the grid 1500. The grid 1500 represents pixel locations of a portion of the current frame. To perform the prediction process in the decoder at 1008, the processing at 1006 and 1008 may be combined. For example, before performing the process at 1006, the prediction process at 1008 can include finding the reference block used to encode the current block (e.g., from header information, such as a motion vector). In FIG. 15, the motion vector for the current coding block 1502 points to a reference block represented by the inner dashed line 1504. The current coding block 1502 comprises 4x4 pixels. The reference block location is shown by the dashed line 1504 because the reference block is located in the reference frame, not in the current frame.

[00159] Once the reference block is located, all of the reference blocks that the reference block spans (i.e., overlaps) are identified. This may include extending the reference block size by half of the filter length at each boundary to account for sub-pixel interpolation filters. In FIG. 15, the sub-pixel interpolation filter length L is used to extend the reference block to the boundaries represented by the outer dashed line 1506. As is relatively common, the motion vector results in a reference block that does not align perfectly with the full-pel locations. The darkened area in FIG. 15 represents the full-pel locations. All of the reference blocks that overlap the full-pel locations are identified. Assuming the block sizes are the same as the current coding block 1502, the identified blocks are a first reference block that is co-located with the current block, a second reference block that is above the first reference block, two reference blocks that extend from the left of the first reference block, and two reference blocks that extend from the left of the second reference block.

[00160] Once the reference blocks are identified, process 1200 may be performed at 1006 for only the blocks within the current frame that are co-located with the identified reference blocks, so as to produce the co-located/optical flow estimated reference blocks. In the example of FIG. 15, this results in six optical flow reference frame portions.
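
Identifying the overlapped blocks can be sketched as a bounding computation on full-pel positions. This is a conservative approximation under stated assumptions: blocks lie on a regular grid of `block_size` pixels, the motion vector is a (vertical, horizontal) offset in pixels, the margin is half the filter length on every side, and results are not clipped to the frame. All names are hypothetical.

```python
import math

def overlapped_blocks(block_row, block_col, mv, block_size, filter_length):
    """Return (row, col) grid indices of the blocks touched by the referenced
    area after extending it by half the filter length on each side."""
    margin = filter_length / 2
    y0 = block_row * block_size + mv[0]  # top edge of the referenced block
    x0 = block_col * block_size + mv[1]  # left edge of the referenced block
    first_y = math.floor(y0 - margin)
    last_y = math.ceil(y0 + block_size - 1 + margin)
    first_x = math.floor(x0 - margin)
    last_x = math.ceil(x0 + block_size - 1 + margin)
    rows = range(first_y // block_size, last_y // block_size + 1)
    cols = range(first_x // block_size, last_x // block_size + 1)
    return [(r, c) for r in rows for c in cols]
```

For a 4x4 block at grid position (2, 2), a motion vector of (-2, -3.5), and a filter length of 4, the function returns six blocks: the co-located block, the block above it, and two blocks to the left of each. The numbers are chosen only to produce a six-block result like the one described for FIG. 15 and are not read from the figure.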

[00161] This modified process ensures that the encoder and decoder have the same predictor while the decoder does not need to calculate the entirety of the co-located reference frame. Note that the reference block(s) for a subsequent block, including any extended borders, may overlap one or more reference blocks identified in the decoding process of the current block. In this case, the optical flow estimation need be performed only once for any of the identified blocks, further reducing the computing requirements at the decoder. In other words, a reference block generated at 1216 may be stored for use in decoding other blocks of the current frame.
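
Storing generated reference blocks for reuse amounts to memoization keyed by block position. The sketch below assumes a hypothetical callable that produces the optical flow reference block for a grid position; the class name and the `generated` counter are illustrative only, and a real decoder would discard the stored blocks when it moves to the next frame.

```python
class ReferenceBlockCache:
    """Generate each optical flow reference block at most once per frame."""

    def __init__(self, generate_block):
        self._generate = generate_block  # hypothetical callable: (row, col) -> block pixels
        self._blocks = {}
        self.generated = 0  # number of blocks actually computed

    def get(self, row, col):
        key = (row, col)
        if key not in self._blocks:
            self._blocks[key] = self._generate(row, col)
            self.generated += 1
        return self._blocks[key]
```

Two coding blocks whose extended reference areas share a grid block then trigger only one optical flow estimation for that block.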

[00162] Regardless of how the prediction block is generated at the decoder, the decoded residual for the current block from the encoded bitstream can be combined with the prediction block to form a reconstructed block, as described by example with regard to the decoder 500 of FIG. 5.

[00163] The prediction process at 1008, whether performed after or in conjunction with process 1200, may be repeated for all blocks of the current frame that were encoded using an optical flow reference frame portion until the current frame is decoded. When processing blocks in the decoding order, a block that is not encoded using an optical flow reference frame portion can be decoded conventionally according to the prediction mode decoded for the block from the encoded bitstream.

[00164] For N pixels in a frame or block, the complexity of solving the optical flow formulation can be represented by O(N * M), where M is the number of iterations used to solve the linear equations. M is not related to the number of levels or to the number of values for the Lagrangian parameter λ. Instead, M is related to the calculation precision in solving the linear equations: a larger value of M results in better precision. Given this complexity, moving from frame-level to sub-frame-level (e.g., block-based) estimation provides several options for reducing the decoder complexity. First, because the motion field smoothness constraint is relaxed at block boundaries, it is easier to converge to a solution when solving the linear equations for a block, resulting in a smaller M for similar precision. Second, solving for a motion vector involves its neighboring motion vectors due to the smoothness penalty factor. Motion vectors at block boundaries have fewer neighboring motion vectors, which yields faster calculations. Third, as discussed above, the optical flow needs to be calculated only for the portion of the blocks of the co-located reference frame that are identified by those coding blocks using the co-located reference frame for inter prediction, rather than for the whole frame.

[00165] For simplicity of explanation, each of the processes 1000, 1100, and 1200 is depicted and described as a series of steps or operations. However, steps or operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations may be required to implement a method in accordance with the disclosed subject matter.

[00166] The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.

[00167] The word "example" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as an "example" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" is intended to present concepts in a concrete fashion. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, "X includes A or B" is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then "X includes A or B" is satisfied under any of the foregoing instances. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term "an implementation" or "one implementation" throughout is not intended to mean the same embodiment or implementation unless described as such.

[00168] Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit. In the claims, the term "processor" should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms "signal" and "data" are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.

[00169] Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein. Additionally or alternatively, for example, a special-purpose computer/processor can be used that contains other hardware for carrying out any of the methods, algorithms, or instructions described herein.

[00170] The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communication device. In this case, the transmitting station 102 can encode content into an encoded video signal using the encoder 400 and transmit the encoded video signal to the communication device. In turn, the communication device can then decode the encoded video signal using the decoder 500. Alternatively, the communication device can decode content stored locally on the communication device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communication device, and/or a device that includes the encoder 400 may also include the decoder 500.

[00171] Further, all or a portion of the implementations of this disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport a program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable media are also available.

[00172] Further implementations are summarized in the following examples.

[00173] Example 1: A method, comprising: determining a first frame to be predicted in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame for inter prediction of the first frame by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame using the optical flow reference frame.

[00174] Example 2: The method of Example 1, wherein generating the optical flow reference frame comprises: performing the optical flow estimation by minimizing a Lagrangian function for respective pixels of the first frame.

[00175] Example 3: The method of Example 1 or 2, wherein the optical flow estimation produces a respective motion field for pixels of the first frame, and generating the optical flow reference frame comprises: warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and blending the first warped reference frame and the second warped reference frame to form the optical flow reference frame.

[00176] Example 4: The method of Example 3, wherein blending the first warped reference frame and the second warped reference frame comprises: combining co-located pixel values of the first warped reference frame and the second warped reference frame by scaling the co-located pixel values using distances between the first reference frame and the second reference frame and between the current frame and each of the first reference frame and the second reference frame.
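Distance-based scaling of this kind can be sketched as follows. The weighting rule is an assumption made for illustration: each warped pixel is weighted by the distance to the other reference frame divided by the total distance between the two reference frames, so the temporally nearer reference contributes more. The function name `blend_by_distance` and its parameters are hypothetical.

```python
def blend_by_distance(warped1, warped2, dist1, dist2):
    """Blend co-located pixels of two warped reference frames.

    dist1 / dist2: display-order distance from the current frame to the
    first / second reference frame (assumed positive).
    """
    total = dist1 + dist2   # distance between the two reference frames
    w1 = dist2 / total      # nearer reference frame gets the larger weight
    w2 = dist1 / total
    return [[w1 * p1 + w2 * p2 for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(warped1, warped2)]


# Current frame is 1 frame from the first reference and 3 from the second.
blended = blend_by_distance([[100, 100]], [[200, 200]], dist1=1, dist2=3)
```

With equal distances the rule reduces to a plain average of the two warped frames.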

[00177] Example 5: The method of Example 3 or 4, wherein blending the first warped reference frame and the second warped reference frame comprises: populating pixel positions of the optical flow reference frame by one of combining co-located pixel values of the first warped reference frame and the second warped reference frame, or using a single pixel value of one of the first warped reference frame or the second warped reference frame.

[00178] Example 6: The method of any one of Examples 3 to 5, further comprising: detecting an occlusion in the first reference frame using the first warped reference frame and the second warped reference frame, wherein blending the first warped reference frame and the second warped reference frame comprises: populating pixel positions of the optical flow reference frame corresponding to the occlusion with pixel values from the second warped reference frame.
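The per-pixel choice between a combined value and a single-source value can be sketched as below. How the occlusion is detected is not specified in this passage, so the boolean mask `occluded_in_first` is a hypothetical input, and the equal-weight average used for non-occluded pixels is a simplifying assumption.

```python
def fill_optical_flow_frame(warped1, warped2, occluded_in_first):
    """Populate each pixel position from both warped frames, or from the
    second warped frame alone where the first reference frame is occluded."""
    out = []
    for row1, row2, row_mask in zip(warped1, warped2, occluded_in_first):
        out.append([p2 if occluded else (p1 + p2) / 2
                    for p1, p2, occluded in zip(row1, row2, row_mask)])
    return out


# The second pixel is marked as occluded in the first reference frame.
frame = fill_optical_flow_frame([[10, 20]], [[30, 40]], [[False, True]])
```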

[00179] Example 7: The method of any one of Examples 1 to 6, wherein performing the prediction process comprises: using the optical flow reference frame only for single-reference inter prediction of blocks of the first frame.

[00180] Example 8: The method of any one of Examples 1 to 7, wherein the first reference frame is the reconstructed frame nearest to the first frame, in display order of the video sequence, that is available for forward inter prediction of the first frame, and the second reference frame is the reconstructed frame nearest to the first frame, in display order, that is available for backward inter prediction of the first frame.

[00181] Example 9: The method of any one of Examples 1 to 8, wherein performing the prediction process comprises: determining a reference block within the optical flow reference frame that is co-located with a first block of the first frame; and encoding a residual of the reference block and the first block.

[00182] Example 10: An apparatus, comprising: a processor; and a non-transitory storage medium that includes instructions executable by the processor to carry out a method comprising: determining a first frame to be predicted in a video sequence; determining an availability of a first reference frame for forward inter prediction of the first frame and an availability of a second reference frame for backward inter prediction of the first frame; and, responsive to determining the availability of both the first reference frame and the second reference frame: generating a respective motion field for pixels of the first frame using the first reference frame and the second reference frame as inputs to optical flow estimation; warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and blending the first warped reference frame and the second warped reference frame to form an optical flow reference frame for inter prediction of blocks of the first frame.

[00183] Example 11: The apparatus of Example 10, wherein the method further comprises: performing a prediction process for the first frame using the optical flow reference frame.

[00184] Example 12: The apparatus of Example 10 or 11, wherein the method further comprises: using the optical flow reference frame only for single-reference inter prediction of blocks of the first frame.

[00185] Example 13: The apparatus of any one of Examples 10 to 12, wherein generating the respective motion field comprises: calculating an output of a Lagrangian function for respective pixels of the first frame using the first reference frame and the second reference frame.

[00186] Example 14: The apparatus of Example 13, wherein calculating the output of the Lagrangian function comprises: calculating a first set of motion fields for the pixels of the current frame using a first value for a Lagrangian parameter; and using the first set of motion fields as input to the Lagrangian function using a second value for the Lagrangian parameter to calculate a refined set of motion fields for the pixels of the current frame, wherein the second value for the Lagrangian parameter is smaller than the first value for the Lagrangian parameter, and the first warped reference frame and the second warped reference frame are warped using the refined set of motion fields.
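The refinement with successively smaller Lagrangian parameter values can be sketched on a one-dimensional toy problem. Everything below is an assumption for illustration: the energy has a Horn-Schunck-like form in which λ weights the smoothness term, the solver is a plain Gauss-Seidel sweep, and the names `solve_flow_1d` and `refine_flow_1d` are hypothetical. The point shown is only the control flow: the motion field computed with a larger λ seeds the solve with the next smaller λ.

```python
def solve_flow_1d(ix, it, lam, u, sweeps=50):
    """Gauss-Seidel sweeps on the normal equations of
    E(u) = sum_i (ix[i]*u[i] + it[i])**2 + lam * sum_i (u[i] - u[i+1])**2.
    Assumes lam > 0 and at least two samples, so the denominator is nonzero."""
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            nbrs = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
            denom = ix[i] ** 2 + lam * len(nbrs)
            u[i] = (-ix[i] * it[i] + lam * sum(nbrs)) / denom
    return u


def refine_flow_1d(ix, it, lambdas, sweeps=50):
    """Solve with the largest lambda first, then reuse each result as the
    starting motion field for the next smaller lambda."""
    u = [0.0] * len(ix)
    for lam in sorted(lambdas, reverse=True):
        u = solve_flow_1d(ix, it, lam, u, sweeps)
    return u


# Toy gradients consistent with a uniform motion of 2 samples.
flow = refine_flow_1d([1.0] * 4, [-2.0] * 4, lambdas=[0.1, 10.0, 1.0])
```

For this toy input every stage has the same minimizer (a uniform motion of 2), so the sketch demonstrates the warm-start structure rather than any accuracy gain.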

[00187] Example 15: An apparatus, comprising: a processor; and a non-transitory storage medium that includes instructions executable by the processor to carry out a method comprising: generating an optical flow reference frame for inter prediction of a first frame of a video sequence, using a first reference frame from the video sequence and a second reference frame of the video sequence, by initializing motion fields for pixels of the first frame at a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame and comprising one level of multiple levels; for each level of the multiple levels: warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; estimating motion fields between the first warped reference frame and the second warped reference frame using the optical flow estimation; and updating the motion fields for the pixels of the first frame using the motion fields between the first warped reference frame and the second warped reference frame; and, for a final level of the multiple levels: warping the first reference frame to the first frame using the updated motion fields to form a final first warped reference frame; warping the second reference frame to the first frame using the updated motion fields to form a final second warped reference frame; and blending the final first warped reference frame and the second warped reference frame to form the optical flow reference frame.

[00188] Example 16: The apparatus of Example 15, wherein the optical flow estimation uses a Lagrangian function for respective pixels of a frame.

[00189] Example 17: The apparatus of Example 16, wherein the method further comprises, for each level of the multiple levels: initializing a Lagrangian parameter of the Lagrangian function to a maximum value for a first iteration of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields; and performing an additional iteration of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields using increasingly smaller values of a set of possible values for the Lagrangian parameter.

[00190] Example 18: The apparatus of Example 16 or 17, wherein estimating the motion fields comprises: calculating derivatives of pixel values of the first warped reference frame and the second warped reference frame with respect to a horizontal axis, a vertical axis, and time; downscaling the derivatives responsive to the level being different from the final level; and solving linear equations representing the Lagrangian function using the derivatives.
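The derivative step can be sketched as follows. The specific filters are assumptions for illustration, not taken from the text: spatial derivatives are central differences on the average of the two warped frames with edge clamping, and the temporal derivative is the difference between the second and first warped frames. The function name `flow_derivatives` is hypothetical, and the downscaling and linear solve are omitted.

```python
def flow_derivatives(warped1, warped2):
    """Return (ex, ey, et): derivatives with respect to the horizontal axis,
    the vertical axis, and time for two equally sized warped frames."""
    h, w = len(warped1), len(warped1[0])
    avg = [[(warped1[y][x] + warped2[y][x]) / 2 for x in range(w)]
           for y in range(h)]

    def at(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbor.
        return avg[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    ex = [[(at(y, x + 1) - at(y, x - 1)) / 2 for x in range(w)] for y in range(h)]
    ey = [[(at(y + 1, x) - at(y - 1, x)) / 2 for x in range(w)] for y in range(h)]
    et = [[warped2[y][x] - warped1[y][x] for x in range(w)] for y in range(h)]
    return ex, ey, et


# A horizontal ramp that brightens by 1 between the two warped frames.
w1 = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
w2 = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
ex, ey, et = flow_derivatives(w1, w2)
```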

[00191] Example 19: The apparatus of any one of Examples 15 to 18, wherein the method further comprises: inter predicting the first frame using the optical flow reference frame.

[00192] Example 20: The apparatus of any one of Examples 15 to 19, wherein the processor and the non-transitory storage medium form a decoder.

[00193] The above-described embodiments, implementations, and aspects have been described in order to facilitate easy understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures as are permitted under the law.

Claims

A method, comprising:
determining a first frame portion of a first frame to be predicted, the first frame being in a video sequence;
determining a first reference frame from the video sequence for forward inter prediction of the first frame;
determining a second reference frame from the video sequence for backward inter prediction of the first frame;
generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and
performing a prediction process for the first frame portion using the optical flow reference frame portion.

The method of claim 1,
wherein generating the optical flow reference frame portion comprises:
performing the optical flow estimation by minimizing a Lagrangian function for respective pixels of the first frame portion.

The method of claim 1 or 2,
wherein the optical flow estimation produces respective motion fields for pixels of the first frame portion, and
generating the optical flow reference frame portion comprises:
warping pixels of the first reference frame co-located with the first frame portion to the first frame portion using the motion fields to form a first warped reference frame portion;
warping pixels of the second reference frame co-located with the first frame portion to the first frame portion using the motion fields to form a second warped reference frame portion; and
blending the first warped reference frame portion and the second warped reference frame portion to form the optical flow reference frame portion.

The method of claim 3,
wherein blending the first warped reference frame portion and the second warped reference frame portion comprises:
combining co-located pixel values of the first warped reference frame portion and the second warped reference frame portion by scaling the co-located pixel values using distances between the first reference frame and the second reference frame and between the current frame and each of the first reference frame and the second reference frame.

The method of claim 3 or 4,
wherein blending the first warped reference frame portion and the second warped reference frame portion comprises:
populating pixel positions of the optical flow reference frame portion by one of combining co-located pixel values of the first warped reference frame portion and the second warped reference frame portion, or using a single pixel value of one of the first warped reference frame portion or the second warped reference frame portion.

The method of any one of claims 1 to 5,
wherein the first frame portion comprises one of a current block of the first frame or the first frame, and the optical flow reference frame portion is a block when the first frame portion comprises the current block and is an entire frame when the first frame portion comprises the first frame.

The method of any one of claims 1 to 6,
wherein the first reference frame is the reconstructed frame nearest to the first frame, in display order of the video sequence, that is available for forward inter prediction of the first frame, and the second reference frame is the reconstructed frame nearest to the first frame, in display order, that is available for backward inter prediction of the first frame.

The method of any one of the preceding claims,
wherein the first frame portion is a current block to be decoded,
performing the prediction process comprises:
using a motion vector used to encode the current block to identify a reference block position;
adjusting boundaries of the reference block by a length of a subpixel interpolation filter; and
identifying blocks that include pixels within the adjusted boundaries of the reference block, and
generating the optical flow reference frame portion comprises performing optical flow estimation on blocks of the first frame that are co-located with the identified blocks, without performing optical flow estimation on the remaining blocks of the first frame.
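The block identification recited above can be sketched as follows. Several details are assumptions made only for illustration: the motion vector is in whole pixels, each boundary is extended by half the interpolation filter length, optical flow blocks lie on a uniform grid of size `grid`, and the reference block overlaps the frame. The function name `blocks_needing_flow` and all parameters are hypothetical.

```python
def blocks_needing_flow(block_x, block_y, block_w, block_h,
                        mv_x, mv_y, filter_len, grid, frame_w, frame_h):
    """Return grid coordinates (bx, by) of the blocks that contain pixels
    inside the adjusted boundaries of the reference block."""
    # Reference block position given by the (integer-pel) motion vector.
    x0 = block_x + mv_x
    y0 = block_y + mv_y
    x1 = x0 + block_w - 1
    y1 = y0 + block_h - 1
    # Assumption: extend each boundary by half the filter length,
    # then clamp to the frame.
    margin = filter_len // 2
    x0 = max(x0 - margin, 0)
    y0 = max(y0 - margin, 0)
    x1 = min(x1 + margin, frame_w - 1)
    y1 = min(y1 + margin, frame_h - 1)
    return [(bx, by)
            for by in range(y0 // grid, y1 // grid + 1)
            for bx in range(x0 // grid, x1 // grid + 1)]


# 8x8 block at (16, 16), motion vector (3, -2), 8-tap filter, 8x8 grid.
needed = blocks_needing_flow(16, 16, 8, 8, 3, -2, 8, 8, 64, 64)
```

Only the returned blocks would then be passed to optical flow estimation; the remaining blocks of the frame are skipped.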

The method of any one of claims 1 to 8,
wherein the first frame portion is a current block to be encoded,
generating the optical flow reference frame portion comprises performing optical flow estimation for each block of the first frame as the current block to generate a respective co-located reference block for the optical flow reference frame, and
performing the prediction process comprises:
forming the optical flow reference frame by combining the co-located reference blocks at their respective pixel positions;
storing the optical flow reference frame in a reference frame buffer; and
using the optical flow reference frame for a motion search for the current block.

An apparatus, comprising:
a processor; and
a non-transitory storage medium that includes instructions executable by the processor to carry out a method,
the method comprising:
determining a first frame to be predicted in a video sequence;
determining an availability of a first reference frame for forward inter prediction of the first frame and an availability of a second reference frame for backward inter prediction of the first frame; and
responsive to determining the availability of both the first reference frame and the second reference frame:
generating respective motion fields for pixels of a first frame portion using the first reference frame and the second reference frame as inputs to an optical flow estimation process;
warping a first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion, the first reference frame portion comprising pixels of the first reference frame co-located with the pixels of the first frame portion;
warping a second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion, the second reference frame portion comprising pixels of the second reference frame co-located with the pixels of the first frame portion; and
blending the first warped reference frame portion and the second warped reference frame portion to form an optical flow reference frame portion for inter prediction of a block of the first frame.

The apparatus of claim 10,
wherein the method further comprises:
performing a prediction process for the block of the first frame using the optical flow reference frame portion.

The apparatus of claim 10 or 11,
wherein the method further comprises:
using the optical flow reference frame portion only for single-reference inter prediction of blocks of the first frame.

The apparatus of any one of claims 10 to 12,
wherein generating the respective motion fields comprises:
calculating an output of a Lagrangian function for respective pixels of the first frame portion using the first reference frame portion and the second reference frame portion.

The apparatus of claim 13,
wherein calculating the output of the Lagrangian function comprises:
calculating a first set of motion fields for the pixels of the first frame portion using a first value for a Lagrangian parameter; and
using the first set of motion fields as input to the Lagrangian function using a second value for the Lagrangian parameter to calculate a refined set of motion fields for the pixels of the first frame portion,
wherein the second value for the Lagrangian parameter is smaller than the first value for the Lagrangian parameter, and the first warped reference frame and the second warped reference frame are warped using the refined set of motion fields.

An apparatus, comprising:
a processor; and
a non-transitory storage medium that includes instructions executable by the processor to carry out a method,
the method comprising:
generating an optical flow reference frame portion for inter prediction of a block of a first frame of a video sequence, using a first reference frame portion from the video sequence and a second reference frame portion of the video sequence, by initializing motion fields for pixels of a first frame portion at a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame portion and comprising one level of multiple levels;
for each level of the multiple levels:
warping the first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion;
warping the second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion;
estimating motion fields between the first warped reference frame portion and the second warped reference frame portion using the optical flow estimation; and
updating the motion fields for the pixels of the first frame portion using the motion fields between the first warped reference frame portion and the second warped reference frame portion; and
for a final level of the multiple levels:
warping the first reference frame portion to the first frame portion using the updated motion fields to form a final first warped reference frame portion;
warping the second reference frame portion to the first frame portion using the updated motion fields to form a final second warped reference frame portion; and
blending the final first warped reference frame portion with the second warped reference frame portion to form the optical flow reference frame portion.

The apparatus of claim 15,
wherein the optical flow estimation uses a Lagrangian function for respective pixels of the first frame portion.

The device of claim 16,
The method further comprises, for each level of the plurality of levels:
Initializing a Lagrangian parameter of the Lagrangian function to a maximum value for a first iteration of warping the first reference frame portion, warping the second reference frame portion, estimating the motion fields, and updating the motion fields; and
Performing an additional iteration of warping the first reference frame portion, warping the second reference frame portion, estimating the motion fields, and updating the motion fields using increasingly smaller values of a set of possible values for the Lagrangian parameter,
Device.
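
The iteration order described in this claim (largest Lagrangian parameter first, then progressively smaller values from a set) might look like the sketch below. The names `run_level` and `iterate` and the parameter values are assumptions made for illustration; the claim does not specify the set of possible values.

```python
def run_level(state, possible_lambdas, iterate):
    """Run one level: the first iteration uses the maximum Lagrangian
    parameter, later iterations use progressively smaller values."""
    for lam in sorted(possible_lambdas, reverse=True):
        state = iterate(state, lam)  # warp, estimate, and update once
    return state


# Toy usage: record which parameter each iteration received.
used = []

def record(state, lam):
    used.append(lam)
    return state + 1  # count iterations

iterations = run_level(0, [4.0, 100.0, 20.0], record)
```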

The device according to claim 16 or 17,
Estimating the motion fields includes:
Calculating derivatives of pixel values of the first warped reference frame portion and the second warped reference frame portion with respect to a horizontal axis, a vertical axis, and time;
Downscaling the derivatives in response to the level being different from the final level; and
Solving linear equations representing the Lagrangian function using the derivatives,
Device.
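
As a rough illustration of the derivative and linear-equation steps, the sketch below assumes a Horn-Schunck-style per-pixel Lagrangian, (Ex·u + Ey·v + Et)² + λ((u − ū)² + (v − v̄)²), where ū and v̄ are neighborhood averages of the motion. The claim does not state the exact form of the Lagrangian function, so this form, the function names `derivatives` and `solve_pixel`, and the finite-difference stencils are all assumptions. The downscaling step is omitted.

```python
def derivatives(w1, w2, x, y):
    """Finite-difference derivatives at an interior pixel (x, y) of two
    warped portions given as lists of rows (assumed stencil)."""
    ex = 0.5 * ((w1[y][x + 1] - w1[y][x - 1]) / 2.0
                + (w2[y][x + 1] - w2[y][x - 1]) / 2.0)  # horizontal axis
    ey = 0.5 * ((w1[y + 1][x] - w1[y - 1][x]) / 2.0
                + (w2[y + 1][x] - w2[y - 1][x]) / 2.0)  # vertical axis
    et = w2[y][x] - w1[y][x]                            # time
    return ex, ey, et


def solve_pixel(ex, ey, et, u_avg, v_avg, lam):
    """Solve the 2x2 linear system obtained by setting the gradient of the
    assumed Lagrangian to zero. Requires lam > 0 so the determinant,
    lam * (ex^2 + ey^2 + lam), is nonzero."""
    a, b, d = ex * ex + lam, ex * ey, ey * ey + lam
    r1 = lam * u_avg - ex * et
    r2 = lam * v_avg - ey * et
    det = a * d - b * b
    return (r1 * d - b * r2) / det, (a * r2 - b * r1) / det


# Toy usage: a horizontal ramp and the same ramp shifted by one pixel.
w1 = [[float(x) for x in range(5)] for _ in range(5)]
w2 = [[x - 1.0 for x in range(5)] for _ in range(5)]
ex, ey, et = derivatives(w1, w2, 2, 2)
u, v = solve_pixel(ex, ey, et, u_avg=0.0, v_avg=0.0, lam=1.0)
```

With a zero neighborhood average and λ = 1, the smoothness term pulls the horizontal estimate to half of the one-pixel shift; a smaller λ would let the data term dominate.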

The device according to any one of claims 15 to 18,
The method further comprises:
Inter predicting a current block of the first frame using the optical flow reference frame portion,
Device.

The device according to any one of claims 15 to 19,
The processor and the non-transitory storage medium form a decoder,
Device.

A device, comprising:
A processor; and
A non-transitory storage medium comprising instructions executable by the processor to perform a method,
The method comprising:
Determining a first frame portion of a first frame to be predicted, wherein the first frame is in a video sequence;
Determining a first reference frame from the video sequence for forward inter prediction of the first frame;
Determining a second reference frame from the video sequence for backward inter prediction of the first frame;
Generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and
Performing a prediction process for the first frame portion using the optical flow reference frame portion,
Device.