KR20230137171A

KR20230137171A - Image encoding/decoding method and apparatus based on object detection information, and recording medium storing bitstream

Info

Publication number: KR20230137171A
Application number: KR1020220034995A
Authority: KR
Inventors: 문전학; 김대연; 이영렬; 김명준; 송현주; 임수연
Original assignee: 주식회사 칩스앤미디어 (Chips&Media, Inc.); 세종대학교산학협력단 (Industry-Academia Cooperation Foundation of Sejong University)
Priority date: 2022-03-21
Filing date: 2022-03-21
Publication date: 2023-10-04

Abstract

An image encoding/decoding method, an image encoding/decoding apparatus, and a recording medium storing a bitstream generated by the image encoding method are provided. An image decoding method according to the present disclosure includes obtaining object recognition information for an input image, and reconstructing the input image based on the object recognition information. When the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information may be derived as difference information relative to the first object recognition information.

Description

Image encoding/decoding method and apparatus based on object recognition information, and recording medium storing bitstream {IMAGE ENCODING/DECODING METHOD AND APPARATUS BASED ON OBJECT DETECTION INFORMATION, AND RECORDING MEDIUM STORING BITSTREAM}

The present invention relates to an image encoding/decoding method and apparatus and a recording medium storing a bitstream, and more particularly, to an image encoding/decoding method and apparatus based on object recognition information and a recording medium storing a bitstream.

Recently, demand for high-resolution, high-quality images, such as HD (High Definition) and UHD (Ultra High Definition) images, has been increasing in various fields. Meanwhile, as object recognition technology based on machine learning develops rapidly, using data obtained by object recognition algorithms in image encoding/decoding is expected to improve image compression efficiency dramatically. Accordingly, various research and development efforts to combine object recognition technology with image compression technology are actively under way.

An object of the present disclosure is to provide an image encoding/decoding method and apparatus with improved encoding/decoding efficiency.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus based on object recognition information.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus having compression units reconfigured based on object recognition information.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus having an adaptive bit rate based on object recognition information.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus that performs prediction only on a region containing an object recognized in the image.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus that encodes/decodes object recognition information together with image information.

Another object of the present disclosure is to provide an image encoding/decoding method and apparatus that encodes/decodes object recognition information separately from image information.

Another object of the present disclosure is to provide a computer-readable recording medium storing a bitstream generated by the image encoding method or apparatus according to the present disclosure.

This patent was carried out as part of a project supported by the Ministry of Trade, Industry and Energy (project title: Development of VVC (Versatile Video Coding) decoder IP for 5G-linked ultra-realistic media; project identification number: 1415172976; project number: 20010342).

An image decoding method according to one aspect of the present disclosure includes obtaining object recognition information for an input image, and reconstructing the input image based on the object recognition information. When the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information may be obtained as difference information relative to the first object recognition information.

An image encoding method according to another aspect of the present disclosure includes obtaining object recognition information for an input image based on an object recognition algorithm, and encoding the input image based on the object recognition information. When the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information may be encoded as difference information relative to the first object recognition information.
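The difference coding of the second object's recognition information can be sketched as below. The disclosure does not define the fields of the object recognition information, so the `ObjectInfo` record (a class index plus a bounding box `x`, `y`, `w`, `h`) and the choice to send the class index directly while differencing only the geometry are hypothetical; this is a minimal illustration, not the claimed syntax.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-object record; the disclosure does not define these fields.
@dataclass(frozen=True)
class ObjectInfo:
    class_id: int  # object class index (e.g., from an object classification table)
    x: int         # top-left x of the bounding box, in pixels
    y: int         # top-left y of the bounding box, in pixels
    w: int         # bounding-box width
    h: int         # bounding-box height

Coded = Tuple[int, int, int, int, int]

def encode_objects(objs: List[ObjectInfo]) -> List[Coded]:
    """Code the first object directly and every later object as a difference from it."""
    if not objs:
        return []
    f = objs[0]
    coded = [(f.class_id, f.x, f.y, f.w, f.h)]
    for o in objs[1:]:
        # Assumption: class_id is sent as is; only the geometry is differenced.
        coded.append((o.class_id, o.x - f.x, o.y - f.y, o.w - f.w, o.h - f.h))
    return coded

def decode_objects(coded: List[Coded]) -> List[ObjectInfo]:
    """Invert encode_objects by adding each difference back onto the first object."""
    if not coded:
        return []
    f = ObjectInfo(*coded[0])
    objs = [f]
    for cid, dx, dy, dw, dh in coded[1:]:
        objs.append(ObjectInfo(cid, f.x + dx, f.y + dy, f.w + dw, f.h + dh))
    return objs

objs = [ObjectInfo(1, 100, 50, 40, 80), ObjectInfo(1, 120, 52, 38, 78)]
coded = encode_objects(objs)
print(coded)  # [(1, 100, 50, 40, 80), (1, 20, 2, -2, -2)]
```

Because neighboring objects in a scene often have similar positions and sizes, the differences are typically small values that an entropy coder can represent in fewer bits than the absolute coordinates.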

A computer-readable recording medium according to still another aspect of the present disclosure may store a bitstream generated by the image encoding method or image encoding apparatus of the present disclosure.

The features briefly summarized above are merely exemplary aspects of the detailed description of the present disclosure that follows, and do not limit the scope of the present disclosure.

According to the present disclosure, an image encoding/decoding method and apparatus with improved encoding/decoding efficiency may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus based on object recognition information may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus having compression units reconfigured based on object recognition information may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus having an adaptive bit rate based on object recognition information may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus that performs prediction only on a region containing an object recognized in the image may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus that encodes/decodes object recognition information together with image information may be provided.

In addition, according to the present disclosure, an image encoding/decoding method and apparatus that encodes/decodes object recognition information separately from image information may be provided.

In addition, according to the present disclosure, a computer-readable recording medium storing a bitstream generated by the image encoding method or apparatus according to the present disclosure may be provided.

Figure 1 is a block diagram showing an image encoding apparatus.
Figure 2 is a block diagram showing an image decoding apparatus.
Figures 3 to 10 are diagrams for explaining image encoders and image decoders according to embodiments of the present invention.
Figures 11 to 12B are diagrams for explaining a method of reconfiguring a compression unit according to an embodiment of the present invention.
Figure 13 is a diagram for explaining inter prediction according to an embodiment of the present invention.
Figures 14 and 15 are diagrams showing examples of object classification tables applicable to embodiments of the present invention.
Figure 16 is a flowchart showing an image encoding method according to an embodiment of the present invention.
Figure 17 is a flowchart showing an image decoding method according to an embodiment of the present invention.
Figure 18 is a block diagram schematically showing an electronic device including an image encoding/decoding apparatus according to an embodiment of the present invention.

Since the present invention may be variously modified and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes within its spirit and technical scope. Like reference numerals in the drawings refer to the same or similar functions throughout. The shapes and sizes of elements in the drawings may be exaggerated for clarity. The detailed description of the exemplary embodiments below refers to the accompanying drawings, which illustrate specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments, although different from one another, need not be mutually exclusive. For example, a specific shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of that embodiment. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if appropriately described, is limited only by the appended claims together with all equivalents of what those claims assert.

In the present invention, terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component of the present invention is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to the other component, but it should be understood that other components may also exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The components shown in the embodiments of the present invention are illustrated independently to represent different characteristic functions, which does not mean that each component consists of separate hardware or a single software unit. That is, each component is listed separately for convenience of description; at least two components may be combined into one component, or one component may be divided into a plurality of components that perform its function. Such integrated and separated embodiments of each component are also included in the scope of the present invention as long as they do not depart from its essence.

The terms used in the present invention are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present invention, terms such as "comprise" or "have" are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. In other words, describing the present invention as "including" a specific configuration does not exclude configurations other than that configuration, and means that additional configurations may be included in the practice of the present invention or within the scope of its technical idea.

Some components of the present invention may not be essential components that perform essential functions of the present invention, but merely optional components for improving performance. The present invention may be implemented with only the components essential to realizing its essence, excluding components used merely to improve performance, and a structure including only the essential components, excluding the optional components used merely to improve performance, is also included in the scope of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In describing the embodiments of this specification, detailed descriptions of related known configurations or functions are omitted when they are judged to obscure the gist of this specification. The same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

Figure 1 is a block diagram showing an image encoding apparatus.

[Figure 1] Image encoding apparatus

Referring to Figure 1, the image encoding apparatus 100 may include a picture division unit 110, prediction units 120 and 125, a transform unit 130, a quantization unit 135, a rearrangement unit 160, an entropy encoding unit 165, an inverse quantization unit 140, an inverse transform unit 145, a filter unit 150, and a memory 155.

Each component shown in Figure 1 is illustrated independently to represent a different characteristic function of the image encoding apparatus, which does not mean that each component consists of separate hardware or a single software unit. That is, each component is listed separately for convenience of description; at least two components may be combined into one component, or one component may be divided into a plurality of components that perform its function. Such integrated and separated embodiments of each component are also included in the scope of the present invention as long as they do not depart from its essence.

In addition, some components may not be essential components that perform essential functions of the present invention, but merely optional components for improving performance. The present invention may be implemented with only the components essential to realizing its essence, excluding components used merely to improve performance, and a structure including only the essential components, excluding the optional components used merely to improve performance, is also included in the scope of the present invention.

The picture division unit 110 may divide an input picture into at least one processing unit. Here, the processing unit may be a prediction unit (PU), a transform unit (TU), or a coding unit (CU). The picture division unit 110 may divide one picture into combinations of a plurality of coding units, prediction units, and transform units, and may encode the picture by selecting one combination of coding unit, prediction unit, and transform unit according to a predetermined criterion (for example, a cost function).
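One common form of the "predetermined criterion (for example, a cost function)" is the Lagrangian rate-distortion cost J = D + λ·R. The sketch below assumes that form; the candidate labels, distortion values, and bit counts are made-up numbers for illustration, and the disclosure does not specify which cost function is used.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    label: str         # e.g. one CU/PU/TU partition combination
    distortion: float  # e.g. sum of squared errors after reconstruction
    rate_bits: float   # estimated bits needed to signal this choice

def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits

def select_best(cands: List[Candidate], lam: float) -> Candidate:
    """Pick the candidate with the lowest RD cost (the first one wins on ties)."""
    return min(cands, key=lambda c: rd_cost(c, lam))

cands = [Candidate("64x64, no split", 1000.0, 40.0),
         Candidate("four 32x32 CUs", 700.0, 120.0)]
print(select_best(cands, lam=5.0).label)  # 64x64, no split  (J = 1200 vs 1300)
print(select_best(cands, lam=1.0).label)  # four 32x32 CUs   (J = 1040 vs 820)
```

A larger λ weights bits more heavily, so the encoder leans toward cheaper, coarser partitions; a smaller λ favors lower distortion.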

For example, one picture may be divided into a plurality of coding units. To divide a picture into coding units, a recursive tree structure such as a quad tree structure may be used: with one image or a largest coding unit as the root, a coding unit that is split into other coding units may have as many child nodes as the number of coding units it is split into. A coding unit that is no longer split due to certain constraints becomes a leaf node. That is, assuming that only square splitting is possible for one coding unit, one coding unit may be split into at most four other coding units.
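The recursive quad-tree split can be sketched as follows. The `should_split` callback stands in for whatever split decision the encoder makes (for example, a cost comparison) and is hypothetical; only square four-way splits are modeled, matching the assumption in the paragraph above.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square coding unit

def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four children; return the leaf coding units."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):      # children visited in Z order:
            for dx in (0, half):  # top-left, top-right, bottom-left, bottom-right
                leaves += quadtree_split(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]  # not split any further: a leaf node

def decide(x: int, y: int, size: int) -> bool:
    # Hypothetical decision: split the 64x64 root, then only its top-left 32x32 child.
    return size == 64 or (size == 32 and x == 0 and y == 0)

leaves = quadtree_split(0, 0, 64, 8, decide)
print(leaves)
# [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

Each internal node has exactly four children, and the leaves tile the root block without overlap.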

Hereinafter, in the embodiments of the present invention, the term coding unit may be used to mean a unit in which encoding is performed or a unit in which decoding is performed.

Prediction units may be obtained by splitting one coding unit into at least one square or rectangular shape of the same size, or may be split such that one of the prediction units within a coding unit has a shape and/or size different from another prediction unit in the same coding unit.

When a prediction unit for intra prediction is generated based on a coding unit, if the coding unit is not the minimum coding unit, intra prediction may be performed without splitting it into a plurality of NxN prediction units.

The prediction units 120 and 125 may include an inter prediction unit 120 that performs inter prediction and an intra prediction unit 125 that performs intra prediction. Whether to use inter prediction or intra prediction for a prediction unit may be determined, and specific information according to each prediction method (e.g., intra prediction mode, motion vector, reference picture) may be determined. Here, the processing unit in which prediction is performed may differ from the processing unit in which the prediction method and its details are determined. For example, the prediction method and prediction mode may be determined per prediction unit, while prediction itself is performed per transform unit. The residual value (residual block) between the generated prediction block and the original block may be input to the transform unit 130. In addition, the prediction mode information, motion vector information, and the like used for prediction may be encoded by the entropy encoding unit 165 together with the residual value and transmitted to the decoder. When a specific encoding mode is used, the original block may also be encoded as is and transmitted to the decoder without generating a prediction block through the prediction units 120 and 125.

The inter prediction unit 120 may predict a prediction unit based on information of at least one of the pictures preceding or following the current picture, and in some cases may predict a prediction unit based on information of a partially encoded region within the current picture. The inter prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.

The reference picture interpolation unit may receive reference picture information from the memory 155 and generate sub-integer pixel information from the reference picture. For luma pixels, a DCT-based 8-tap interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/4 pixel. For chroma signals, a DCT-based 4-tap interpolation filter with varying filter coefficients may be used to generate sub-integer pixel information in units of 1/8 pixel.
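A one-dimensional sketch of quarter-pixel luma interpolation with an 8-tap filter follows. The coefficient sets are the HEVC luma filters, used here as an assumption since the text does not list coefficients. For simplicity the sketch filters a single row in one pass with 8-bit rounding and clipping, and clamps indices at the row ends to emulate edge padding; a real codec uses higher-precision intermediates for two-dimensional positions.

```python
from typing import List

# HEVC luma interpolation filters, indexed by quarter-sample phase (assumed here).
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),    # 1/4-sample position
    2: (-1, 4, -11, 40, 40, -11, 4, -1),  # 1/2-sample position
    3: (0, 1, -5, 17, 58, -10, 4, -1),    # 3/4-sample position
}

def interp_luma_h(row: List[int], pos: int, frac: int, bit_depth: int = 8) -> int:
    """Horizontal sample at integer position pos plus frac/4 of a pixel (single pass)."""
    if frac == 0:
        return row[pos]  # integer position: no filtering needed
    acc = 0
    for k, c in enumerate(LUMA_FILTERS[frac]):
        # Taps span row[pos-3] .. row[pos+4]; clamp indices to emulate edge padding.
        idx = min(max(pos - 3 + k, 0), len(row) - 1)
        acc += c * row[idx]
    val = (acc + 32) >> 6  # each filter's taps sum to 64, so normalize with rounding
    return min(max(val, 0), (1 << bit_depth) - 1)

ramp = [10 * i for i in range(16)]
print([interp_luma_h(ramp, 5, f) for f in range(4)])  # [50, 52, 55, 58]
```

On a linear ramp the filtered values land close to the ideal 52.5, 55, and 57.5, which is the behavior an interpolation filter should show on smooth content.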

The motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit. Various methods such as FBMA (Full search-based Block Matching Algorithm), TSS (Three Step Search), and NTS (New Three-Step Search Algorithm) may be used to calculate a motion vector. Based on the interpolated pixels, a motion vector may have a value in units of 1/2 or 1/4 pixel. The motion prediction unit may predict the current prediction unit using different motion prediction methods, such as the skip method, the merge method, the AMVP (Advanced Motion Vector Prediction) method, and the intra block copy method.
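Full search block matching (FBMA) can be sketched as an exhaustive scan of a search window that keeps the displacement with the lowest sum of absolute differences (SAD). Two simplifications are assumptions: SAD is the matching cost, and the search is integer-pel only, with no sub-pixel refinement on the interpolated reference.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]

def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur and at (rx, ry) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Tuple[Tuple[int, int], Optional[int]]:
    """Exhaustive integer-pel search in a +/- search_range window; returns ((dx, dy), best SAD)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block would fall outside the reference picture
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Toy example: the current frame is the reference shifted by (2, 1).
ref = [[x + 16 * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, n=4, search_range=3))  # ((2, 1), 0)
```

Full search evaluates (2R+1)^2 positions for a range R; TSS and NTS reduce that count by testing a coarse pattern first and refining around the best point, at the risk of missing the global minimum.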

The intra prediction unit 125 may generate a prediction unit based on reference pixel information around the current block, which is pixel information within the current picture. If a neighboring block of the current prediction unit is an inter-predicted block, so that a reference pixel is an inter-predicted pixel, the reference pixel included in the inter-predicted block may be replaced with reference pixel information of a neighboring intra-predicted block. That is, when a reference pixel is unavailable, the unavailable reference pixel information may be replaced with at least one of the available reference pixels.
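The replacement of unavailable reference pixels can be sketched as below, modeled on the HEVC substitution process (an assumption; the text only says unavailable pixels are replaced with available ones). Reference samples are scanned from bottom-left to top-right, each unavailable sample (`None`) copies the previous valid one, and mid-grey is used when nothing is available.

```python
from typing import List, Optional

def substitute_reference(samples: List[Optional[int]], bit_depth: int = 8) -> List[int]:
    """Fill unavailable (None) reference samples from available neighbors.

    samples are ordered from the bottom-left sample, up the left column,
    through the top-left corner, and along the top row to the top-right.
    """
    if all(s is None for s in samples):
        # Nothing usable: fall back to mid-grey, 1 << (bit_depth - 1).
        return [1 << (bit_depth - 1)] * len(samples)
    out = list(samples)
    if out[0] is None:
        # Start of the scan: copy the first available sample found along the scan.
        out[0] = next(s for s in out if s is not None)
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]  # copy the previous (already valid) sample in scan order
    return out

print(substitute_reference([None, None, 10, 20, None, 30]))  # [10, 10, 10, 20, 20, 30]
```

Under this scheme, a sample marked unavailable because its block was inter predicted is simply treated like a missing picture-boundary sample.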

In intra prediction, prediction modes may include directional prediction modes, which use reference pixel information according to a prediction direction, and non-directional modes, which use no directional information when performing prediction. The number of directional prediction modes may be equal to or greater than the 33 defined in the HEVC standard, and may be extended, for example, to a number in the range of 60 to 70. The mode for predicting luma information may differ from the mode for predicting chroma information, and the intra prediction mode information used to predict the luma information, or the predicted luma signal information, may be used to predict the chroma information.
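As an example of a non-directional mode, HEVC's DC mode fills the block with the rounded mean of the top and left reference samples. The sketch below assumes an n x n block with n a power of two and omits the boundary smoothing that HEVC additionally applies to small luma DC blocks.

```python
from typing import List

def dc_predict(top: List[int], left: List[int]) -> List[List[int]]:
    """DC intra prediction: fill an n x n block with the rounded mean of its 2n references."""
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left reference rows must have the same length")
    dc = (sum(top) + sum(left) + n) // (2 * n)  # adding n rounds to nearest
    return [[dc] * n for _ in range(n)]

print(dc_predict([100] * 4, [101] * 4)[0])  # [101, 101, 101, 101]
```

No direction is involved: every predicted sample has the same value regardless of its position, which suits flat regions.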

When performing intra prediction, if the size of the prediction unit equals the size of the transform unit, intra prediction for the prediction unit may be performed based on the pixels to the left of, above-left of, and above the prediction unit. However, if the size of the prediction unit differs from the size of the transform unit, intra prediction may be performed using reference pixels based on the transform unit. In addition, intra prediction using N x N partitioning may be used only for the minimum coding unit.

The intra prediction method may apply an Adaptive Intra Smoothing (AIS) filter to the reference pixels, according to the prediction mode, before generating a prediction block. The type of AIS filter applied to the reference pixels may vary.

The intra prediction mode of the current prediction unit may itself be predicted from the intra prediction modes of neighboring prediction units:

- If the current prediction unit and a neighboring prediction unit have the same intra prediction mode, this may be signaled with predetermined flag information.
- If their prediction modes differ, the prediction mode information of the current block may be encoded by entropy encoding.
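The flag-based mode signaling described above can be sketched as follows. This is a hypothetical illustration. It assumes a single neighboring prediction unit, whereas HEVC and VVC actually build a list of several most probable modes. The function and syntax-element names are invented for the example.

```python
# Hypothetical sketch of flag-based intra mode signalling with ONE neighbour
# (real codecs use a most-probable-mode list; names here are illustrative).

def encode_intra_mode(current_mode: int, neighbour_mode: int) -> dict:
    """Return the syntax elements to transmit for the current intra mode."""
    if current_mode == neighbour_mode:
        # Same mode as the neighbour: the flag alone is sufficient.
        return {"same_mode_flag": 1}
    # Different mode: clear the flag and pass the mode to the entropy coder.
    return {"same_mode_flag": 0, "intra_mode": current_mode}


def decode_intra_mode(syntax: dict, neighbour_mode: int) -> int:
    """Recover the current intra mode from the parsed syntax elements."""
    if syntax["same_mode_flag"]:
        return neighbour_mode
    return syntax["intra_mode"]


print(encode_intra_mode(26, 26))  # {'same_mode_flag': 1}
print(encode_intra_mode(10, 26))  # {'same_mode_flag': 0, 'intra_mode': 10}
```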

In addition, a residual block may be generated. It contains residual information, that is, the difference between the prediction unit generated by the prediction units 120 and 125 and the original block of that prediction unit. The generated residual block may be input to the transform unit 130.
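As a minimal sketch, the residual computation is an element-wise subtraction of the prediction block from the original block. In this example, blocks are plain lists of rows.

```python
def residual_block(original, prediction):
    """Residual = original - prediction, element by element (lists of rows)."""
    if len(original) != len(prediction) or any(
        len(o) != len(p) for o, p in zip(original, prediction)
    ):
        raise ValueError("original and prediction blocks must have the same shape")
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, prediction)]


original = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
print(residual_block(original, prediction))  # [[2, 5], [1, -1]]
```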

The transform unit 130 may transform the residual block using a transform such as the DCT (Discrete Cosine Transform), the DST (Discrete Sine Transform), or the KLT (Karhunen-Loève Transform). The residual block holds the residual information between the original block and the prediction unit generated by the prediction units 120 and 125. Which of the three transforms is applied may be decided from the intra prediction mode information of the prediction unit used to generate the residual block.
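To illustrate the DCT case, here is a floating-point, orthonormal 2-D DCT-II applied separably, first to rows and then to columns. Real codecs use integer approximations of this transform. DST and KLT are not shown. For a flat block, all energy lands in the DC coefficient.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


flat = [[10] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 40.0 (all AC coefficients are ~0)
```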

The quantization unit 135 may quantize the values transformed into the frequency domain by the transform unit 130. The quantization coefficient may vary depending on the block or on the importance of the image region. The values computed by the quantization unit 135 may be provided to the inverse quantization unit 140 and the rearrangement unit 160.
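A minimal sketch of uniform scalar quantization follows, assuming an HEVC-style step size that doubles every 6 QP values. Varying the quantization per block or by importance would amount to choosing a different `qstep` per block. For example, a hypothetical encoder might use a smaller step inside a recognized object region.

```python
def qstep_from_qp(qp: int) -> float:
    """HEVC-style step size: Qstep = 1 at QP 4 and doubles every 6 QP."""
    return 2 ** ((qp - 4) / 6)


def quantize(coeffs, qstep: float):
    """Uniform scalar quantisation, rounding half away from zero."""
    if qstep <= 0:
        raise ValueError("qstep must be positive")

    def q(c):
        level = int(abs(c) / qstep + 0.5)
        return level if c >= 0 else -level

    return [[q(c) for c in row] for row in coeffs]


print(qstep_from_qp(28))                         # 16.0
print(quantize([[40.0, -7.0], [3.0, 0.4]], 8))  # [[5, -1], [0, 0]]
```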

The rearrangement unit 160 may rearrange the coefficients of the quantized residual values.

The rearrangement unit 160 may convert two-dimensional block-form coefficients into a one-dimensional vector by coefficient scanning. For example, it may use a zig-zag scan that runs from the DC coefficient to the high-frequency coefficients. Depending on the transform unit size and the intra prediction mode, one of two other scans may be used instead:

- a vertical scan, which scans the block coefficients column by column;
- a horizontal scan, which scans them row by row.

In other words, the transform unit size and the intra prediction mode determine whether the zig-zag, vertical, or horizontal scan is used.
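The three scans described above can be sketched for a square block as follows. The zig-zag order walks the anti-diagonals, alternating direction, starting from the DC coefficient.

```python
def zigzag_order(n: int):
    """(row, col) positions of an n x n block in zig-zag order, DC first."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def scan(block, method: str = "zigzag"):
    """Flatten a square coefficient block into a 1-D list."""
    n = len(block)
    if method == "zigzag":
        positions = zigzag_order(n)
    elif method == "vertical":                 # column by column
        positions = [(r, c) for c in range(n) for r in range(n)]
    elif method == "horizontal":               # row by row
        positions = [(r, c) for r in range(n) for c in range(n)]
    else:
        raise ValueError(f"unknown scan method: {method}")
    return [block[r][c] for r, c in positions]


block = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(scan(block))              # [1, 2, 4, 7, 5, 3, 6, 8, 9]
print(scan(block, "vertical"))  # [1, 4, 7, 2, 5, 8, 3, 6, 9]
```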

The entropy encoding unit 165 may perform entropy encoding based on the values computed by the rearrangement unit 160. Entropy encoding may use various methods such as Exponential Golomb coding, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC).
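Of the methods listed, order-0 Exponential Golomb is simple enough to sketch directly. It is shown here as the unsigned `ue(v)` code together with the usual signed-to-unsigned mapping. CAVLC and CABAC are far more involved and are not shown.

```python
def exp_golomb_encode(value: int) -> str:
    """Order-0 Exp-Golomb codeword ue(v) for a non-negative integer, as bits."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]                # binary of value + 1, without '0b'
    return "0" * (len(info) - 1) + info      # leading-zero prefix, then info bits


def signed_to_unsigned(v: int) -> int:
    """se(v) mapping used before ue(v): 0, 1, -1, 2, -2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * v - 1 if v > 0 else -2 * v


print([exp_golomb_encode(v) for v in range(5)])
# ['1', '010', '011', '00100', '00101']
print(exp_golomb_encode(signed_to_unsigned(-2)))  # 00101
```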

The entropy encoding unit 165 may encode various information received from the rearrangement unit 160 and the prediction units 120 and 125, including:

- the residual coefficient information and block type information of the coding unit;
- prediction mode information, partitioning unit information, prediction unit information, and transmission unit information;
- motion vector information and reference frame information;
- block interpolation information and filtering information.

The entropy encoding unit 165 may entropy-encode the coefficient values of the coding unit input from the rearrangement unit 160.

The inverse quantization unit 140 inversely quantizes the values quantized by the quantization unit 135, and the inverse transform unit 145 inversely transforms the values transformed by the transform unit 130. The resulting residual may be combined with the prediction unit produced by the prediction units 120 and 125 to generate a reconstructed block. The prediction units 120 and 125 produce that prediction through their motion estimation unit, motion compensation unit, and intra prediction unit.
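Reconstruction adds the decoded residual back to the prediction. The text does not mention it, but codecs commonly clip the result to the valid sample range; this sketch assumes that clipping with an 8-bit default.

```python
def reconstruct(prediction, residual, bit_depth: int = 8):
    """Reconstructed block = clip(prediction + residual) to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


print(reconstruct([[250, 10], [128, 128]], [[10, -20], [3, -3]]))
# [[255, 0], [131, 125]]
```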

The filter unit 150 may include at least one of a deblocking filter, an offset correction unit, and an Adaptive Loop Filter (ALF).

The deblocking filter may remove block distortion caused by the boundaries between blocks in the reconstructed picture. Whether to apply the deblocking filter to the current block may be decided from the pixels in several columns or rows of the block. When the deblocking filter is applied, a strong filter or a weak filter may be chosen depending on the required filtering strength. In addition, horizontal-direction filtering and vertical-direction filtering may be processed in parallel.

The offset correction unit may correct, pixel by pixel, the offset of the deblocked image from the original image. Offset correction for a specific picture may use either of two approaches:

- divide the pixels of the image into a certain number of regions, determine the region to be corrected, and apply the offset to that region;
- apply the offset according to the edge information of each pixel.
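The region-based approach can be illustrated with a band offset in the style of HEVC's SAO (Sample Adaptive Offset). This sketch assumes that the "regions" are 32 equal-width intensity bands and that offsets are signaled for four consecutive bands starting at `band_position`. The band wrap-around that HEVC allows is omitted for simplicity.

```python
def band_offset(samples, band_position: int, offsets, bit_depth: int = 8):
    """Add a per-band offset to samples whose intensity falls in a signalled band."""
    shift = bit_depth - 5                    # 32 equal-width intensity bands
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_position     # band index relative to the first signalled band
        if 0 <= k < len(offsets):
            s = min(max(s + offsets[k], 0), max_val)
        out.append(s)
    return out


# 8-bit samples: band 10 covers 80..87, band 11 covers 88..95, and so on.
print(band_offset([5, 80, 90, 100, 200], band_position=10, offsets=[2, -1, 3, 0]))
# [5, 82, 89, 103, 200]
```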

Adaptive Loop Filtering (ALF) may be performed based on a comparison between the filtered reconstructed image and the original image. The pixels of the image may be divided into predetermined groups, and one filter may be chosen for each group so that each group is filtered differently. For the luma signal, information on whether to apply ALF may be transmitted for each coding unit (CU). The shape and filter coefficients of the ALF filter may vary from block to block. Alternatively, an ALF filter of the same (fixed) form may be applied regardless of the characteristics of the target block.

The memory 155 may store the reconstructed block or picture output by the filter unit 150. The stored reconstructed block or picture may be provided to the prediction units 120 and 125 when inter prediction is performed.

Demand for high-resolution, high-quality images such as HD (High Definition) and UHD (Ultra High Definition) images has recently been increasing across many applications. At the same time, machine-learning-based object recognition has advanced, and there is growing demand to develop it together with image compression. Data obtained from an object recognition algorithm can therefore be used to achieve higher-resolution, higher-quality image compression.

[1] can use data obtained by applying an object recognition algorithm for object-based image compression. It compresses the image by adding or substituting units that partition the image according to the number or type of detected objects. Such units include picture/slice/tile/tile group/CTU/CTU row/CTU column units or an object-based compression unit. [1] also presents a method of compressing the image efficiently based on the obtained data. The related data obtained from the object recognition algorithm may be compressed by entropy encoding/decoding and then transmitted or stored.

In [2], the object recognition algorithm and the standardized image compression technology may be used independently. Unlike [1], the information obtained through the object recognition algorithm is transmitted separately from the bitstream of the standardized image compression. The standardized compression can thus share the data obtained by the object recognition algorithm while performing compression.

In the prior art, no image compression technology utilizes object recognition. Moreover, as object recognition becomes increasingly lightweight, there is a need for technology that uses it together with image compression.

The present invention aims to use image compression technology together with an object recognition algorithm. Based on the data obtained from the object recognition algorithm, objects can be compressed efficiently.

The present invention can provide improved image quality by combining data obtained through an object recognition algorithm with image compression.

The present invention is an image encoding/decoding method and apparatus that improves image quality using data obtained through an object recognition algorithm. That data includes:

- the x-axis and y-axis information of a recognized object;
- the width and height of the recognized object;
- the object type;
- the recognition accuracy;
- the object identification number.

The present invention is an image encoding/decoding method and apparatus that reconstructs compression units based on data obtained through an object recognition algorithm. In doing so, it adds (includes) or substitutes picture/slice/tile/tile group/CTU/CTU row/CTU column units or an object-based compression unit (Object-based Coding Unit).

The present invention is an encoding/decoding method and apparatus that combines an object recognition algorithm with image compression technology to compress images efficiently in recognized object regions.

The present invention provides a method of adding (including) or substituting the existing picture/slice/tile/tile group/CTU/CTU row/CTU column units or an object-based compression unit. It thereby provides an encoding/decoding method and apparatus that can break away from the framework of existing image compression methods.

FIG. 2 is a block diagram showing an image decoding apparatus.

[Figure 2] Image decoding apparatus

Referring to FIG. 2, the image decoding apparatus 200 may include:

- an entropy decoding unit 210;
- a rearrangement unit 215;
- an inverse quantization unit 220;
- an inverse transform unit 225;
- prediction units 230 and 235;
- a filter unit 240;
- a memory 245.

When an image bitstream is input from the image encoding apparatus, it may be decoded by the inverse of the procedure used in the image encoding apparatus.

The entropy decoding unit 210 may perform entropy decoding by the inverse of the entropy encoding performed in the entropy encoding unit of the image encoding apparatus. Depending on the method used by the encoder, various methods may be applied, such as Exponential Golomb coding, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC).
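On the decoder side, an order-0 Exponential Golomb codeword is parsed in three steps: count the leading zeros, then read that many bits after the marker `1`, then subtract one. This sketch operates on a bit string for readability rather than on a real bitstream reader.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one order-0 Exp-Golomb codeword at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1                       # count the leading-zero prefix
    start = pos + zeros                  # position of the marker '1'
    end = start + zeros + 1              # marker bit plus `zeros` info bits
    if end > len(bits):
        raise ValueError("truncated Exp-Golomb codeword")
    return int(bits[start:end], 2) - 1, end


def decode_all(bits: str):
    """Decode a concatenation of codewords into a list of values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


print(decode_all("1010011"))  # [0, 1, 2]
print(decode_all("00100"))    # [3]
```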

The entropy decoding unit 210 may decode information related to the intra prediction and inter prediction performed by the encoder.

The rearrangement unit 215 may rearrange the bitstream entropy-decoded by the entropy decoding unit 210, based on the rearrangement method used by the encoder. Coefficients expressed as a one-dimensional vector may be restored into coefficients in two-dimensional block form. The rearrangement unit 215 may receive information on the coefficient scanning performed by the encoder and rearrange the coefficients by scanning in the inverse of that order.
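The inverse scan places each 1-D coefficient back at the block position it was read from. This sketch assumes a square block and that the decoder already knows which scan (zig-zag, vertical, or horizontal) the encoder used.

```python
def zigzag_order(n: int):
    """(row, col) positions of an n x n block in zig-zag order, DC first."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def inverse_scan(coeffs, n: int, method: str = "zigzag"):
    """Place a 1-D coefficient list back into an n x n block."""
    if len(coeffs) != n * n:
        raise ValueError("expected n * n coefficients")
    if method == "zigzag":
        positions = zigzag_order(n)
    elif method == "vertical":
        positions = [(r, c) for c in range(n) for r in range(n)]
    elif method == "horizontal":
        positions = [(r, c) for r in range(n) for c in range(n)]
    else:
        raise ValueError(f"unknown scan method: {method}")
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, positions):
        block[r][c] = value
    return block


print(inverse_scan([1, 2, 4, 7, 5, 3, 6, 8, 9], 3))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```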

The inverse quantization unit 220 may perform inverse quantization based on the quantization parameter provided by the encoder and the coefficient values of the rearranged block.
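As a simplified sketch, inverse quantization scales each level by the quantization step. This example assumes the HEVC-style mapping from quantization parameter to step size. Real decoders use integer scaling tables and shifts instead of floating point.

```python
def qstep_from_qp(qp: int) -> float:
    """HEVC-style step size: Qstep = 1 at QP 4 and doubles every 6 QP."""
    return 2 ** ((qp - 4) / 6)


def dequantize(levels, qstep: float):
    """Scale every quantised level back by the step size."""
    return [[level * qstep for level in row] for row in levels]


print(dequantize([[5, -1], [0, 0]], 8))                # [[40, -8], [0, 0]]
print(dequantize([[1, 2]], qstep_from_qp(28)))         # [[16.0, 32.0]]
```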

The inverse transform unit 225 may perform inverse transforms (inverse DCT, inverse DST, and inverse KLT) on the quantization result produced by the image encoding apparatus. These invert the transforms (DCT, DST, and KLT) performed by the transform unit. The inverse transform may be performed based on the transmission unit determined by the image encoding apparatus. The inverse transform unit 225 of the image decoding apparatus may select a transform technique (e.g., DCT, DST, or KLT) according to several pieces of information, such as the prediction method, the size of the current block, and the prediction direction.
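For the DCT case, the inverse of an orthonormal DCT-II is the orthonormal DCT-III, again applied separably. This floating-point sketch stands in for the integer transforms real decoders use. A DC-only coefficient block decodes to a flat block.

```python
import math


def idct_1d(X):
    """Orthonormal inverse DCT (DCT-III) of a 1-D sequence."""
    n = len(X)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out


def idct_2d(coeffs):
    """Separable 2-D inverse DCT: rows first, then columns."""
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


dc_only = [[40, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
block = idct_2d(dc_only)
print([[round(v, 6) for v in row] for row in block])  # every entry is 10.0
```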

The prediction units 230 and 235 may generate a prediction block from two sources: the prediction-block-generation information provided by the entropy decoding unit 210, and the previously decoded block or picture information provided by the memory 245.

As described above, the decoder performs intra prediction in the same manner as the image encoding apparatus. If the prediction unit has the same size as the transform unit, intra prediction for the prediction unit is based on the pixels to the left of, above-left of, and above the prediction unit. If the sizes differ, intra prediction may use reference pixels based on the transform unit. In addition, intra prediction using N x N partitioning may be used only for the minimum coding unit.

The prediction units 230 and 235 may include a prediction unit determination unit, an inter prediction unit, and an intra prediction unit.

The prediction unit determination unit may receive various information from the entropy decoding unit 210. This includes prediction unit information, the prediction mode information of the intra prediction method, and the motion-prediction information of the inter prediction method. It may then identify the prediction unit within the current coding unit and determine whether that prediction unit uses inter prediction or intra prediction.

The inter prediction unit 230 may perform inter prediction on the current prediction unit using the inter prediction information for that unit provided by the image encoding apparatus. The prediction is based on information in at least one picture preceding or following the current picture that contains the current prediction unit. Alternatively, inter prediction may be based on information from a previously reconstructed region within the current picture.

To perform inter prediction, the motion prediction method of each prediction unit in a coding unit may be determined on a coding-unit basis. The candidates are Skip Mode, Merge Mode, AMVP Mode, and Intra Block Copy Mode.

The intra prediction unit 235 may generate a prediction block based on pixel information in the current picture. If the prediction unit is intra predicted, intra prediction may be performed using the intra prediction mode information of that prediction unit provided by the image encoding apparatus.

The intra prediction unit 235 may include an Adaptive Intra Smoothing (AIS) filter, a reference pixel interpolation unit, and a DC filter. The AIS filter filters the reference pixels of the current block, and whether it is applied may depend on the prediction mode of the current prediction unit. AIS filtering may be performed on the reference pixels of the current block using the prediction mode and AIS filter information of the prediction unit provided by the image encoding apparatus. If the prediction mode of the current block does not use AIS filtering, the AIS filter may not be applied.
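The text does not specify the AIS filter taps. As an illustration only, this sketch assumes the common [1, 2, 1] / 4 reference smoothing filter with rounding, as used for intra reference smoothing in HEVC, and leaves the two end samples unfiltered.

```python
def smooth_reference(ref):
    """[1, 2, 1] / 4 smoothing of a reference sample line; endpoints unchanged."""
    if len(ref) < 3:
        return list(ref)
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2  # +2 rounds the /4
    return out


print(smooth_reference([10, 20, 30, 100, 40]))  # [10, 20, 45, 68, 40]
```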

Some prediction modes perform intra prediction from pixel values obtained by interpolating the reference pixels. For such modes, the reference pixel interpolation unit may interpolate the reference pixels to generate reference pixels at sub-integer (fractional) positions. If the prediction mode of the current prediction unit generates the prediction block without interpolation, the reference pixels may not be interpolated. When the prediction mode of the current block is the DC mode, the DC filter may generate the prediction block through filtering.
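The sketch below shows both operations under stated assumptions:

- Fractional reference samples come from 2-tap linear interpolation at 1/32-sample precision, similar to HEVC angular prediction.
- The DC value is the rounded average of the top and left reference samples.
- The boundary smoothing that a DC filter may additionally apply is not shown.

```python
def interpolate_reference(ref, pos_1_32: int) -> int:
    """Sample the reference line at a position given in 1/32-sample units."""
    i, f = pos_1_32 >> 5, pos_1_32 & 31          # integer part, fractional part
    if f == 0:
        return ref[i]                            # integer position: no interpolation
    if i + 1 >= len(ref):
        raise IndexError("position needs a sample beyond the reference line")
    return ((32 - f) * ref[i] + f * ref[i + 1] + 16) >> 5


def dc_prediction(top, left, size: int):
    """Fill a size x size block with the rounded mean of top and left references."""
    total = sum(top[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


print(interpolate_reference([100, 132], 16))                  # 116 (half-sample)
print(dc_prediction([10, 20, 30, 40], [50, 60, 70, 80], 4)[0])  # [45, 45, 45, 45]
```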

The reconstructed block or picture may be provided to the filter unit 240. The filter unit 240 may include a deblocking filter, an offset correction unit, and an ALF.

The image encoding apparatus may indicate whether a deblocking filter was applied to the block or picture and, if so, whether it was a strong or a weak filter. The deblocking filter of the image decoding apparatus receives this information from the image encoding apparatus and may then perform deblocking filtering on the block.

The offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied during encoding, the offset value information, and similar information.

The ALF may be applied to a coding unit based on information provided by the encoder, such as whether to apply the ALF and the ALF coefficients. This ALF information may be carried in a specific parameter set.

The memory 245 may store the reconstructed picture or block for use as a reference picture or reference block, and may also provide the reconstructed picture to an output unit.

As described above, for convenience of description, the embodiments of the present invention below use the term coding unit. However, a coding unit may be a unit on which decoding is performed as well as encoding.

Image encoder/decoder - Structure 1-1

FIG. 3 is a diagram illustrating an image encoder according to an embodiment of the present invention.

[Figure 3] Image encoder

Referring to FIG. 3, the image encoder 300 may encode an input image based on object recognition information obtained from that image. The image encoder 300 may also be called an image encoding apparatus. It may have the same or a similar structure as the image encoding apparatus 100 described above with reference to FIG. 1.

The object recognition information may be generated by an object recognizer (A).

Specifically, the object recognizer (A) may recognize objects present in the input image by applying a predetermined object recognition algorithm to it. Depending on the embodiment, the object recognition algorithm may be designed to recognize only specific types of objects in the input image.

The object recognizer (A) may convert the recognition results into data to generate object recognition information. The object recognition information may include:

- the coordinates, width, and height of the recognized object;
- the object type;
- the recognition accuracy;
- the object identification number.

The coordinates of the recognized object may refer to any one of the following points of the object in the input image: its center, or its upper-left, upper-right, lower-left, or lower-right corner.

The coordinates, width, and height of a recognized object may be expressed as floating-point values. Such a value may be converted into an integer by multiplying it by the width or height of the input image:

- the object's x-axis coordinate and width are multiplied by the image width;
- the object's y-axis coordinate and height are multiplied by the image height.

This integer representation can further improve the encoding/decoding efficiency of the object recognition information. Depending on the embodiment, the floating-point values may instead be encoded as they are, without integer conversion.
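As a sketch of the object recognition record and its integer conversion, consider the following. The field names, the normalized 0..1 input range, and rounding to the nearest integer (rather than truncating) are all assumptions made for illustration; the text says only that the values are multiplied by the image width or height.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One object recognition record; field names are hypothetical."""
    x: float           # normalised x of the chosen reference point (0..1)
    y: float           # normalised y of the chosen reference point (0..1)
    width: float       # normalised width (0..1)
    height: float      # normalised height (0..1)
    obj_class: int     # object type/class
    confidence: float  # recognition accuracy
    obj_id: int        # object identification number


def to_integer_box(obj: DetectedObject, img_w: int, img_h: int):
    """Scale x/width by the image width and y/height by the image height."""
    return (round(obj.x * img_w), round(obj.y * img_h),
            round(obj.width * img_w), round(obj.height * img_h))


car = DetectedObject(0.5, 0.25, 0.1, 0.2, obj_class=2, confidence=0.91, obj_id=0)
print(to_integer_box(car, 640, 480))  # (320, 120, 64, 96)
```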

Transmitting the x-axis coordinate, y-axis coordinate, width, and height of every recognized object individually may lower coding efficiency and increase signaling overhead. To address this, in one embodiment, the object recognizer (A) may code each object after the first in coding order (hereinafter, the 'second object') differentially. Its object recognition information is generated as the difference from the object recognition information of the previously encoded object (hereinafter, the 'first object').

For example, assume the two objects below.

- First object: coordinates (42, 118), width 50, height 30, object type/class 2 (e.g., vehicle), identification number 0.
- Second object: coordinates (70, 220), width 40, height 28, object type/class 2 (e.g., vehicle), identification number 1.

The object recognizer (A) may generate first object recognition information by encoding the first object's information as is. For the second object, it may generate second object recognition information as the difference from the first object recognition information:

- x-axis coordinate difference: 70-42=28;
- y-axis coordinate difference: 220-118=102;
- width difference: -10;
- height difference: -2;
- object type/class difference: 0.

To encode each of these differences, the object recognizer (A) may separately encode a flag indicating the sign of the difference and the absolute value of the difference.
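The sign-flag/magnitude coding of the differences can be sketched with the two objects from the example. The dictionary keys are hypothetical. This example assumes that a sign flag of 1 marks a negative difference and that the flag is sent even when the magnitude is zero. The identification number is left out because the text does not say how it is coded.

```python
FIELDS = ("x", "y", "width", "height", "obj_class")


def encode_difference(prev: dict, cur: dict) -> dict:
    """(sign_flag, magnitude) per field; sign_flag = 1 marks a negative difference."""
    coded = {}
    for f in FIELDS:
        d = cur[f] - prev[f]
        coded[f] = (1 if d < 0 else 0, abs(d))
    return coded


def decode_difference(prev: dict, coded: dict) -> dict:
    """Invert encode_difference given the previously decoded object."""
    return {f: prev[f] + (-mag if sign else mag)
            for f, (sign, mag) in coded.items()}


first = {"x": 42, "y": 118, "width": 50, "height": 30, "obj_class": 2}
second = {"x": 70, "y": 220, "width": 40, "height": 28, "obj_class": 2}
coded = encode_difference(first, second)
print(coded)
# {'x': (0, 28), 'y': (0, 102), 'width': (1, 10), 'height': (1, 2), 'obj_class': (0, 0)}
```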

The same scheme can be applied when three or more objects are recognized in the input image. For example, the object recognizer (A) may generate third object recognition information for a third object as a difference from the second object recognition information.
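Because each object is coded relative to the one before it, a decoder has to accumulate the differences in order. The sketch below assumes the record layout of a first absolute record followed by `(sign_flag, abs_value)` pairs, with sign flag 1 meaning negative. Both are illustrative assumptions, and the third object's values in the test are hypothetical.

```python
def decode_object_list(coded):
    """Rebuild absolute object information from a differentially coded list.
    The first record holds absolute values; each later record holds
    (sign_flag, abs_value) pairs relative to the previously decoded object."""
    decoded = []
    prev = None
    for record in coded:
        if prev is None:
            current = dict(record)  # first object: absolute values
        else:
            current = {}
            for name, (sign_flag, magnitude) in record.items():
                diff = -magnitude if sign_flag else magnitude
                current[name] = prev[name] + diff  # chain onto previous object
        decoded.append(current)
        prev = current
    return decoded
```

Any error in one record therefore propagates to every later object in the chain. This is the usual trade-off of predictive coding.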

The image encoder 300 may encode the input image based on the object recognition information generated by the object recognizer (A).

Specifically, the image encoder 300 may reconstruct compression units based on the object recognition information (310). This may be done by determining whether to reconstruct a first compression unit of the input image and, if so, reconstructing the first compression unit into a predetermined second compression unit using the object recognition information. Here, the first compression unit is a unit used in existing video compression standards, such as a picture, slice, tile, tile group, CTU, CTU row, or CTU column. The second compression unit, by contrast, is a new unit not used in existing video compression standards, and may be an object-based coding unit. In one embodiment, the second compression unit may be a higher-level unit relative to the picture and may contain slices, tiles, tile groups, CTUs, CTU rows, or CTU columns. Alternatively, it may be a higher-level unit relative to the slice and contain tiles, tile groups, CTUs, CTU rows, or CTU columns. Alternatively, it may be a higher-level unit relative to a tile or tile group and contain CTUs, CTU rows, or CTU columns. Alternatively, it may be a higher-level unit relative to the CTU row/CTU column and contain CTU columns.
The compression unit reconstruction (310) is described in detail later.

Meanwhile, FIG. 3 shows the object recognition information generated by the object recognizer (A) being passed only to the compression unit reconstruction process (310) of the image encoder 300. This is merely an example, and embodiments of the present invention are not limited thereto. For example, the image encoder 300 may store the object recognition information received from the object recognizer (A) in a buffer and use it in each stage of image encoding.

The image encoder 300 may control the bitrate of the input image based on the object recognition information (320). Bitrate control refers to regulating the amount of data in each picture of the video. Bitrate control helps avoid transmission delay and prevents sudden bursts of excessive data.

In one embodiment, the bitrate of the input image may be determined, based on the object recognition information, per predetermined unit, for example per first compression unit or per second compression unit. Here, the first compression unit may be a group of pictures (GoP), a picture, or a coding unit (CU), and the second compression unit may be a new unit reconstructed from the first compression unit.

The bitrate can be determined individually for each region of the input image in which an object exists, and can take different values depending on the importance of each object. For example, the bitrate of a first region containing a recognized object may be higher than that of a second region containing no recognized object. Controlling the bitrate this way allows the regions containing objects to be compressed at higher quality. The bitrate control (320) is described in detail later.
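One simple way to realize importance-dependent region bitrates is to split a picture-level bit budget in proportion to per-region weights. This is a sketch under stated assumptions: the proportional rule, the largest-remainder rounding, and the weights (objects 3, background 1) are illustrative choices, not taken from the source.

```python
def allocate_bits(total_bits, region_weights):
    """Split total_bits across regions in proportion to their weights.
    Uses largest-remainder rounding so the integer allocations sum exactly
    to total_bits. Weights are hypothetical importance scores."""
    weight_sum = sum(region_weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = {name: total_bits * w / weight_sum for name, w in region_weights.items()}
    alloc = {name: int(v) for name, v in raw.items()}  # floor for positive values
    leftover = total_bits - sum(alloc.values())
    # Give the remaining bits to the regions with the largest fractional parts.
    by_fraction = sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc
```

For example, `allocate_bits(100000, {"vehicle_0": 3, "vehicle_1": 3, "background": 1})` gives each vehicle region 42857 bits and the background 14286 bits.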

The image encoder 300 may perform intra/inter prediction on the input image based on the object recognition information (330).

Specifically, the image encoder 300 may perform motion estimation/compensation based on the object recognition information for the regions of the input image in which objects exist. In one embodiment, motion estimation/compensation may be performed only for those regions. The basics of intra/inter prediction are as described above with reference to FIG. 1. The object-recognition-based intra/inter prediction (330) is described in detail later.
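Restricting motion estimation to object regions can be sketched with a standard full-search block-matching search using the sum of absolute differences (SAD), run only for blocks that overlap a detected bounding box. The block size, search range, and `(x, y, width, height)` box format are assumptions. The search itself is a generic textbook technique, not the patent's specific method.

```python
def overlaps(block, box):
    # Axis-aligned rectangle intersection; both are (x, y, width, height).
    bx, by, bw, bh = block
    ox, oy, ow, oh = box
    return bx < ox + ow and ox < bx + bw and by < oy + oh and oy < by + bh


def sad(cur, ref, bx, by, dx, dy, size):
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def object_motion_search(cur, ref, boxes, block=4, search=2):
    """Return {(bx, by): (dx, dy)} motion vectors for blocks overlapping
    any object box; background blocks are skipped in this sketch."""
    h, w = len(cur), len(cur[0])
    mvs = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            if not any(overlaps((bx, by, block, block), b) for b in boxes):
                continue  # no motion search outside object regions
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Keep the candidate block inside the reference picture.
                    if not (0 <= by + dy <= h - block and 0 <= bx + dx <= w - block):
                        continue
                    cost = sad(cur, ref, bx, by, dx, dy, block)
                    if best is None or cost < best[0]:
                        best = (cost, (dx, dy))
            mvs[(bx, by)] = best[1]  # (0, 0) is always valid, so best is set
    return mvs
```

Only the block covered by the object box is searched; the other blocks would be coded without motion search in this sketch.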

The image encoder 300 may perform transform/quantization based on the object recognition information (340).

Specifically, the image encoder 300 may use the object recognition information to distinguish regions of the input image that contain objects from regions that do not. It may then perform transform/quantization using quantization parameters (QPs) that differ depending on whether an object is present. In one embodiment, the QP of a first region containing a recognized object may be set lower than the QP of a second region containing no recognized object. A lower QP preserves more of the high-frequency content in the first region, so high-quality, high-resolution compression can be achieved there. The basics of transform/quantization are as described above with reference to FIG. 1. The object-recognition-based transform/quantization (340) is described in detail later.
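A per-block QP rule of this kind might look like the sketch below. The offset of -4 and the clamp range [0, 63] are assumptions; 63 is a VVC-style upper bound for 8-bit video. `qstep` uses the well-known HEVC/H.264 relation in which the quantization step size doubles every 6 QP, which is why a lower QP keeps finer (higher-frequency) detail.

```python
def overlaps(block, box):
    # Axis-aligned rectangle intersection; both are (x, y, width, height).
    bx, by, bw, bh = block
    ox, oy, ow, oh = box
    return bx < ox + ow and ox < bx + bw and by < oy + oh and oy < by + bh


def block_qp(block, boxes, base_qp, object_delta=-4, qp_min=0, qp_max=63):
    """Lower the QP (finer quantization) for blocks touching any object box.
    object_delta and the clamp range are illustrative assumptions."""
    in_object = any(overlaps(block, b) for b in boxes)
    qp = base_qp + object_delta if in_object else base_qp
    return max(qp_min, min(qp_max, qp))


def qstep(qp):
    # Approximate quantization step size: doubles every 6 QP.
    return 2 ** ((qp - 4) / 6)
```

For example, a 16x16 block touching the box `(8, 8, 32, 32)` gets QP 28 instead of 32. Its step size is `qstep(28) == 16.0`, compared with 32.0 at QP 34.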

The image encoder 300 may perform entropy encoding based on the object recognition information (350).

Specifically, the image encoder 300 may generate a bitstream by entropy encoding the input image information (e.g., transformed and quantized residual information) based on the object recognition information. In one embodiment, the bitstream may further include the object recognition information. The basics of entropy encoding are as described above with reference to FIG. 1. The object-recognition-based entropy encoding (350) is described in detail later.
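The source does not say how object recognition information is binarized when it is carried in the bitstream. One plausible option is shown below: a 1-bit sign flag followed by the unsigned Exp-Golomb code ue(v), which HEVC/VVC use for many high-level syntax elements. Both the choice of ue(v) and the "1 = negative" flag convention are assumptions. Bits are returned as strings for readability.

```python
def ue(value):
    """Unsigned Exp-Golomb code: (len-1) leading zeros, then value+1 in binary."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = value + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")


def encode_signed_diff(diff):
    """Sign flag (1 = negative, assumed) followed by ue(v) of the magnitude."""
    sign = "1" if diff < 0 else "0"
    return sign + ue(abs(diff))
```

For example, the width difference -10 becomes `"1"` followed by `ue(10) == "0001011"`. A real syntax might omit the sign flag when the magnitude is zero.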

Meanwhile, the image encoder 300 may reconstruct the input image by applying inverse quantization/inverse transform (360) to the transformed and quantized input image information and adding the result to the prediction generated by intra/inter prediction. The reconstructed image may be in-loop filtered (370) and then used for inter prediction (330) of the next image to be encoded. The basics of inverse quantization/inverse transform (360) and in-loop filtering (370) are as described above with reference to FIG. 1, so a detailed description is omitted.
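The reconstruction step, prediction plus decoded residual, is conventionally followed by clipping to the valid sample range for the bit depth. The following is a minimal sketch on 2-D lists of samples; the default 8-bit depth is an assumption.

```python
def reconstruct(pred, residual, bit_depth=8):
    """Add the decoded residual to the prediction and clip each sample
    to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

The same operation applies on the decoder side, which keeps the encoder's reference pictures identical to the decoder's.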

FIG. 4 is a diagram illustrating a video decoder according to an embodiment of the present invention.

[FIG. 4] Video decoder

The video decoder 400 may also be referred to as a video decoding apparatus. The video decoder 400 corresponds to the video encoder 300 of FIG. 3 and may have a structure identical or similar to that of the video decoding apparatus 200 described above with reference to FIG. 2.

Referring to FIG. 4, the video decoder 400 may decode an image based on object recognition information.

Specifically, the video decoder 400 may perform entropy decoding (410) on the received bitstream to obtain image information and object recognition information. The bitstream may have a single-bitstream structure containing both the image information (e.g., residual information and various coding information) and the object recognition information. In one embodiment, the object recognition information may be coded as a difference from the object recognition information of the previously coded object.

Based on the received information, the video decoder 400 may perform intra/inter prediction (420) and inverse quantization/inverse transform (430). The video decoder 400 may then reconstruct the image by adding the prediction image and the inverse-quantized/inverse-transformed residual. After in-loop filtering (440), the reconstructed image may be used for intra/inter prediction of the next image to be decoded and may be output on a display device. The basics of entropy decoding (410) through in-loop filtering (440) are as described above with reference to FIG. 1, so a detailed description is omitted.

Meanwhile, entropy decoding (410) through inverse quantization/inverse transform (430) may be performed based on the object recognition information, as described above with reference to FIG. 3. For example, inter prediction (420) may be performed only on regions of the image in which objects exist.

As described above with reference to FIGS. 3 and 4, the video encoder 300 and the video decoder 400 can adaptively encode/decode an image based on object recognition information. This can improve compression efficiency, allowing high-quality, high-resolution video to be encoded/decoded more effectively.

Video encoder/decoder - Structure 1-2

FIG. 5 is a diagram illustrating a video encoder according to another embodiment of the present invention.

FIG. 6 is a diagram illustrating a video decoder according to another embodiment of the present invention.

[FIG. 5] Video encoder

[FIG. 6] Video decoder

FIG. 5 shows the video encoder, and FIG. 6 shows the video decoder.

FIG. 5 shows structure [1]. In this structure, the video encoder and the object recognition algorithm share the information obtained through object recognition, perform compression, and transmit the result in a single bitstream.

[1] In FIG. 5, the input image is received along two paths. Along the first, compression is performed based on the information obtained by applying the object recognition algorithm [E1]. Along the second, the image is encoded after compression unit reconstruction [E3]. Bitrate control [E2] is performed based on the information obtained from the object recognition algorithm [E1]. Conventional intra/inter prediction [E4], transform and quantization [E5], inverse transform and inverse quantization [E6], in-loop filtering, and entropy encoding [E7] are then performed, and the bitstream is signaled.

FIG. 6 shows how the data encoded in FIG. 5 is decoded. FIG. 5 compresses the image together with the information obtained through the object recognition algorithm into a single bitstream, and the video decoding process decodes that bitstream.

FIG. 6 receives the bitstream transmitted from FIG. 5 and performs entropy decoding [D1]. The image obtained through intra/inter prediction [D3] and inverse transform and inverse quantization [D2] then passes through the in-loop filter to produce the reconstructed image.

Video encoder/decoder - Structure 2

FIG. 7 is a diagram illustrating a video encoder according to yet another embodiment of the present invention.

FIG. 8 is a diagram illustrating a video decoder according to yet another embodiment of the present invention.

[FIG. 7] Video encoder

[FIG. 8] Video decoder

FIG. 7 shows the video encoder, and FIG. 8 shows the video decoder.

FIG. 7 shows structure [2]. In this structure, the video encoder and the object recognition algorithm operate independently, so their outputs can be transmitted in separate bitstreams.

[2] In FIG. 7, the input image is received along two paths. The object recognition encoder may transmit a bitstream based on the information obtained by applying the object recognition algorithm [E1]. The video encoder performs bitrate control [E2], conventional intra/inter prediction [E4], transform and quantization [E5], inverse transform and inverse quantization [E6], in-loop filtering, and entropy encoding [E7]. It then signals its own bitstream.

FIG. 8 shows how the data encoded in FIG. 7 is decoded. In FIG. 8, separate bitstreams are received from the object recognition algorithm and from the video encoder. The object recognition encoder may encode the following for each recognized object: its x- and y-coordinates, width, height, object type, recognition accuracy, object identification number, and so on. The video decoder can then use whichever of this information it needs to generate the reconstructed image.
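In structure [2], the object information travels in its own stream. The sketch below serializes the listed fields with Python's `struct` module. The record layout is entirely hypothetical, because the source defines no format. It uses a little-endian uint16 object count, then per object four uint16 geometry fields, a uint8 class, a float32 recognition accuracy, and a uint16 identification number.

```python
import struct

# Hypothetical layout, not defined by the source document.
HEADER = struct.Struct("<H")        # number of objects
RECORD = struct.Struct("<HHHHBfH")  # x, y, width, height, class, accuracy, id


def pack_objects(objects):
    parts = [HEADER.pack(len(objects))]
    for o in objects:
        parts.append(RECORD.pack(o["x"], o["y"], o["width"], o["height"],
                                 o["obj_class"], o["confidence"], o["obj_id"]))
    return b"".join(parts)


def unpack_objects(payload):
    (count,) = HEADER.unpack_from(payload, 0)
    offset = HEADER.size
    objects = []
    for _ in range(count):
        x, y, w, h, cls, conf, oid = RECORD.unpack_from(payload, offset)
        offset += RECORD.size
        objects.append({"x": x, "y": y, "width": w, "height": h,
                        "obj_class": cls, "confidence": conf, "obj_id": oid})
    return objects
```

With the `<` prefix there is no padding, so each record takes 15 bytes. The float32 accuracy round-trips exactly only for values representable in single precision, such as 0.875.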

FIG. 8 receives the bitstreams transmitted from FIG. 7 and performs entropy decoding [D1]. The image obtained through intra/inter prediction [D3] and inverse transform and inverse quantization [D2] then passes through the in-loop filter to produce the reconstructed image. The object recognition decoder of FIG. 8 decodes the object recognition information for use by the video decoder.

Video encoder/decoder - Structure 3

FIG. 9 is a diagram illustrating a video encoder and an object recognition layer according to yet another embodiment of the present invention.

FIG. 10 is a diagram illustrating a video decoder and an object recognition layer according to yet another embodiment of the present invention.

[FIG. 9] Video encoder

[FIG. 10] Video decoder

Referring to FIGS. 9 and 10 together, the object recognition layer allows the object recognition algorithm to be used without affecting how the video encoder and video decoder produce their reconstructed images. The object recognition algorithm in this layer obtains the following information for each recognized object: its x- and y-coordinates, width, height, object type, recognition accuracy, object identification number, and so on. This information can be used by the video encoder. The information produced by the video encoder includes the inter prediction difference, delta-QP, the quantization difference, and so on, and it can be transmitted in the bitstream.

The inter prediction difference is the difference between the inter prediction value used in conventional video compression and the object-recognition-based inter prediction value.

delta-QP can be defined differently for each object according to its characteristics, based on the information obtained through the object recognition algorithm, and applied during quantization.

The quantization difference is the difference between the value obtained through the inverse quantization used in conventional video compression and the value obtained through object-recognition-based inverse quantization.

In FIG. 10, the video decoder obtains a reconstructed image through entropy decoding, intra/inter prediction, inverse transform and inverse quantization, and in-loop filtering. An object-recognition-based reconstructed image can then be obtained by adding the inter prediction difference or inverse quantization difference obtained from the object recognition layer.
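The layered idea can be sketched as follows. The base (conventional) result stays untouched, and the object recognition layer carries a difference that, when added, yields the object-based result. The subtraction direction (object-based minus conventional) is an assumption chosen so that the decoder-side addition described above recovers the object-based values. The sample values are illustrative.

```python
def encode_difference(object_based, conventional):
    """Layer payload: object-based values minus conventional values (assumed direction)."""
    return [[o - c for o, c in zip(orow, crow)]
            for orow, crow in zip(object_based, conventional)]


def apply_difference(conventional, difference):
    """Decoder side: conventional reconstruction plus the layer's difference."""
    return [[c + d for c, d in zip(crow, drow)]
            for crow, drow in zip(conventional, difference)]
```

A decoder that ignores the object recognition layer still produces the conventional reconstruction. This matches the statement that the layer does not affect how the reconstructed images are created.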

In FIG. 10, the decoder receives the bitstream and obtains the inter prediction difference, the inverse quantization difference, delta-QP, and so on through the object recognition information decoding process.

The following sections describe in detail how embodiments of the present disclosure perform object recognition, compression unit reconstruction, bitrate control, intra/inter prediction, transform/quantization, and entropy encoding/decoding.

[E1] Applying the object recognition algorithm

Objects in the input image can be identified by applying an object recognition algorithm to it. The algorithm can identify one class of object or a specific object instance.

The information obtained by applying the object recognition algorithm can be applied to the image during compression. Applying information obtained through object recognition to video compression in this way can be called region-of-interest-based video compression (Region of Interest-based Video Compression). Various compression methods can be used depending on the type of information the object recognition algorithm provides.

The object recognition algorithm provides the following information: the x- and y-coordinates of each recognized object, its width, its height, the object type, the recognition accuracy, and the object identification number, among others.

For example, the x- and y-coordinates, width, and height of a recognized object may be expressed as floating-point values. These values can be converted to integers by multiplying them by the image width or height. The x-coordinate and the width are multiplied by the image width, and the y-coordinate and the height are multiplied by the image height. The integer representation makes encoding/decoding more efficient. The x- and y-coordinates may denote the center of the object or its top-left, top-right, bottom-left, or bottom-right corner.
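The integer conversion can be sketched as follows. It assumes the detector reports coordinates and sizes normalized to [0, 1], as many detectors do, and that rounding to the nearest integer is acceptable. Python's `round` uses round-half-to-even. A helper also converts a center anchor to a top-left anchor, since the anchor convention must match on both sides.

```python
def to_integer_box(nx, ny, nw, nh, img_w, img_h):
    """Scale normalized values to pixels: x and width by the image width,
    y and height by the image height (round-half-to-even via round())."""
    return (round(nx * img_w), round(ny * img_h),
            round(nw * img_w), round(nh * img_h))


def center_to_top_left(cx, cy, w, h):
    # Convert a center-anchored box position to its top-left corner.
    return (cx - w // 2, cy - h // 2)
```

For a 1920x1080 picture, `to_integer_box(0.5, 0.25, 0.1, 0.2, 1920, 1080)` gives `(960, 270, 192, 216)`. The top-left corner of that center-anchored box is `(864, 162)`.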

As another example, the x- and y-coordinates, width, and height of a recognized object may be expressed as floating-point values and encoded/decoded directly in that form.

For example, transmitting the x- and y-coordinates, width, and height of each recognized object separately can be costly. When there are two or more objects, the difference between each object's information and that of the preceding object (the second object relative to the first, and so on) can therefore be computed and encoded/decoded.

As a concrete example, assume that two objects are recognized. The first object has (x, y) coordinates of (42, 118), a width of 50 and a height of 30, an object type class of 2 (vehicle), and an object identification number of 0. The second object has (x, y) coordinates of (70, 220), a width of 40 and a height of 28, an object type class of 2 (vehicle), and an object identification number of 1. The information of the first object is encoded and transmitted as is. When the information of the second object is transmitted, the differences between the two objects are computed, encoded, and transmitted. The x-coordinate difference between the second and first objects is 70 - 42 = 28. Likewise, the y-coordinate difference is 220 - 118 = 102. Applying the same method to width and height gives differences of -10 and -2, respectively. The object type class difference is 0.
When each of these difference values is encoded, a flag indicating its sign can be transmitted, followed by the encoded absolute value of the difference.

When there are three or more objects, the same procedure is used. The difference for the second object is computed relative to the first object, the difference for the third object relative to the second, and so on, and each difference is encoded and transmitted.

For example, if the information of a recognized object has not changed, object_coincidence_flag is set to a first value, and the object type, recognition accuracy, object identification number, and so on need not be transmitted. The x- and y-coordinates, width, and height of the recognized object, however, can still be transmitted.

As another example, if the information of a recognized object has not changed, object_coincidence_flag is set to the first value, and the object type, recognition accuracy, object identification number, and so on need not be transmitted. In addition, if the video itself is a still image, the x- and y-coordinates, width, and height of the recognized object need not be transmitted either.

As another example, if the information of a recognized object has changed, object_coincidence_flag is set to a second value. In that case, the x- and y-coordinates, width, height, object type, recognition accuracy, object identification number, and so on can be transmitted.

Here, the first value of object_coincidence_flag is 0 and the second value is 1.
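The three cases above can be sketched as a function that lists the syntax elements to send for one object. It interprets "unchanged" as the type, recognition accuracy, and identification number matching the previous picture's values for the object. That interpretation is an assumption, since geometry is still sent in the unchanged case. Field names other than `object_coincidence_flag` are hypothetical.

```python
GEOMETRY = ("x", "y", "width", "height")
ATTRIBUTES = ("obj_class", "confidence", "obj_id")


def object_syntax(obj, prev, is_still_image=False):
    """Return the ordered (name, value) syntax elements for one object.
    object_coincidence_flag: 0 = unchanged (first value), 1 = changed (second value)."""
    unchanged = prev is not None and all(obj[k] == prev[k] for k in ATTRIBUTES)
    elements = [("object_coincidence_flag", 0 if unchanged else 1)]
    if unchanged:
        if not is_still_image:
            # Attributes are omitted; geometry is still sent for moving video.
            elements += [(k, obj[k]) for k in GEOMETRY]
    else:
        elements += [(k, obj[k]) for k in GEOMETRY + ATTRIBUTES]
    return elements
```

A moved but otherwise identical vehicle thus costs a flag plus four geometry values. In a still image it costs only the flag.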

The information obtained by applying the object recognition algorithm may be transmitted in one of two ways. In [1], it is carried as high-level syntax elements at the picture/slice/tile/tile group/CTU/CTU row/CTU column level during video compression. In [2], it is transmitted independently of the compressed video stream.

[E2] Compression unit reconstruction

Compression unit reconstruction is the step of adding (including) or substituting a unit that stores information about recognized objects. This unit may be one of the existing picture/slice/tile/tile group/CTU/CTU row/CTU column units, or an object-based coding unit. Specifically, suppose two or more objects exist in a picture and a unit for storing information about those objects is added. The information can then be stored in a picture/slice/tile/tile group/CTU/CTU row/CTU column unit or in an object-based compression unit.

The added object-based compression unit is a new unit, not one used in existing video compression standards.

The picture/slice/tile/tile group/CTU/CTU row/CTU column units, by contrast, are the units used in existing video compression standards.

The object-based compression unit may be a higher-level unit than a picture, in which case it may contain slices/tiles/tile groups/CTUs/CTU rows/CTU columns.

The object-based compression unit may be a higher-level unit than a slice, in which case it may contain tiles/tile groups/CTUs/CTU rows/CTU columns.

The object-based compression unit may be a higher-level unit than a tile or tile group, in which case it may contain CTUs/CTU rows/CTU columns.

The object-based compression unit may be a higher-level unit than a CTU row/CTU column, in which case it may contain CTU columns.

[1] When the object recognition algorithm is used with high-level syntax elements, the picture/slice/tile/tile group/CTU/CTU row/CTU column or the object-based compression unit can be reconstructed based on the information obtained through the algorithm. If no object is recognized in a picture, that picture may either be compressed in the same way as in conventional video compression, or be skipped so that compression proceeds to the next picture.

[2] When the object information is transmitted as a stream independent of the compressed video bitstream, the existing video compression standard may be applied as is.

Whether to reconstruct the picture/slice/tile/tile group/CTU/CTU row/CTU column or the object-based compression unit can be determined by the flag object_based_coding_unit_reconstruction_flag. If the flag has the first value, compression unit reconstruction is not performed; if it has the second value, reconstruction may be performed. The first value may be 0 and the second value may be 1.

Specifically, assuming an object has been recognized and its information is known from [E1], this method adds (includes) or replaces a picture/slice/tile/tile group/CTU/CTU row/CTU column or object-based compression unit sized to fit the object, and performs compression in that unit.

If the width or height of an object is not divisible by 4, a positive value from 1 to 3 may be added to it so that the remainder becomes 0. Because video compression standards store motion vectors and similar data in 4x4 units, this aligns the object's width or height to a multiple of 4.
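The 4-sample alignment can be written as a round-up to the next multiple of 4. The function names below are illustrative only.

```python
def align_up_to_4(size):
    """Add 1..3 to `size` when needed so that it becomes divisible by 4."""
    remainder = size % 4
    return size if remainder == 0 else size + (4 - remainder)


def align_object(width, height):
    """Align a recognized object's width and height to the 4x4 storage grid."""
    return align_up_to_4(width), align_up_to_4(height)
```

For example, a 37x50 object becomes 40x52, while an object that is already 4-aligned is left unchanged.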

If two or more objects are recognized and they overlap, they may be treated as a single object when reconstructing the picture/slice/tile/tile group/CTU/CTU row/CTU column or object-based compression unit.
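Treating overlapping objects as one can be sketched as repeatedly replacing any two overlapping bounding boxes with their union until no overlaps remain. This assumes boxes are (x, y, width, height) tuples and that "overlap" means an intersection of positive area, so boxes that merely touch are kept separate; both are assumptions for illustration.

```python
def boxes_overlap(a, b):
    """True if boxes (x, y, width, height) share a region of positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def union_box(a, b):
    """Smallest box containing both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_overlapping(boxes):
    """Merge overlapping boxes until no two remaining boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_overlap(merged[i], merged[j]):
                    merged[i] = union_box(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

Repeating until nothing changes also handles chains, where a merged box comes to overlap a third object that neither original box touched.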

When an object-based compression unit is added, it may include or replace the picture/slice/tile/tile group/CTU/CTU row/CTU column.

The object-based compression unit may also be a lower level than the picture/slice/tile/tile group/CTU/CTU row/CTU column. Being a lower level means that object-based compression units are contained within a unit of that level, where that unit may be a picture/slice/tile/tile group/CTU/CTU row/CTU column.

Areas where no object is recognized may be left uncompressed. Alternatively, they may be compressed quickly by skipping the RDO (Rate-Distortion Optimization) process.

As one example, object-based compression units may be constructed only where objects are recognized, and compression is performed on those units.

As another example, object-based compression units may be constructed so that areas where objects are recognized and areas where they are not are distinguishable from each other, and compression is performed accordingly.

Object-based compression units may or may not be contiguous with one another.

Figures 11 to 12B are diagrams for explaining a method of reconstructing a compression unit according to an embodiment of the present invention.

[Figure 11] Compression unit reconstruction method

[Figure 12A] Compression unit reconstruction method

[Figure 12B] Compression unit reconstruction method

Figure 11 assumes that three objects are recognized in the input image. For convenience, each square in the drawing represents one CTU. In this case, an object-based compression unit can be used to replace CTUs or slices; specifically, Figures 12A and 12B show object-based compression units that replace slices. If the data obtained in [E1] has been transmitted in the bitstream, the slice positions can be derived during decoding, so they need not be transmitted. If the [E1] data has not been transmitted in the bitstream, however, the starting point of each object-based compression unit may need to be transmitted.

Figure 12A assumes that the picture is divided into CTUs for convenience of encoding/decoding, and marks in red (the largest rectangle in each area) the area containing each object using existing CTU-aligned positions; this area can be used as the object-based compression unit. The object-based compression unit can then be divided into CTUs for encoding/decoding.
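The CTU-aligned region of Figure 12A can be sketched as snapping an object's bounding box outward to the CTU grid and listing the covered CTUs in raster-scan order. The CTU size, picture size, and helper names below are assumptions for illustration; the figure itself does not fix a CTU size.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def ctu_aligned_region(box, ctu_size, pic_w, pic_h):
    """Smallest CTU-aligned rectangle (x0, y0, x1, y1) covering box (x, y, w, h)."""
    x, y, w, h = box
    x0 = (x // ctu_size) * ctu_size
    y0 = (y // ctu_size) * ctu_size
    # clip the far edges to the picture boundary
    x1 = min(ceil_div(x + w, ctu_size) * ctu_size, pic_w)
    y1 = min(ceil_div(y + h, ctu_size) * ctu_size, pic_h)
    return x0, y0, x1, y1


def ctu_addresses(region, ctu_size, pic_w):
    """Raster-scan addresses of the CTUs inside a CTU-aligned region."""
    x0, y0, x1, y1 = region
    ctus_per_row = ceil_div(pic_w, ctu_size)
    return [cy * ctus_per_row + cx
            for cy in range(y0 // ctu_size, ceil_div(y1, ctu_size))
            for cx in range(x0 // ctu_size, ceil_div(x1, ctu_size))]


region = ctu_aligned_region((100, 70, 60, 40), 64, 320, 256)
print(region, ctu_addresses(region, 64, 320))  # (64, 64, 192, 128) [6, 7]
```

Because the decoder can run the same computation on the [E1] boxes, a unit built this way needs no explicitly signaled start position when the [E1] data is in the bitstream.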

In Figure 12B, the object-based compression unit is reconstructed from the top, bottom, left, and right edges of the object. Because the two objects on the left overlap, they can be combined into one object-based compression unit. Instead of using existing CTU positions, encoding and decoding treat the top-left corner of the object-based compression unit as the starting point and, likewise, its bottom-right corner as the end point. Here, the starting point is where compression begins, and the end point is the last point reached during compression.

Bit rate control

Bit rate control is a technique for keeping the amount of data in each picture of a video at a controlled level. It helps the video play back smoothly by preventing transmission delays and sudden bursts of excessive data on the receiving side.

Bit rate control is designed to be adjustable per GOP, per picture, or per CU.

If object_based_coding_unit_reconstruction_flag, which determines compression unit reconstruction, has the second value, bit rate control can also be designed around object-based compression units.

In this case, the object-based compression unit can replace the GOP, picture, or CU.

Bit rate control sets the available bits per GOP, splits each GOP's bits among its pictures, and then divides each picture's bits among its CUs. An object-based compression unit may replace the GOP, picture, or CU level, or may be added as an additional level. For example, suppose there are 480 pictures and each GOP consists of 16 pictures, giving 480/16 = 30 GOPs. If 3,000,000 bits are available to encode the 480 pictures, each GOP can use 3,000,000/30 = 100,000 bits, and each picture can use 100,000/16 = 6,250 bits. The bits need not be allocated evenly, however; the allocation may depend on the importance or type of each picture or on specific conditions.
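The even GOP-to-picture-to-CU split in this example can be checked with a few lines of arithmetic. The sketch assumes a total budget of 3,000,000 bits, which is what the per-GOP figure of 100,000 bits over 30 GOPs implies, and 24 CUs per picture as in the later CU example; the function name is illustrative.

```python
def uniform_budgets(total_bits, num_pictures, gop_size, cus_per_picture):
    """Evenly split a bit budget GOP -> picture -> CU."""
    num_gops = num_pictures // gop_size
    bits_per_gop = total_bits / num_gops
    bits_per_picture = bits_per_gop / gop_size
    bits_per_cu = bits_per_picture / cus_per_picture
    return num_gops, bits_per_gop, bits_per_picture, bits_per_cu


num_gops, per_gop, per_picture, per_cu = uniform_budgets(3_000_000, 480, 16, 24)
print(num_gops, per_gop, per_picture, round(per_cu, 1))  # 30 100000.0 6250.0 260.4
```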

As one example, suppose that applying the [E1] object recognition algorithm detects major objects in a particular GOP. To compress that part of the video at better quality, fewer bits can be allocated to other GOPs and more to that GOP. This can be expressed with a weight a_i, where i is the index of the GOP in coding order. With 30 GOPs and 3,000,000 available bits, the bits for GOP i are (3,000,000/30) × a_i, where a_i is a positive value. a_i may be set according to the number of objects in the GOP or according to the importance of those objects.

As another example, assume that a GOP consists of 16 pictures and each picture can use 6,250 bits. Alternatively, bits can be distributed differently among pictures by a per-picture weight b_i based on the number or importance of objects in each picture. The bits for picture i are then (100,000/16) × b_i, where i is the picture index and b_i is a positive value.

As another example, assume that a GOP consists of 16 pictures and each picture can use 6,250 bits. If one picture consists of 24 CUs, each CU can use approximately 260 bits (6,250/24 ≈ 260.4). Alternatively, bits can be distributed differently among CUs by a per-CU weight c_i based on the number or importance of recognized objects in the picture. The bits for CU i are then (6,250/24) × c_i, where i is the CU index and c_i is a positive value.
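The weighted forms (T/N) × a_i, (T/N) × b_i, and (T/N) × c_i share one shape: an even share per unit scaled by a positive weight. The text only requires the weights to be positive; the sketch below additionally rescales them to average 1 so that the allocated bits sum to the budget, which is an assumption. The rule that derives weights from object counts is likewise hypothetical.

```python
def weighted_budgets(total_bits, weights):
    """Bits for unit i = (total_bits / n) * w_i, with weights rescaled to average 1."""
    n = len(weights)
    mean_w = sum(weights) / n
    return [(total_bits / n) * (w / mean_w) for w in weights]


def weights_from_object_counts(counts, base=1.0):
    """Hypothetical rule: one extra unit of weight per recognized object."""
    return [base + c for c in counts]


# Two units, the second containing two recognized objects.
print(weighted_budgets(100, weights_from_object_counts([0, 2])))  # [25.0, 75.0]
```

The same function serves the GOP (a_i), picture (b_i), and CU (c_i) levels; only the budget passed in and the list of weights change.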

As another example, the object-based compression unit can replace the CU level, so that bits are set per picture and then per object-based compression unit. The object-based compression unit is intended to encode/decode only the region of interest where objects are located. Whereas CU-based coding uses all 24 CUs of each picture in this example, object-based compression units are formed only where objects exist, so their number and size can differ from picture to picture. The bit allocation can therefore vary with the area of each object-based compression unit.

More bits can be allocated to regions containing objects (Region of Interest, ROI) and fewer bits to regions without objects (non-ROI).

The total number of bits can be allocated differently to the ROI and the non-ROI.

The bit amounts for the ROI and the non-ROI can be allocated using weights.
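One way to apply such weights is to split a picture's budget in proportion to weight times area, so that a larger object-based compression unit receives more bits and the ROI receives more bits per sample than the non-ROI. The proportional rule and the parameter names are assumptions for illustration, not the patent's formula.

```python
def split_roi_bits(total_bits, roi_area, non_roi_area, w_roi, w_non_roi):
    """Split a picture's bit budget between ROI and non-ROI by weighted area."""
    roi_share = w_roi * roi_area
    non_roi_share = w_non_roi * non_roi_area
    roi_bits = total_bits * roi_share / (roi_share + non_roi_share)
    return roi_bits, total_bits - roi_bits


# ROI covers a quarter of the picture but is weighted 3x per unit of area.
print(split_roi_bits(6250, 1, 3, 3.0, 1.0))  # (3125.0, 3125.0)
```

With equal weights the split reduces to a plain area proportion; raising w_roi shifts bits toward the object regions while the total stays fixed.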

Intra/inter prediction

Table 1 shows, in syntax form, the object-based inter prediction method together with the existing video compression process.

In Table 1, isObjectflag[x0][y0] indicates whether an object is present in the block currently being encoded/decoded, where x0 and y0 are the left and top positions of the block, respectively.

In Table 1, the value of object_based_prediction_flag is determined during Rate-Distortion Optimization, depending on whether object-based prediction is more efficient than the existing compression process. The first value of object_based_prediction_flag may be 0 and the second value may be 1.

For example, if object_based_prediction_flag has the first value, the prediction method of the existing video compression process can be performed.

For example, if object_based_prediction_flag has the second value, the object-based inter prediction method can be performed.
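The RDO decision for object_based_prediction_flag can be sketched with the common Lagrangian cost J = D + λ·R, choosing the second value (1) when object-based prediction gives the lower cost. The source does not specify the cost function, so this form, the (distortion, rate) inputs, and λ are assumptions.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_object_based_prediction_flag(conventional, object_based, lam):
    """Each candidate is a (distortion, rate_bits) pair; returns 0 or 1."""
    j_conv = rd_cost(*conventional, lam)
    j_obj = rd_cost(*object_based, lam)
    return 1 if j_obj < j_conv else 0


print(choose_object_based_prediction_flag((1000, 200), (900, 220), lam=2))   # 1
print(choose_object_based_prediction_flag((1000, 200), (900, 220), lam=10))  # 0
```

The example shows that the same pair of candidates can flip the flag depending on λ, i.e. on how strongly rate is penalized at the current operating point.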

[Table 1]

If an object exists, motion estimation/compensation can be performed using the information from [E1] or the information obtained through an object recognition algorithm.

Alternatively, motion estimation/compensation can be performed only on the portion where the object exists.

If there are one or more objects, motion estimation/compensation can be performed using the objects' identification numbers, for example as shown in Equation 1.

[Equation 1]

SAD stands for Sum of Absolute Differences; during encoding, the m_x and m_y that minimize SAD can be used.
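A minimal way to picture object-gated SAD motion search is a full-search block matcher that runs only for blocks containing an object and falls back to zero motion (the co-located reference samples F_{k-1}(x+i, y+j)) elsewhere. This is a generic sketch under those assumptions, not a reconstruction of Equation 1; frames are plain lists of rows, and the search range is an arbitrary choice.

```python
def sad(cur, ref, bx, by, bw, bh, mx, my):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + j + my][bx + i + mx])
               for j in range(bh) for i in range(bw))


def estimate_motion(cur, ref, block, contains_object, search_range=2):
    """Return (best_sad, m_x, m_y) for block (x, y, w, h).

    Blocks without a recognized object skip the search and use zero motion.
    """
    bx, by, bw, bh = block
    if not contains_object:
        return sad(cur, ref, bx, by, bw, bh, 0, 0), 0, 0
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            # skip candidates that would read outside the reference picture
            if not (0 <= bx + mx and bx + mx + bw <= width
                    and 0 <= by + my and by + my + bh <= height):
                continue
            cost = sad(cur, ref, bx, by, bw, bh, mx, my)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best
```

Skipping the search for object-free blocks is where the encoder saves work; the m_x, m_y returned for object blocks are the values that minimize SAD.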

As one example, to simplify encoding by performing less motion estimation/compensation on blocks or areas containing no object, z can be set to 0 or to F_{k-1}(x+i, y+j).

As another example, existing video compression standard techniques can be used for blocks or areas containing no object.

Alternatively, if there are one or more objects, motion estimation/compensation can be performed using the objects' identification numbers, for example as shown in Equation 2.

[Equation 2]

In Equation 2, the object-dependent term is used when an object exists, and z is used otherwise. SAD stands for Sum of Absolute Differences; during encoding, the motion vector components at which SAD is minimized can be used.

As one example, to simplify encoding by performing less motion estimation/compensation on blocks or areas containing no object, z can be set to 0 or to the alternative value given in Equation 2.

As another example, existing video compression standard techniques can be used for blocks or areas containing no object. In this case, z may be a motion vector obtained through the existing standard technique.

Regarding Equations 1 and 2:

As one example, m_x and m_y can be derived through the motion estimation/compensation process used in video compression standards.

As another example, m_x and m_y can be derived from the information obtained through the object recognition algorithm, as explained with reference to Figure 13.

Figure 13 is a diagram for explaining inter prediction according to an embodiment of the present invention.

[Figure 13] Inter prediction

In Figure 13, (1) is the picture currently being encoded and (2) is the reference picture; (3) shows the m_x and m_y values between the current picture and the reference picture. Because the object recognition algorithm provides the x- and y-coordinates of each object, m_x and m_y can be obtained from coordinate differences, as shown in (3). Since each recognized object is a rectangle with four corners, motion estimation/compensation can use all four corner values during encoding. In this way, motion estimation/compensation is performed between objects of the same color (that is, with the same identification number).
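The corner-difference idea can be sketched by pairing objects with the same identification number and subtracting corresponding corners. The sign convention (reference position minus current position, so the vector points into the reference picture) and the dict-of-boxes input are assumptions for illustration.

```python
def corners(box):
    """Four corners of box (x, y, w, h): top-left, top-right, bottom-left, bottom-right."""
    x, y, w, h = box
    return [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]


def corner_motion_vectors(cur_box, ref_box):
    """(m_x, m_y) for each corner, as reference position minus current position."""
    return [(rx - cx, ry - cy)
            for (cx, cy), (rx, ry) in zip(corners(cur_box), corners(ref_box))]


def object_motion(cur_objects, ref_objects):
    """Inputs map object_id -> box; only objects with the same ID are paired."""
    return {oid: corner_motion_vectors(box, ref_objects[oid])
            for oid, box in cur_objects.items() if oid in ref_objects}


print(object_motion({7: (40, 30, 20, 10)}, {7: (36, 32, 20, 10), 9: (0, 0, 8, 8)}))
# {7: [(-4, 2), (-4, 2), (-4, 2), (-4, 2)]}
```

For a purely translated object all four vectors agree; differing vectors signal scaling or deformation, which motivates the affine variant described next.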

As another example, in (3) of Figure 13, the difference values m_x and m_y can be obtained from the four corners of the recognized object in the current picture and in the reference picture. Using these four corner pairs, an affine transform can be applied to derive m_x and m_y for the block currently being encoded, and the derived motion vector can then be used for motion estimation/compensation.
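For axis-aligned bounding boxes, the affine map that sends each corner of the current box onto the matching corner of the reference box reduces to a per-axis scale plus a translation. The sketch below evaluates that map at a block's center to obtain its (m_x, m_y); using the block center and this restricted affine form are assumptions, and a full six-parameter affine model (as in standard affine motion compensation) would be needed for rotated or sheared objects.

```python
def box_affine(cur_box, ref_box):
    """Map current-picture coordinates so cur_box's corners land on ref_box's corners."""
    cx, cy, cw, ch = cur_box
    rx, ry, rw, rh = ref_box
    sx, sy = rw / cw, rh / ch  # per-axis scale factors

    def mapping(px, py):
        return rx + (px - cx) * sx, ry + (py - cy) * sy

    return mapping


def block_motion_vector(cur_box, ref_box, block):
    """(m_x, m_y) for block (x, y, w, h), evaluated at the block centre."""
    bx, by, bw, bh = block
    px, py = bx + bw / 2, by + bh / 2
    qx, qy = box_affine(cur_box, ref_box)(px, py)
    return qx - px, qy - py


print(block_motion_vector((0, 0, 10, 10), (5, 5, 20, 20), (4, 4, 2, 2)))  # (10.0, 10.0)
```

When the two boxes have the same size, the scale factors are 1 and every block inherits the same translation as the corner vectors.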

Transform/quantization (inverse transform/inverse quantization)

The quantization parameter can be changed depending on whether an object is present.

For example, assume that [E2] compression unit reconstruction produces object-based compression units as in Figure 6 (1). An object-based compression unit can derive and use the quantization parameter of its picture. Because an object has been recognized in the unit, a positive number N can be subtracted from the picture-level quantization parameter, and the resulting lower quantization parameter applied to the unit currently being encoded, to obtain better image quality. Likewise, since the decoder also has the recognized-object information, it can subtract the same positive number N for inverse quantization. As another example, assume that compression unit reconstruction produces object-based compression units as in Figure 6 (2). The same procedure applies: the unit derives the picture-level quantization parameter, subtracts N to obtain a lower quantization parameter for encoding, and the decoder subtracts the same N for inverse quantization.

As another example, when [E2] compression unit reconstruction is not used and a recognized object exists in the slice/CTU column/CTU row currently being encoded, a positive number N can be subtracted from the quantization parameter derived from the picture, and the resulting lower quantization parameter applied to that slice/CTU column/CTU row for better image quality. Likewise, the decoder, which also has the recognized-object information, can subtract N for inverse quantization.

As another example, when [E2] compression unit reconstruction is not used and the CTU currently being encoded lies within the region of a recognized object, a positive number N can be subtracted from the quantization parameter derived from the slice, and the resulting lower quantization parameter applied to that CTU for better image quality. Likewise, the decoder, which also has the recognized-object information, can subtract N for inverse quantization.
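Because both sides know where the objects are, encoder and decoder can share one QP rule: subtract N for units that contain an object, keep the inherited QP otherwise. Clipping to 0..51 (the 8-bit HEVC range) is an assumption added to keep the result valid; the function name and defaults are illustrative.

```python
def object_aware_qp(base_qp, contains_object, n, qp_min=0, qp_max=51):
    """QP used for quantization (encoder) and inverse quantization (decoder).

    base_qp is the QP inherited from the picture or slice; units containing a
    recognized object get base_qp - n, clipped to the assumed valid range.
    """
    qp = base_qp - n if contains_object else base_qp
    return max(qp_min, min(qp_max, qp))


print(object_aware_qp(32, True, 4), object_aware_qp(32, False, 4))  # 28 32
```

Keeping the rule in a single function mirrors the requirement that the decoder subtract exactly the same N as the encoder.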

Using a low quantization parameter preserves more of the high-frequency components of the image and therefore yields higher image quality, at the cost of an increased number of bits.

Entropy encoding/decoding

This part describes how the data obtained by applying the [E1] object recognition algorithm is transmitted. The object recognition information obtained through [E1] may include the number of recognized objects, the x- and y-coordinates of each recognized object, its width, its height, its type, its recognition accuracy, its identification number, and other additional data. Among the methods presented in [E1], when there are two or more objects, the x- and y-coordinates of the second object can be transmitted as difference values. If difference derivation is not used, the x- and y-coordinates, width, and height of each recognized object can be transmitted directly. Unnecessary data may be omitted.

As one example, with N objects transmitted without difference values, the width, height, type, recognition accuracy, identification number, and other information of each recognized object are all positive values and can therefore be transmitted without sign information.

As another example, with N objects transmitted using difference values, all values of the first object are assumed to be positive and are transmitted as such. For the second and subsequent objects, difference values are used, so the x- and y-coordinates, width, height, type, recognition accuracy, and similar information can be negative; a flag indicating the sign is therefore transmitted, followed by the absolute value of the difference.

Here, N is a positive integer of 2 or more.

After encoding, the values transmitted in the bitstream can be decoded through the process of [D1], in the order in which they were transmitted in [E7]. When two or more objects are transmitted using difference values, all values of the first object are transmitted as positive values, so they can be decoded in the same way they were encoded. For the second and subsequent objects, each decoded absolute value has its sign applied and is then added to the corresponding value of the first object.
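The difference scheme can be sketched as a symbol-level round trip: the first object's fields are written as-is, and each later object writes a (sign flag, absolute difference) pair per field relative to the first object. The flat integer symbol list stands in for the actual binarization and entropy coding of [E7]/[D1], which this sketch does not model; the field order is also an assumption.

```python
def encode_objects(objects):
    """objects: list of equal-length tuples of non-negative ints.

    First object as-is; every later object as (sign_flag, |delta|) per field,
    with delta taken against the first object.
    """
    first = objects[0]
    symbols = list(first)
    for obj in objects[1:]:
        for value, ref in zip(obj, first):
            delta = value - ref
            symbols += [1 if delta < 0 else 0, abs(delta)]
    return symbols


def decode_objects(symbols, num_objects, num_fields):
    """Inverse of encode_objects: apply each sign, then add to the first object."""
    it = iter(symbols)
    first = tuple(next(it) for _ in range(num_fields))
    objects = [first]
    for _ in range(num_objects - 1):
        fields = []
        for ref in first:
            sign_flag, magnitude = next(it), next(it)
            fields.append(ref - magnitude if sign_flag else ref + magnitude)
        objects.append(tuple(fields))
    return objects


# fields: x, y, width, height, class_id, accuracy
objs = [(100, 50, 40, 30, 2, 90), (80, 60, 40, 20, 7, 85)]
symbols = encode_objects(objs)
assert decode_objects(symbols, len(objs), 6) == objs
```

Small differences between similar objects produce small magnitudes, which is where a subsequent entropy coder would gain over sending full coordinates.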

The object_based_coding_unit_reconstruction_flag used in [E2] compression unit reconstruction can be encoded through the process of [E7] and transmitted in the bitstream. On receiving the bitstream, the decoder can decode it through the process of [D1].

To use an object recognition algorithm, a high-level syntax element can specify which algorithm to use by signaling a flag or variable, object_detection_algorithm_selection_index.

If the variable has the first value, no object recognition algorithm is used. If it has the second value, the first predetermined object recognition algorithm is used. If it has the third value, the second predetermined object recognition algorithm may be used.

The first value may be 0, the second value may be 1, and the third value may be 2.

More generally, if N object recognition algorithms are provided, values up to the (N+1)-th value can be defined in the same way: the first value may be 0, the second value 1, the third value 2, …, and the (N+1)-th value N.
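The value mapping for object_detection_algorithm_selection_index can be written as a small lookup. This is only a sketch. The text does not name the predefined algorithms, so the names in the example list are hypothetical placeholders.

```python
from typing import List, Optional


def select_detection_algorithm(index: int, algorithms: List[str]) -> Optional[str]:
    """Interpret object_detection_algorithm_selection_index.

    0 -> no object recognition; k in 1..N -> the k-th of N predefined algorithms.
    """
    if index == 0:
        return None
    if not 1 <= index <= len(algorithms):
        raise ValueError(f"index {index} is outside 0..{len(algorithms)}")
    return algorithms[index - 1]


# Hypothetical placeholder names for two predefined algorithms (N = 2).
predefined = ["detector_a", "detector_b"]
print(select_detection_algorithm(2, predefined))  # detector_b
```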

To use an object recognition algorithm, a high-level syntax element can also determine whether indexing information is used to indicate which object types are to be recognized. The syntax element index_list_of_object_names, which indicates the types of the recognized objects, can be signaled.

Here, the total number of indices that indicate recognized object types may be fixed, for example at 10, 20, or 30.

Figures 14 and 15 show examples of object classification tables that can be applied to embodiments of the present invention.

[Figure 14] Object classification table 1

[Figure 15] Object classification table 2

The tables in Figures 14 and 15 correspond to voc.data and coco.data, respectively, both of which are widely used in object recognition.

As shown in Figures 14 and 15, object types can be organized and provided by index. These indices can be signaled through index_list_of_object_names during encoding and decoding.
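The figures themselves are not reproduced here. The following sketch therefore assumes two things: that the table in Figure 14 follows the standard 20-class PASCAL VOC ordering (as in Darknet's voc.names file), and that index_list_of_object_names carries 0-based class indices. Both assumptions, and the helper name names_from_index_list, are illustrative.

```python
from typing import List

# Assumed table: the standard 20-class PASCAL VOC order (as in Darknet's voc.names).
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def names_from_index_list(index_list: List[int], table: List[str] = VOC_CLASSES) -> List[str]:
    """Translate signaled class indices (assumed 0-based) into class names."""
    names = []
    for i in index_list:
        if not 0 <= i < len(table):
            raise ValueError(f"class index {i} not in table of size {len(table)}")
        names.append(table[i])
    return names


print(names_from_index_list([14, 6]))  # ['person', 'car']
```

A different table, such as an 80-class COCO list, can be passed as the table argument without changing the lookup.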

Figure 16 is a flowchart showing an image encoding method according to an embodiment of the present invention.

[Figure 16] Image encoding method

Referring to Figure 16, the image encoding method includes two steps. First, object recognition information in an input image is obtained based on an object recognition algorithm (S1310). Second, the input image is encoded based on the object recognition information (S1320). When the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information can be encoded as difference information relative to the first object recognition information.

In addition, the bitstream generated by the image encoding method can be stored on a computer-readable recording medium and delivered to an image decoding device.

Figure 17 is a flowchart showing an image decoding method according to an embodiment of the present invention.

[Figure 17] Image decoding method

Referring to Figure 17, the image decoding method performed by an image decoding device includes two steps. First, object recognition information for an input image is obtained (S1410). Second, the input image is restored based on the object recognition information (S1420). When the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information may be derived as difference information relative to the first object recognition information.

In one embodiment, restoring the input image may include three steps. First, it is determined whether to reconstruct a first compression unit of the input image. Second, if reconstruction is chosen, the first compression unit is reconstructed into a predetermined second compression unit based on the object recognition information. Third, the input image is restored based on the second compression unit.

In one embodiment, the second compression unit may be determined as a higher-level unit of a picture, slice, tile, tile group, CTU (coding tree unit) row, or CTU column.

In one embodiment, when no recognized object exists in the input image, restoring the input image and filtering the restored input image may be skipped.

In one embodiment, whether to reconstruct the first compression unit may be determined based on a predetermined flag obtained from the bitstream.

In one embodiment, restoring the input image and filtering the restored input image may be performed selectively, only for regions of the input image in which an object is recognized.

In one embodiment, restoring the input image and filtering the restored input image may be performed differently for a first region of the input image in which an object is recognized and a second region in which no object is recognized.
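The three processing embodiments above can be sketched as a per-CTU decision:

- skip everything when no object is recognized;
- process only object regions;
- process object and non-object regions differently.

CTU granularity and the 'full', 'light', and 'skip' labels are assumptions made for illustration. The patent does not specify what reduced processing of non-object regions consists of.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples


def overlaps(a: Box, b: Box) -> bool:
    """True if two rectangles share at least one sample."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def plan_ctu_processing(
    width: int, height: int, ctu_size: int, boxes: List[Box], mode: str = "differential"
) -> Dict[Tuple[int, int], str]:
    """Map each CTU origin to a hypothetical processing decision.

    'full'  : CTU overlaps a recognized object -> normal reconstruction and filtering
    'light' : differential mode, no object in CTU -> reduced processing
    'skip'  : no objects in the picture, or selective mode and no object in this CTU
    """
    plan: Dict[Tuple[int, int], str] = {}
    for cy in range(0, height, ctu_size):
        for cx in range(0, width, ctu_size):
            # Clip the last CTU in each row/column to the picture boundary.
            ctu = (cx, cy, min(ctu_size, width - cx), min(ctu_size, height - cy))
            if not boxes:
                plan[(cx, cy)] = "skip"
            elif any(overlaps(ctu, b) for b in boxes):
                plan[(cx, cy)] = "full"
            else:
                plan[(cx, cy)] = "skip" if mode == "selective" else "light"
    return plan


plan = plan_ctu_processing(256, 128, 64, [(70, 10, 20, 20)])
print(plan[(64, 0)], plan[(0, 0)])  # full light
```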

In one embodiment, the second compression unit may be determined separately for each object recognized in the input image.

In one embodiment, when some recognized objects in the input image overlap one another (first objects), the second compression unit may be determined as a predetermined region that completely covers those first objects.
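One way to build a region that completely covers overlapping objects takes two steps. First, merge the overlapping bounding boxes into their common bounding rectangle. Second, expand the result outward to CTU boundaries. The sketch below assumes axis-aligned (x, y, width, height) boxes and CTU alignment. Both the alignment step and the function names are hypothetical choices, not taken from the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def overlaps(a: Box, b: Box) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1, y1 = max(a[0] + a[2], b[0] + b[2]), max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """Repeatedly merge overlapping boxes until no two remaining boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result: List[Box] = []
        while merged:
            cur = merged.pop()
            i = 0
            while i < len(merged):
                if overlaps(cur, merged[i]):
                    cur = union(cur, merged.pop(i))
                    changed = True
                else:
                    i += 1
            result.append(cur)
        merged = result
    return merged


def align_to_ctu_grid(box: Box, ctu_size: int, width: int, height: int) -> Box:
    """Expand a box outward to CTU boundaries, clipped to the picture."""
    x0 = (box[0] // ctu_size) * ctu_size
    y0 = (box[1] // ctu_size) * ctu_size
    x1 = min(width, -(-(box[0] + box[2]) // ctu_size) * ctu_size)  # ceiling division
    y1 = min(height, -(-(box[1] + box[3]) // ctu_size) * ctu_size)
    return (x0, y0, x1 - x0, y1 - y0)


objects = [(10, 10, 50, 50), (40, 40, 50, 50), (200, 10, 20, 20)]
regions = [align_to_ctu_grid(b, 64, 256, 128) for b in merge_overlapping(objects)]
print(sorted(regions))  # [(0, 0, 128, 128), (192, 0, 64, 64)]
```

Here the first two boxes overlap and merge into (10, 10, 80, 80). With 64-sample CTUs on a 256x128 picture, that box aligns to (0, 0, 128, 128). The outer loop repeats until a full pass makes no merge, so chains of pairwise overlaps also collapse into one region.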

In one embodiment, the bitrate of the input image may be determined in predetermined units based on the object recognition information.

In one embodiment, the predetermined unit is either a first compression unit or a second compression unit. The first compression unit includes a group of pictures (GoP) unit, a picture unit, and a coding unit (CU) unit. The second compression unit may be a unit reconstructed from the first compression unit based on the object recognition information.

In one embodiment, within the input image, the bitrate of a first region containing a recognized object may be higher than the bitrate of a second region not containing a recognized object.

In one embodiment, the bitrate may be determined separately for each region of the input image that contains a recognized object.

In one embodiment, the bitrate may differ between regions depending on the importance of the object contained in each region.
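A simple way to make the bitrate depend on object importance is to split a bit budget in proportion to per-region weights, with the background weighted below the object regions. The weights, the region names, and the proportional rule are assumptions for illustration. The patent does not specify how importance maps to bitrate.

```python
from typing import Dict


def allocate_bits(
    total_bits: int, importance: Dict[str, float], background_weight: float = 1.0
) -> Dict[str, float]:
    """Split total_bits across object regions and the background by weight."""
    weights = dict(importance)
    weights["background"] = background_weight  # hypothetical key for the non-object region
    if any(w <= 0 for w in weights.values()):
        raise ValueError("all weights must be positive")
    total_weight = sum(weights.values())
    return {region: total_bits * w / total_weight for region, w in weights.items()}


print(allocate_bits(1_000_000, {"person": 3.0, "car": 1.5}, background_weight=0.5))
# {'person': 600000.0, 'car': 300000.0, 'background': 100000.0}
```

The same function could be applied at whichever unit is chosen (GoP, picture, CU, or a reconstructed unit) by passing that unit's bit budget as total_bits.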

In one embodiment, the restored input image may be generated by performing inter prediction based on the object recognition information.

In one embodiment, the inter prediction may be performed only on regions of the input image that contain a recognized object.

In one embodiment, the restored input image may be generated by performing inverse quantization based on the object recognition information.

In one embodiment, within the input image, the quantization parameter of a first region containing a recognized object may be smaller than the quantization parameter of a second region not containing a recognized object.
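The lower quantization parameter for object regions can be expressed as a per-CTU QP map. The encoder (for quantization) and the decoder (for inverse quantization) could both derive this map from the same object recognition information. The base QP of 37, the offset of -6, and the clipping range of 0 to 63 are assumed example values, not values given in the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def overlaps(a: Box, b: Box) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def build_qp_map(
    width: int,
    height: int,
    ctu_size: int,
    boxes: List[Box],
    base_qp: int = 37,
    object_qp_offset: int = -6,
) -> List[List[int]]:
    """Per-CTU QP map: CTUs touching a recognized object get a lower QP (finer quantization)."""
    qp_map = []
    for cy in range(0, height, ctu_size):
        row = []
        for cx in range(0, width, ctu_size):
            ctu = (cx, cy, min(ctu_size, width - cx), min(ctu_size, height - cy))
            has_object = any(overlaps(ctu, b) for b in boxes)
            qp = base_qp + object_qp_offset if has_object else base_qp
            row.append(max(0, min(63, qp)))  # clip to an assumed 0..63 range
        qp_map.append(row)
    return qp_map


print(build_qp_map(256, 128, 64, [(70, 10, 20, 20)]))
# [[37, 31, 37, 37], [37, 37, 37, 37]]
```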

In one embodiment, the object recognition information may be obtained from a first bitstream that contains the input image information.

In one embodiment, the object recognition information may be obtained from a second bitstream, separate from the first bitstream that contains the input image information.

Figure 18 is a block diagram schematically showing an electronic device that includes an image encoding/decoding device according to an embodiment of the present disclosure.

[Figure 18] Electronic device

Referring to Figure 18, the electronic device 1800 is a broad concept that covers smartphones, tablet PCs, smart wearable devices, and the like. It may include a display 1810, a memory 1820, and a processor 1830.

The display 1810 may be an OLED (organic light-emitting diode), LCD (liquid crystal display), or PDP (plasma display panel) display, and may show various images on its screen. The display 1810 may also provide user interface functions; for example, it may provide a means for the user to enter various commands.

The memory 1820 may be a storage medium that stores data needed for the operation of the electronic device 1800, multimedia data, and the like. The memory 1820 may include a semiconductor-based storage device. This may be a dynamic random access memory device such as DRAM, SDRAM (synchronous DRAM), DDR SDRAM (double data rate SDRAM), LPDDR SDRAM (low-power double data rate SDRAM), GDDR SDRAM (graphics double data rate SDRAM), DDR2 SDRAM, DDR3 SDRAM, or DDR4 SDRAM. Alternatively, it may be a resistive memory device such as PRAM (phase-change random access memory), MRAM (magnetic random access memory), or RRAM (resistive random access memory). The memory 1820 may also include, as a storage device, at least one of a solid-state drive (SSD), a hard disk drive (HDD), and an optical disc drive (ODD).

The processor 1830 may use a neural network to perform the image encoding/decoding method described above with reference to Figures 1 to 17, in four steps:

- obtain a first image feature from the input image;
- obtain a block information feature from the block information of the input image;
- obtain a second image feature by removing noise and distortion from the first image feature, guided by the block information feature;
- restore the input image from the second image feature.

Each step is performed by the neural network. The block information may include at least one of a block boundary map, which indicates the block partitioning structure of the input image, and a block distribution map, which indicates coding information of the input image.
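As a rough illustration of the two block-information inputs, the sketch below rasterizes a block partition into a block boundary map and a block distribution map. The text does not define the exact format of either map, so two hypothetical conventions are used here. The boundary map marks the first row and first column of every block with 1. The distribution map fills each block with a single coding value, such as its QP. The function name block_maps and the block tuple layout are illustrative only.

```python
from typing import List, Tuple

# Hypothetical block description: (x, y, width, height, coding_value).
Block = Tuple[int, int, int, int, int]
Map2D = List[List[int]]


def block_maps(width: int, height: int, blocks: List[Block]) -> Tuple[Map2D, Map2D]:
    """Return (boundary_map, distribution_map) for a picture partitioned into blocks."""
    boundary = [[0] * width for _ in range(height)]
    distribution = [[0] * width for _ in range(height)]
    for x, y, w, h, value in blocks:
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                # Every sample of the block carries the block's coding value.
                distribution[yy][xx] = value
                # The block's top row and left column are marked as block edges.
                if yy == y or xx == x:
                    boundary[yy][xx] = 1
    return boundary, distribution


# A 4x4 picture: one 2x4 block on the left, two 2x2 blocks on the right.
boundary, distribution = block_maps(4, 4, [(0, 0, 2, 4, 30), (2, 0, 2, 2, 32), (2, 2, 2, 2, 34)])
for row in boundary:
    print(row)
```

In a neural-network setting, maps like these would typically be stacked with the decoded image as extra input channels, but that arrangement is an assumption, not something the text states.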

The processor 1830 may be a central processing unit (CPU), a microprocessor unit (MCU), a system on chip (SoC), or the like. It may exchange various data with the display 1810 and/or the memory 1820 over a bus 1840.

In the embodiments described above, the methods are described on the basis of flowcharts as a series of steps or units. However, the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. In addition, those of ordinary skill in the art will understand three points about the flowcharts: the steps shown are not exclusive, other steps may be included, and one or more steps may be deleted without affecting the scope of the present invention.

The embodiments described above include examples of various aspects. Although not every possible combination of these aspects can be described, those of ordinary skill in the art will recognize that other combinations are possible. Accordingly, the present invention is intended to include all other replacements, modifications, and changes that fall within the scope of the following claims.

The embodiments of the present invention described above may be implemented as program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or they may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical recording media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with specific details, such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which it pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. Not only the claims set out below, but also everything modified equally or equivalently to those claims, falls within the scope of the spirit of the present invention.

Claims

An image decoding method performed by an image decoding device, the method comprising:
obtaining object recognition information for an input image; and
restoring the input image based on the object recognition information,
wherein, when the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information is derived as difference information from the first object recognition information.

An image encoding method performed by an image encoding device, the method comprising:
obtaining object recognition information in an input image based on an object recognition algorithm; and
encoding the input image based on the object recognition information,
wherein, when the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information is encoded as difference information from the first object recognition information.

A computer-readable recording medium storing a bitstream generated by an image encoding method, the image encoding method comprising:
obtaining object recognition information in an input image based on an object recognition algorithm; and
generating a first bitstream by encoding the input image based on the object recognition information,
wherein, when the object recognition information includes first object recognition information for a first object in the input image and second object recognition information for a second object in the input image, the second object recognition information is encoded as difference information from the first object recognition information, and
wherein the object recognition information is selectively included in either the first bitstream or a second bitstream that does not include input image information.