JP6946826B2 - Video processing equipment and methods

Info

Publication number: JP6946826B2
Application number: JP2017147449A
Authority: JP
Inventors: 正宏近藤
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2021-10-06
Anticipated expiration: 2037-07-31
Also published as: JP2019029817A

Description

The present invention relates to a video processing apparatus and method that decode coded video data, detect a specific video region in the video represented by the decoded video data, and, based on the detection result, encode the specific video region and the remaining regions at different image quality levels.

To transmit video data output from a remotely located camera efficiently over a network, the video data is generally encoded before transmission and the coded video data is transmitted. For example, the video data is analyzed to determine whether it meets specific conditions, regions in the video that satisfy those conditions are detected, and only those regions are encoded at high image quality while the other regions are encoded at low image quality.

Patent Document 1 discloses a conventional video processing apparatus that performs such coding. This conventional apparatus uses a human face as the specific condition and consists of a specific region detection unit, a high image quality region setting unit, a coding unit, and an output unit. The specific region detection unit detects, from the input video data, one or more specific regions that meet a predetermined condition and calculates a detection evaluation value for each specific region. The high image quality region setting unit decides whether each of the detected specific regions is to be set as a high image quality region. Specifically, it holds a threshold value α, compares the detection evaluation value of each specific region with α, and sets the specific regions whose detection evaluation value exceeds α as high image quality regions. The coding unit encodes the video data. Typically, it generates coded data by applying a discrete cosine transform (DCT) to the video data, quantizing the result, and entropy coding it. During quantization, a first quantization parameter is used in the high image quality regions and a second quantization parameter is used in the other regions, so that the high image quality regions are encoded at a lower compression ratio than the other regions. The output unit outputs the video data encoded by the coding unit.
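The prior-art selection described above can be sketched as follows. This is a minimal illustration, not the implementation of Patent Document 1: the `Region` type, the function names, and the two quantization parameter values (24 and 40) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    x: int
    y: int
    w: int
    h: int
    score: float  # detection evaluation value (hypothetical scale)


def select_high_quality_regions(regions, alpha):
    """Keep only regions whose detection evaluation value exceeds alpha."""
    return [r for r in regions if r.score > alpha]


def qp_for_pixel(px, py, hq_regions, qp_first=24, qp_second=40):
    """Return the first QP inside a high image quality region, else the second.

    The values 24 and 40 are illustrative assumptions; the source only
    implies that the high image quality region is compressed less.
    """
    for r in hq_regions:
        if r.x <= px < r.x + r.w and r.y <= py < r.y + r.h:
            return qp_first
    return qp_second


regions = [Region(0, 0, 10, 10, 0.9), Region(20, 20, 5, 5, 0.3)]
hq = select_high_quality_regions(regions, alpha=0.5)
print(len(hq), qp_for_pixel(5, 5, hq), qp_for_pixel(22, 22, hq))
```

The second region is detected but scores below α, so it is encoded with the second (coarser) quantization parameter like the background.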

Patent Document 1: Japanese Unexamined Patent Publication No. 2011-87090

When the video source of the conventional video processing apparatus described above is connected to a network, as a network camera is, and coded video data arrives from the video source over the network, the apparatus must provide a decoding function in addition to its reception function.

However, performing decoding, specific region detection, and coding for every frame of the received coded video data raises a processing capacity problem. In particular, when these operations are implemented in software executed by a CPU (central processing unit), the processing load on the CPU becomes large. Depending on the CPU's performance, the video data may not be processed frame by frame at the frame rate of the coded video data (for example, 30 frames per second), and frames may be lost, that is, frame skipping may occur. When frame skipping occurs, not all of the received coded video data can be processed and the video data of the unprocessed frames cannot be output, so the reproduced video shows dropped frames.
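The processing budget behind this problem is simple arithmetic. The sketch below assumes that decoding, detection, and coding of one frame run back to back and ignores transmission time; the function names and the millisecond figures are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def fits_in_budget(decode_ms, detect_ms, encode_ms, fps=30):
    """True if the three stages finish within one frame time (simplified model)."""
    return decode_ms + detect_ms + encode_ms <= frame_budget_ms(fps)


print(round(frame_budget_ms(30), 1))   # about 33.3 ms per frame at 30 frames/second
print(fits_in_budget(5, 10, 15))       # 30 ms total: within budget
print(fits_in_budget(5, 10, 25))       # 40 ms total: over budget, frame would be skipped
```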

An object of the present invention is therefore to provide a video processing apparatus and method that can output coded video data while preventing frame skipping when decoding coded video data, detecting a specific video region in the video represented by the decoded video data, and encoding the specific video region and the remaining regions at different image quality levels based on the detection result.

The video processing apparatus of the present invention comprises: a first video data storage unit that stores input first coded video data; a decoding unit that decodes the first coded video data frame by frame to generate video data; a specific video region detection unit that detects a specific video region in the video represented by one frame of the decoded video data; a coding unit that generates second coded video data by encoding the portion of the one frame of video data within the specific video region using a first quantization parameter and encoding the portion outside the specific video region using a second quantization parameter larger than the first quantization parameter; a second video data storage unit that stores the second coded video data; a coding determination unit that determines, at a transmission start timing occurring every frame time, whether the coding unit has completed coding of the one frame of video data; and a transmission unit that reads and transmits the second coded video data stored in the second video data storage unit when the coding determination unit determines that coding of the one frame of video data is complete, and reads and transmits the first coded video data stored in the first video data storage unit when the coding determination unit determines that coding is not complete.

The video processing method of the present invention is a method for a video processing apparatus that performs video processing on input first coded video data, and includes: a step of storing the first coded video data in a first video data storage unit; a decoding step of decoding the first coded video data frame by frame to generate video data; a specific video region detection step of detecting a specific video region in the video represented by one frame of the decoded video data; a coding step of generating second coded video data by encoding the portion of the one frame of video data within the specific video region using a first quantization parameter and encoding the portion outside the specific video region using a second quantization parameter larger than the first quantization parameter; a step of storing the second coded video data in a second video data storage unit; a coding determination step of determining, at a transmission start timing occurring every frame time, whether the coding step has completed coding of the one frame of video data; and a transmission step of reading and transmitting the second coded video data stored in the second video data storage unit when the coding determination step determines that coding of the one frame of video data is complete, and reading and transmitting the first coded video data stored in the first video data storage unit when the coding determination step determines that coding is not complete.

According to the video processing apparatus and method of the present invention, when encoding of the decoded video data into the second coded video data cannot be finished by the transmission start timing, the input first coded video data is output as it is in place of the second coded video data, so frame skipping can be prevented.

FIG. 1 is a diagram showing the connection environment of the video processing apparatus according to the present invention.
FIG. 2 is a block diagram showing the configuration of the video processing apparatus according to the present invention.
FIG. 3 is a flowchart showing the operation of the apparatus of FIG. 2.
FIG. 4 is a timing chart showing the operation timing of each part of the apparatus of FIG. 2.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 shows the connection environment of the video processing apparatus 10 according to the present invention. The video processing apparatus 10 is connected to a network 1 such as the Internet or a LAN (local area network). Network 1 is drawn as two independent networks in FIG. 1, but it may be a single network. A network camera 2 and a video server 3 are assumed to be connected to the network 1. The network camera 2 has a built-in camera body (not shown) and sends packets containing coded video data representing the captured video to the network 1. In FIG. 1, the network camera 2 and the video server 3 are the communication partners of the video processing apparatus 10, and a human face is used as the specific video in this example. The network camera 2 is assumed to encode the video data representing the captured video at a high image quality level over the entire screen area to produce coded video data, and to transmit it in packets every frame time. The network camera 2 generates, for example, video data of 1920 horizontal pixels × 1080 vertical pixels, and the coded video data conforms to the ITU-T H.264 standard (hereinafter simply H.264).

As shown in FIG. 2, the video processing apparatus 10 includes a reception interface (IF) unit 11, a decoding unit 12, a specific video region detection unit 13, a coding unit 14, a coding memory 15, a memory determination unit 16, a bypass memory 17, and a transmission IF unit 18.

The reception IF unit 11 is connected to the network 1, receives the packet A0 sent from the network camera 2, which is its communication partner, and extracts and outputs the coded video data A1 (first coded video data) from the received packet A0. The packet A0 is transmitted and received according to a protocol supported by the network 1, such as TCP/IP. The same applies to the packet A8 described later. The decoding unit 12 and the bypass memory 17 are connected to the output of the reception IF unit 11.

The decoding unit 12 receives the coded video data A1 output from the reception IF unit 11, decodes it, and outputs video data A2. The specific video region detection unit 13 is connected to the output of the decoding unit 12.

The specific video region detection unit 13 receives the video data A2 output from the decoding unit 12, detects a specific video region in the video represented by the video data A2, and outputs a specific video region signal B3 indicating the specific video region together with video data A3. The specific video region is the region of a human face. The face detection method is not particularly limited. For example, a known method such as the one disclosed in Patent Document 1 may be used, which detects that parts containing the features of eyes, nose, and mouth exist within a predetermined range and in a predetermined positional relationship. The coding unit 14 and the memory determination unit 16 are connected to the output of the specific video region detection unit 13. A high image quality region setting unit such as the one shown in Patent Document 1 may be placed between the specific video region detection unit 13 and the coding unit 14.

The coding unit 14 receives the video data A3 and the specific video region signal B3 output from the specific video region detection unit 13. In accordance with the specific video region signal B3, it encodes only the portion of the video data A3 corresponding to the specific video region using a first quantization parameter for high image quality, encodes the portion corresponding to the remaining regions using a second quantization parameter for low image quality, and thereby generates coded video data A4 (second coded video data). The second quantization parameter is larger than the first quantization parameter. The coding unit 14 also outputs a coding completion signal B4 each time video coding of one frame is completed. The coding memory 15 and the memory determination unit 16 are connected to the output of the coding unit 14.
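One way to picture the two quantization parameters is as a per-macroblock map handed to the encoder. The sketch below is an illustration under assumptions: the 16×16 macroblock size follows H.264, but the function name `build_qp_map`, the rectangle format `(x, y, w, h)`, and the values 24 and 40 are hypothetical and not taken from the patent.

```python
import math


def build_qp_map(width, height, faces, qp_first=24, qp_second=40, mb=16):
    """Build a rows x cols grid of quantization parameters, one per macroblock.

    Macroblocks overlapping any face rectangle (x, y, w, h) get qp_first;
    all others get the larger qp_second (coarser quantization).
    """
    cols = math.ceil(width / mb)
    rows = math.ceil(height / mb)
    qp_map = [[qp_second] * cols for _ in range(rows)]
    for (x, y, w, h) in faces:
        if w <= 0 or h <= 0:
            continue  # ignore degenerate rectangles
        c0, c1 = max(x // mb, 0), min((x + w - 1) // mb, cols - 1)
        r0, r1 = max(y // mb, 0), min((y + h - 1) // mb, rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                qp_map[r][c] = qp_first
    return qp_map


qp_map = build_qp_map(1920, 1080, faces=[(100, 50, 64, 64)])
print(len(qp_map), len(qp_map[0]))             # 68 rows, 120 columns
print(sum(row.count(24) for row in qp_map))    # 25 macroblocks cover the face
```

With no faces in the list, every macroblock keeps the second quantization parameter, which matches the case where the whole frame is encoded at low image quality.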

The coding memory 15 and the bypass memory 17 correspond to the second video data storage unit and the first video data storage unit, respectively. Each has an internal memory area and a function for controlling the writing and reading of data to and from that area. The coding memory 15 receives the coded video data A4 output from the coding unit 14 and stores it in its internal memory area. On readout, the coding memory 15 reads the stored coded video data A4 and outputs it as A5. The transmission IF unit 18 is connected to the read output of the coding memory 15.

The bypass memory 17 receives the coded video data A1 output from the reception IF unit 11 and stores it in its internal memory area. On readout, the bypass memory 17 reads the stored coded video data A1 and outputs it as A7. The transmission IF unit 18 is connected to the read output of the bypass memory 17.

The transmission IF unit 18 corresponds to the transmission unit. For each frame, in accordance with a memory selection signal B6 supplied from the memory determination unit 16, it receives the coded video data A5 or A7 read from either the coding memory 15 or the bypass memory 17, packetizes it, and transmits it to the network 1. The transmission IF unit 18 also generates a transmission start signal B8 at the start of packet transmission every frame time and supplies it to the memory determination unit 16.

The memory determination unit 16 receives the specific video region signal B3 supplied from the specific video region detection unit 13, the coding completion signal B4 supplied from the coding unit 14, and the transmission start signal B8 supplied from the transmission IF unit 18. At the timing at which the transmission start signal B8 is supplied, the memory determination unit 16 generates the memory selection signal B6 according to whether the specific video region signal B3 and the coding completion signal B4 were present during the preceding one frame time. The memory selection signal B6 is supplied to the transmission IF unit 18.
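The behavior of the memory determination unit 16 can be sketched as a small state holder that latches the coding completion signal B4 and answers with a selection when the transmission start signal B8 arrives. The class and method names are hypothetical, and only the B4 condition of the basic embodiment is modeled; clearing the latch after each decision is an assumption about how "the preceding one frame time" might be tracked.

```python
class MemoryDeterminationUnit:
    """Illustrative model of the memory determination unit 16 (B4 condition only)."""

    CODING_MEMORY = "coding_memory_15"
    BYPASS_MEMORY = "bypass_memory_17"

    def __init__(self):
        self._coding_done = False

    def on_coding_complete(self):
        """Called when the coding completion signal B4 is supplied."""
        self._coding_done = True

    def on_transmission_start(self):
        """Called when the transmission start signal B8 is supplied.

        Returns the memory selection (signal B6) and clears the latch
        so the next frame period starts fresh.
        """
        selection = self.CODING_MEMORY if self._coding_done else self.BYPASS_MEMORY
        self._coding_done = False
        return selection


unit = MemoryDeterminationUnit()
unit.on_coding_complete()
print(unit.on_transmission_start())  # coding finished in time: coding memory
print(unit.on_transmission_start())  # no B4 in this period: bypass memory
```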

The decoding unit 12, the specific video region detection unit 13, the coding unit 14, and the memory determination unit 16 can be implemented by one or more CPUs executing software. The CPU may also control the operation timing of the entire apparatus.

Next, the operation of the video processing apparatus 10 configured as described above will be explained with reference to FIGS. 3 and 4. FIG. 3 is a flowchart showing the operation of the apparatus of FIG. 2, and FIG. 4 is a timing chart showing the operation timing of each part of the apparatus of FIG. 2.

First, the reception IF unit 11 receives the packet A0 transmitted from the network camera 2 every frame time and extracts the coded video data A1 from the received packet A0 (step S101). As described above, the coded video data A1 is video data encoded at a high image quality level over the entire screen area in accordance with the H.264 standard. The coded video data A1 is supplied to the decoding unit 12, where it is decoded to generate one frame (1920 horizontal pixels × 1080 vertical pixels) of video data A2 (step S102). The decoding unit 12 decodes faster than the frame rate at which the network camera 2 encodes (normally 30 frames per second). If its processing speed were slower than this, frame skipping would occur. For example, if only 15 frames can be decoded per second, the number of frames per second is half that of the normal video, and the result looks like a frame-by-frame advance.

The coded video data A1 is also supplied to the bypass memory 17 and written to its memory area (step S103).

The decoded video data A2 is supplied to the specific video region detection unit 13, which detects the face portion, that is, the specific video region, in the video frame represented by the video data A2 (step S104). After the detection operation for one frame, the specific video region detection unit 13 determines whether a specific video region was present in that frame (step S105). If one was present, it outputs the specific video region signal B3 indicating that region to the coding unit 14 and the memory determination unit 16 (step S106). If a frame contains a plurality of specific video regions, all of them are detected. For a frame in which no specific video region is detected, a specific video region signal indicating non-detection may be output to the coding unit 14 and the memory determination unit 16.

The coding unit 14 accepts the video data and the specific video region signal B3 output from the specific video region detection unit 13. In accordance with the specific video region signal B3, it encodes only the portion of the video data corresponding to the specific video region at high image quality and encodes the portion corresponding to the remaining regions at low image quality (step S107). If no specific video region is detected, all of the video data is encoded at low image quality. When coding is complete, the coding unit 14 outputs the coded video data A4 to the coding memory 15 and outputs the coding completion signal B4 to the memory determination unit 16. The coded video data A4 is written to the coding memory 15 (step S108).

The transmission IF unit 18 determines whether one frame time has elapsed (step S109) and, once it has, generates the transmission start signal B8 (step S110). The transmission start signal B8 is supplied to the memory determination unit 16.

In response to the transmission start signal B8, the memory determination unit 16 determines whether the coding completion signal B4 has been supplied (step S111). If it has, the memory determination unit 16 generates a memory selection signal B6 indicating selection of the coding memory 15 (step S112). If it has not, the memory determination unit 16 generates a memory selection signal B6 indicating selection of the bypass memory 17 (step S113).

When the memory selection signal B6 is supplied from the memory determination unit 16, the transmission IF unit 18 sets the memory to be read accordingly. If the memory selection signal B6 indicates selection of the coding memory 15, the transmission IF unit 18 instructs the coding memory 15 to read the coded video data A4, and the coded video data A4 is read from the memory area of the coding memory 15 as A5 (step S114). If the memory selection signal B6 indicates selection of the bypass memory 17, the transmission IF unit 18 instructs the bypass memory 17 to read the coded video data A1, and the coded video data A1 is read from the memory area of the bypass memory 17 as A7 (step S115).

The transmission IF unit 18 packetizes the read coded video data A5 or A7 and sends the resulting packet A8 to the network 1 (step S116). The packet A8 carrying the coded video data reaches, for example, the video server 3 via the network 1.
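Steps S114 to S116 amount to reading from the selected memory and splitting the data into packets. The sketch below is a simplified illustration: the memories are plain byte strings, the selection values `"coding"` and `"bypass"` and the 1400-byte payload size are assumptions, and real packetization would add protocol headers that are omitted here.

```python
def read_selected(selection, coding_memory, bypass_memory):
    """Return A5 (from the coding memory) or A7 (from the bypass memory)."""
    if selection == "coding":
        return coding_memory
    if selection == "bypass":
        return bypass_memory
    raise ValueError(f"unknown selection: {selection!r}")


def packetize(data, payload_size=1400):
    """Split coded video data into payload-sized chunks (headers omitted)."""
    return [data[i:i + payload_size] for i in range(0, len(data), payload_size)]


a4 = b"\x01" * 3000   # stand-in for re-encoded data held in the coding memory
a1 = b"\x02" * 5000   # stand-in for original data held in the bypass memory
packets = packetize(read_selected("bypass", a4, a1))
print(len(packets), len(packets[-1]))
```

Because both memories hold valid H.264 data for the same frame, the receiver gets a decodable frame whichever memory is selected.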

The timing chart of FIG. 4 illustrates both a case in which coding by the coding unit 14 is completed within one frame time from the start of packet reception at the reception IF unit 11 and a case in which it is not. The reception IF unit 11 is assumed to extract the coded video data A1(D1), A1(D2), A1(D3), A1(D4), A1(D5), and so on from the received packets A0 in sequence, one per frame time.

Taking the received coded video data A1(D1) in FIG. 4 as an example, A1(D1) is written to the bypass memory 17 as it is. The bypass memory 17 holds A1(D1) until the next coded video data A1(D2) is supplied. A1(D1) is also decoded by the decoding unit 12 over a decoding time T1 to become video data A2(D1). The video data A2(D1) undergoes specific video region detection in the specific video region detection unit 13 over a detection time T2. The video data A3(D1) output from the specific video region detection unit 13 is encoded by the coding unit 14 over a coding time T3 to become coded video data A4(D1), which is written to the coding memory 15. Because the coding time T3 ends before time t1, at which the transmission IF unit 18 generates the transmission start signal, the coded video data A4(D1) written to the coding memory 15 is read from time t1 and becomes coded video data A5(D1). The coded video data A5(D1) is packetized by the transmission IF unit 18 into the packet A8.

Transmission of the coded video data A4(D1) ends before extraction of the next coded video data A1(D2) from its packet is finished. In other words, for the coded video data A1(D1), decoding, specific video region detection, coding, and memory readout were performed so that the period from the start of the decoding time T1 to the end of transmission of the coded video data A4(D1) fits within one frame time. As a result, the coded video data A4(D1) is fully generated for A1(D1), and the packet A8 containing A4(D1) can be transmitted without frame skipping. The same holds for the coded video data A1(D2) and A1(D4) shown in FIG. 4.

For the coded video data A1(D3), however, the coding time in the coding unit 14 for the video data A3(D3) output from the specific video region detection unit 13 is T3', which is longer than T3, so coding has not finished by time t3, at which the transmission IF unit 18 generates the transmission start signal. A memory selection signal B6 indicating selection of the bypass memory 17 is therefore generated, and the coded video data A1(D3) written to the bypass memory 17 is read from time t3 and supplied to the transmission IF unit 18 as coded video data A7(D3). The coded video data A7(D3) is packetized by the transmission IF unit 18 into the packet A8. Transmission of the packet A8 carrying A7(D3) ends before extraction of the next coded video data A1(D4) from its packet is finished, so frame skipping is prevented even when coding takes as long as T3', for example because of limited CPU performance.
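The timing behavior described for D1 to D4 can be reproduced with a small simulation. This is a sketch under stated assumptions: the durations are invented for illustration (FIG. 4 gives no numbers), each frame's processing is assumed to start at its arrival, and the transmission start time is assumed to fall one frame time after arrival.

```python
def simulate(encode_ms, decode_ms=5.0, detect_ms=10.0, fps=30):
    """Return "A5" (re-encoded data) or "A7" (bypassed data) for each frame.

    encode_ms lists the coding time of each frame; decode_ms and detect_ms
    play the roles of T1 and T2. All durations are hypothetical.
    """
    period = 1000.0 / fps
    outputs = []
    for k, t3 in enumerate(encode_ms):
        arrival = k * period
        send_start = arrival + period          # transmission start timing for this frame
        coding_done_at = arrival + decode_ms + detect_ms + t3
        outputs.append("A5" if coding_done_at <= send_start else "A7")
    return outputs


# D3 takes a longer coding time (playing the role of T3') and misses its deadline.
print(simulate([10.0, 10.0, 30.0, 10.0]))  # ['A5', 'A5', 'A7', 'A5']
```

Every frame produces an output, which is the point of the scheme: the slow frame is sent in its original form instead of being dropped.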

In the embodiment described above, the condition for selecting the bypass memory 17 instead of the coding memory 15 is that coding of one frame of video data has not been completed. This prevents frame skips for all frames. The condition for selecting the bypass memory 17 is not, however, limited to this in the present invention. For example, the bypass memory 17 may be selected when the coding unit 14 has not completed coding of one frame of video data and, in addition, the specific video area detection unit 13 has detected a face. Alternatively, the bypass memory 17 may be selected when the coding unit 14 has not completed coding of one frame of video data and, in addition, the specific video area detection unit 13 has detected a plurality of faces (N or more faces, where N is an integer of 2 or more). The former condition prevents frame skips for frames containing a face, and the latter prevents frame skips for frames containing a plurality of faces. When the network camera 2 is used as a surveillance camera, this has the advantage that frames containing a human face can always be checked in the reproduced video.
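The three selection conditions described above can be expressed as one rule with a face-count threshold. The sketch below is a hedged illustration: `Source`, `choose_source`, and the `min_faces` parameter are hypothetical names, and returning `None` (read as "the frame is skipped") when neither memory is selected is an assumption, since the text only states for which frames skipping is prevented.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    CODING_MEMORY = "coding memory 15"
    BYPASS_MEMORY = "bypass memory 17"


def choose_source(encoding_complete: bool, face_count: int,
                  min_faces: int = 0) -> Optional[Source]:
    """Decide which memory to read at the transmission start time.

    min_faces = 0  : condition of the embodiment (coding incomplete only)
    min_faces = 1  : coding incomplete and a face detected
    min_faces >= 2 : coding incomplete and N or more faces detected
    Returns None when neither memory is selected (assumed here to mean
    that the frame is skipped).
    """
    if encoding_complete:
        return Source.CODING_MEMORY
    if face_count >= min_faces:
        return Source.BYPASS_MEMORY
    return None


# Coding overran on a frame with one detected face:
print(choose_source(False, face_count=1, min_faces=0))
print(choose_source(False, face_count=1, min_faces=1))
print(choose_source(False, face_count=1, min_faces=2))
```

A single threshold keeps the three variants in one code path; a real device might instead treat them as separate operating modes.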

The present invention decodes coded video data, detects a specific video region in the video represented by the decoded video data, and, based on the detection result, encodes the specific video region and the remaining region at different image quality levels. It can be applied to any device that decodes coded video data and then encodes it again, such as a session border controller device.

In the embodiment described above, coded video data transmitted from the network camera 2 connected to the network 1 is received, but the present invention is not limited to this. For example, coded video data distributed from a server may be received. The transmission destination is likewise not limited to the video server 3 and may be another device connected to the network 1, such as a video monitor device or a personal computer.

Also, in the embodiment described above, the region of a human face was used as the specific video region in the video, but the present invention is not limited to this. For example, the specific video region may be that of an animal other than a human, such as a bear or a monkey, or that of a means of transportation such as a vehicle.

Furthermore, in the embodiment described above, coded video data is transmitted and received in packets, but it may also be transmitted and received without using packets.

1 Network
2 Network camera
3 Video server
10 Video processing apparatus
11 Reception IF unit
12 Decoding unit
13 Specific video area detection unit
14 Coding unit
15 Coding memory
16 Memory determination unit
17 Bypass memory
18 Transmission IF unit

Claims

A video processing apparatus comprising:
a first video data storage unit that stores input first coded video data;
a decoding unit that generates video data by decoding the first coded video data frame by frame;
a specific video area detection unit that detects a specific video region in the video represented by one frame of the decoded video data;
a coding unit that generates second coded video data by encoding the portion of the one frame of video data inside the specific video region using a first quantization parameter and encoding the portion outside the specific video region using a second quantization parameter larger than the first quantization parameter;
a second video data storage unit that stores the second coded video data;
a coding determination unit that determines, at a transmission start timing occurring every one frame time, whether the coding unit has completed coding of the one frame of video data; and
a transmission unit that reads out and transmits the second coded video data stored in the second video data storage unit when the coding determination unit determines that coding of the one frame of video data is complete, and reads out and transmits the first coded video data stored in the first video data storage unit when the coding determination unit determines that coding of the one frame of video data is not complete.

The video processing apparatus according to claim 1, wherein the specific video region is a region of a human face.

The video processing apparatus according to claim 2, wherein the transmission unit reads out and transmits the first coded video data stored in the first video data storage unit only when the coding determination unit determines that coding of the one frame of video data is not complete and the specific video area detection unit has detected the region of the human face.

The video processing apparatus according to claim 2, wherein the transmission unit reads out and transmits the first coded video data stored in the first video data storage unit only when the coding determination unit determines that coding of the one frame of video data is not complete and the specific video area detection unit has detected a plurality of regions of human faces.

A video processing method for a video processing apparatus that performs video processing on input first coded video data, the method comprising:
a step of storing the first coded video data in a first video data storage unit;
a decoding step of generating video data by decoding the first coded video data frame by frame;
a specific video area detection step of detecting a specific video region in the video represented by one frame of the decoded video data;
a coding step of generating second coded video data by encoding the portion of the one frame of video data inside the specific video region using a first quantization parameter and encoding the portion outside the specific video region using a second quantization parameter larger than the first quantization parameter;
a step of storing the second coded video data in a second video data storage unit;
a coding determination step of determining, at a transmission start timing occurring every one frame time, whether the coding step has completed coding of the one frame of video data; and
a transmission step of reading out and transmitting the second coded video data stored in the second video data storage unit when the coding determination step determines that coding of the one frame of video data is complete, and reading out and transmitting the first coded video data stored in the first video data storage unit when the coding determination step determines that coding of the one frame of video data is not complete.