JP4606955B2

JP4606955B2 - Video recognition system, video recognition method, video correction system, and video correction method

Info

Publication number: JP4606955B2
Application number: JP2005199418A
Authority: JP
Inventors: 錫哲奇; 嘉莉 ▲超▼; 海兵任; ▲徳▼ ▲牟▼ 王
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2004-07-07
Filing date: 2005-07-07
Publication date: 2011-01-05
Anticipated expiration: 2025-07-07
Also published as: JP2006024219A

Description

The present invention relates to image recognition through comparison of a stored image with an image to be matched, and more particularly to face recognition in which the accuracy of the comparison is improved by comparing and/or normalizing sub-regions of an image so as to compensate for brightness, facial expression, and/or other conditions.

Face recognition is an important problem in the field of biometrics. Automatic face recognition in particular attracts strong interest in comparison with iris or fingerprint recognition, and interest is especially high for security purposes. For example, in the last one or two years many countries have selected automatic face recognition as a required technology for biometric passports. Face recognition is also considered effective in other fields aimed at crime prevention, national security, and personal security. In addition, face recognition technology helps drive the development of pattern recognition and computer vision.

A problem with conventional automatic face recognition is that its accuracy is low, so an inspector (that is, a user) often has to assist the recognition. In particular, because the face is not a rigid body and can take on many expressions, face recognition has to account for the facial texture and the three-dimensional geometry of the face, as well as occlusion of features by glasses or hair and complex lighting environments. These factors reduce the accuracy of face recognition.
Recently, studies have compared and evaluated conventional face recognition algorithms and techniques; see Non-Patent Document 1 and Non-Patent Document 2.
These studies show that conventional algorithms are not robust to changes in facial expression, illumination, pose, and occlusion.

Appropriate feature selection is also important for face recognition, because classification becomes a relatively simple task once appropriate features are selected. For example, with appropriate features, even a simple classification technique based on Euclidean distance, such as K-means or K-nearest neighbors (KNN), gives good results. However, such methods assume that, in the appropriate feature subspace, samples belonging to the same class follow a Gaussian distribution and that different classes overlap little. Testing and applying these methods to face recognition requires a great deal of work, and the appropriate feature subspace they need cannot readily be selected. For example, it is difficult to determine an appropriate feature subspace for face representation and feature selection using principal component analysis (PCA), linear discriminant analysis (LDA), or locality preserving projection (LPP).
PCA, LDA, and LPP are described in Non-Patent Document 3, Non-Patent Document 4, and Non-Patent Document 5, respectively.

One reason feature selection is difficult in face recognition is that face images lie on a nonlinear manifold (that is, a nonlinear surface or space). Because the face manifold is complex, the conventional Euclidean distance (the straight-line distance between two points) used to determine the correspondence between images does not work well for face recognition. To address this, the geodesic distance (the shortest distance between two points, whether linear or nonlinear) computed with ISOMAP was introduced; details are given in Non-Patent Document 6. However, several researchers have shown that using ISOMAP in practice requires decomposing the parameter space into overlapping convex pieces. The difficulty with manifold-based approaches is thus that, in practical use, enough samples cannot be provided to describe the manifold of a particular person. Manifold-based approaches are therefore difficult to use in practice.
Non-Patent Document 1: D. Blackburn et al., "Facial Recognition Vendor Test 2000: Evaluation Report", 2000
Non-Patent Document 2: P. J. Phillips et al., "The FERET Evaluation Methodology for Face Recognition Algorithms", IEEE Trans. on PAMI, 22(10): 1090-1103, 2000
Non-Patent Document 3: M. Turk et al., "Face Recognition Using Eigenfaces", IEEE, 1991
Non-Patent Document 4: P. N. Belhumeur et al., "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Projection", IEEE Trans. PAMI, vol. 19, no. 7, pp. 711-720, 1997
Non-Patent Document 5: Xiaofei He et al., "Learning a Locality Preserving Subspace for Visual Recognition", Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), pp. 385-392, 2003
Non-Patent Document 6: J. B. Tenenbaum et al., "A Global Geometric Framework for Nonlinear Dimensionality Reduction", Science, vol. 290, 22 December 2000

The technical problem to be solved by the present invention is to provide an image recognition system and an image recognition method that can improve the recognition rate even when few reference images are available.
Another technical problem to be solved by the present invention is to provide an image correction system and an image correction method that can improve the recognition rate even when few reference images are available.

To solve the above technical problem, an image recognition system according to the present invention includes: an image input device that inputs a first image; a database that stores a plurality of reference images; and a comparison unit that divides the first image and the reference images into a plurality of sub-regions, compares each sub-region of the first image with the corresponding sub-region of each reference image, and, based on the comparison results, determines the reference image having the highest correlation with the first image.

An image recognition method according to the present invention, also made to solve the above technical problem, determines the correspondence between an acquired image divided into a plurality of sub-regions and reference images divided into corresponding sub-regions. The method includes: determining a first correlation that is the maximum between one of the sub-regions of the acquired image and the corresponding sub-regions of the reference images; determining a second correlation that is the maximum between another sub-region of the acquired image and the corresponding sub-regions of the reference images; and selecting one of the reference images based on the first correlation and the second correlation.

To solve the other technical problem, an image correction system according to the present invention includes: an image input device that inputs a first image; a database that stores a reference image set; and a correction unit that obtains the first image divided into a plurality of sub-regions, compares each sub-region of the first image with the average image of the corresponding sub-region to remove the effects of illumination and/or occlusion and thereby generate corrected sub-regions of the first image, and generates a corrected first image from the corrected sub-regions.

Further, an image correction method according to the present invention, made to solve the other technical problem, removes the effects of illumination and/or occlusion from an acquired image. The method includes: determining, for each sub-region of the acquired image, a sub-region factor that minimizes the difference between the sub-region and the average image of that sub-region; and applying each sub-region factor to the corresponding sub-region so that the entire acquired image is corrected.

According to the present invention, the recognition rate can be improved even when each image set contains few reference images.
In addition, adjusting the luminance of the input image for each sub-region improves the recognition rate even under different illumination conditions.
Furthermore, with Most Informative Feature Extraction (MIFE), each sub-region acts as a classifier, and a majority vote assigns as the final class the class to which the largest number of sub-regions belong, which improves the recognition rate.

When most informative feature extraction or other prior-art methods are used, applying the gamma correction or sub-region based histogram equalization (SHE) according to the present invention effectively improves the recognition rate of face recognition performed under different illumination.

Gamma correction and SHE are effective for luminance adjustment and can reduce the error rate of face recognition in the presence of different facial expressions, illumination changes, or occlusion.
MIFE is a trade-off between the simple Euclidean distance and the complex geodesic distance. It can exploit classification information that is unavailable to other classification methods, such as K-means clustering or K-nearest neighbors, which map a high-dimensional feature vector to a one-dimensional distance. It therefore has the advantage that complex manifolds and geodesic distances, which require many training samples, need not be computed.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.
FIG. 9 is a schematic block diagram of the configuration of an image recognition system that performs face recognition according to the present embodiment. As shown in FIG. 9, a camera 90 is connected to a computer 91. The computer 91 is connected to a database 92 that stores reference images of known faces. The camera 90 is used to acquire an image of the face to be identified. In the present embodiment, a 5-megapixel digital camera with a resolution of 320 × 240 pixels was used as the camera 90.
The camera 90 may have a different resolution and may be a camera used in a PDA (Personal Digital Assistant), a telephone, a security system, or the like, or another similar device capable of taking photographs. Besides the camera 90, a scanner (not shown) that scans non-digital pictures may be used as the digital image input device, and digital images may also be input directly to the computer 91.
Although FIG. 9 shows the camera 90 connected directly to the computer 91, this connection is not required. Instead, images may be transferred through a scanner (not shown), uploaded from a recording medium, or transmitted over a network using wired or wireless transmission technology.
When a test image is loaded into the computer 91, the computer 91 identifies feature points of the test image and divides the test image into sub-regions. The computer 91 then performs, on the divided sub-regions, the sub-region based adaptive gamma correction or SHE, and/or MIFE, described later.

The computer 91 compares the corrected sub-regions of the input image with the corresponding sub-regions of the reference images stored in the database 92. Based on the comparison results for the corrected sub-regions, the computer 91 uses the majority vote described later to determine which reference image is closest to the test image.

In the embodiment shown in FIG. 9, the computer 91 is a personal computer with a 1 GHz CPU clock and 256 Mbytes of RAM (Random Access Memory). However, other types of computer may be used: the computer 91 may be general-purpose or special-purpose, and portable or non-portable. The computer 91 may also be configured as a grid or parallel computer that collectively analyzes the corresponding sub-regions of the test image and the reference images stored in the database 92. If portable, the computer 91 may be a notebook computer, or the comparison results computed by the computer 91 may be sent to a PDA (Personal Digital Assistant) or the like for display.

Although the database 92 is shown separately from the computer 91 for purposes of explanation, it is preferably built into the computer 91 to reduce the time spent on transmission over a network or the like. If the database 92 is separate from the computer 91, it is connected to the computer 91 via a LAN (Local Area Network), the Internet, or another wired or wireless network. In that case, when the reference images are used to identify people for security purposes, a plurality of computers 91 at different locations may use the reference images in the database 92 so that a person captured by a camera 90 at each location can be recognized.
Accordingly, a single database 92 may serve a plurality of computers 91. The database may also be mailed or transmitted to each location for use within the computer 91 there.
Alternatively, a single database 92 may be updated from each location over the network. Such a database 92 may also be installed at a separate location (for example, a government agency) for passport verification or identification of persons by judicial authorities.

The database 92 may be stored on a recording medium, for example a magnetic recording medium such as a hard disk drive, a magneto-optical recording medium, an optical recording medium such as a CD (Compact Disc) or DVD (Digital Versatile Disc), or a next-generation optical disc such as a Blu-ray Disc or AOD (Advanced Optical Disc). The recording medium may be read-only, write-once, or rewritable. If the database 92 is write-once or rewritable, new images can be added as reference images without retransmitting all images to the database 92. This is useful when a person is recognized for the first time at one location and the database 92 is updated so that the person's image is available at other locations.

FIG. 1 is a flowchart of the face recognition method according to the present embodiment, which is described in detail below with reference to FIG. 1. First, the computer 91 shown in FIG. 9 preprocesses the face image input through the camera 90 (step 10). The preprocessing includes removing noise from the face image and normalizing the face image so that the size and position of the face match a desired size and position.

Next, features are detected from the preprocessed face image (step 11). Feature detection methods include structural methods that look for geometric feature points based on the position, shape, width, length, and so on of facial components; methods that use mathematical transform filters such as polar coordinate transforms and wavelet transforms; and methods that use statistical models such as principal component analysis, local feature analysis, and linear discriminant analysis. As other examples, the K-L transform, neural network models, and Bayesian probability models for obtaining three-dimensional information may be used.
As one example of feature detection, feature points are first placed on characteristic parts of the face image, such as the pupils, the nose, and the mouth. Once the feature points are placed, the computer 91 calculates their coordinates in the image and uses these coordinates to normalize the image. When feature extraction is complete, the image is transformed, for example by an affine transformation, so that collinearity and the ratios of distances between feature points are preserved. For example, the image shown in FIG. 2A is transformed into the image shown in FIG. 2B.
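The affine normalization described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes three detected feature points (two pupils and the mouth center) and hypothetical target positions in a 63 × 90 template. The function names and all coordinates are assumptions chosen for the example.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix A such that dst ~= A @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates: one row [x, y, 1] per feature point.
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve src_h @ X = dst for X (3x2); the affine matrix is its transpose.
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return X.T

def apply_affine(A, pts):
    """Apply the 2x3 affine matrix A to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

# Hypothetical detected feature points (x, y): left pupil, right pupil, mouth center.
detected = [(112.0, 95.0), (168.0, 101.0), (137.0, 160.0)]
# Hypothetical template positions in a 63-wide, 90-high normalized face image.
template = [(16.0, 30.0), (46.0, 30.0), (31.0, 68.0)]

A = estimate_affine(detected, template)
mapped = apply_affine(A, detected)
```

With three non-collinear point pairs the affine map is determined exactly, so `mapped` coincides with `template`. Additional feature points would turn this into a least-squares fit.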

Next, the face image is divided into a plurality of sub-regions (step 12). Choosing an appropriate sub-region size is important for face recognition. If the sub-regions are too small, the faces of different people can look the same locally, which makes recognition difficult. If the sub-regions are too large, images of the same person lit from different positions may be rejected, which makes recognition even more difficult.
The number of sub-regions may be determined experimentally. For example, the size can be chosen from experiments that measure the error rate on images for each sub-region size. Table 1 shows the error rates measured on images from Yale B, the face image database shown in FIG. 4A.

According to Table 1, the 9 × 9 pixel size gives the lowest error rate and is therefore an appropriate size. The face image of FIG. 2B can thus be divided into a plurality of sub-regions as shown in FIG. 2C. If the face image of FIG. 2B has height H and width W, its size is H × W. With 90 pixels in the height direction and 63 in the width direction, the whole-image feature vector I_i (i = 1, 2, ..., N) has 90 × 63 = 5670 elements. Dividing the image into sub-regions of h × w = 9 × 9 = 81 pixels gives D = int(H/h) × int(W/w) = 10 × 7 = 70 sub-regions in total. As shown in FIG. 2C, the face image is divided into 70 sub-regions, and these sub-regions form the feature space of the face image.
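The arithmetic above (a 90 × 63 image, 9 × 9 blocks, D = 70 sub-regions of 81 pixels each) can be checked with a short sketch. The function name `split_into_subregions`, the row-major ordering of the blocks, and the placeholder pixel data are assumptions made for illustration.

```python
import numpy as np

def split_into_subregions(image, h=9, w=9):
    """Split an H x W image into int(H/h) * int(W/w) blocks of h x w pixels.

    Returns an array of shape (D, h*w), one flattened block per row,
    with blocks ordered left to right, then top to bottom.
    """
    H, W = image.shape
    rows, cols = H // h, W // w
    # Drop any remainder so that the image tiles exactly.
    cropped = image[:rows * h, :cols * w]
    blocks = cropped.reshape(rows, h, cols, w).swapaxes(1, 2)
    return blocks.reshape(rows * cols, h * w)

# Placeholder pixel data standing in for a normalized 90 x 63 face image.
face = np.arange(90 * 63, dtype=float).reshape(90, 63)
subregions = split_into_subregions(face)
print(subregions.shape)
```

Each row of `subregions` is the 81-element image vector of one sub-region, which is the form the per-sub-region comparisons later in the description operate on.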

Next, the computer 91 selectively adjusts the luminance of the divided sub-regions (step 13). Gamma correction, SHE, or the like can be used for the luminance adjustment.
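The text does not spell out how SHE is computed. As a rough sketch, assuming SHE means applying textbook histogram equalization independently to each 9 × 9 sub-region of an 8-bit grayscale image, it could look like the following. The function names, the equalization formula, and the synthetic test image are assumptions rather than details from the patent.

```python
import numpy as np

def equalize_block(block, levels=256):
    """Textbook histogram equalization of one integer-valued block."""
    hist = np.bincount(block.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occupied level
    n = block.size
    if cdf_min == n:               # constant block: nothing to spread out
        return block.copy()
    # Lookup table mapping each gray level to its equalized level.
    lut = np.round((cdf - cdf_min) / (n - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(block.dtype)
    return lut[block]

def subregion_histogram_equalization(image, h=9, w=9):
    """Equalize every complete h x w sub-region independently."""
    out = image.copy()
    H, W = image.shape
    for top in range(0, H - h + 1, h):
        for left in range(0, W - w + 1, w):
            region = image[top:top + h, left:left + w]
            out[top:top + h, left:left + w] = equalize_block(region)
    return out

# Synthetic under-exposed image (gray levels 0..63 only), for illustration.
rng = np.random.default_rng(0)
dark_face = rng.integers(0, 64, size=(90, 63), dtype=np.uint8)
equalized = subregion_histogram_equalization(dark_face)
```

Because each block is stretched to the full gray range on its own, a shadow covering only part of the face affects only the blocks it falls on.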

Gamma correction compensates for changes in the face image caused by changes in illumination: it adjusts images acquired under different illumination so that they resemble images acquired under the same illumination. The luminance of the average image, averaged over all reference images, serves as the reference. The gamma parameter, an example of a sub-region factor, is selected to minimize the per-sub-region distance between the original image I and the average image I₀. The average image is obtained by averaging the training image set that forms the reference images, as given by formula (1).
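Formula (1) itself is not reproduced in this text. From the description, the average image is the pixel-wise mean of the reference images, so a reconstruction under that assumption, writing I_i for the i-th reference image, would be:

```latex
I_0 = \frac{1}{N} \sum_{i=1}^{N} I_i
```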

Here, N is the number of reference images in the training set.
The gamma parameter of each sub-region is selected to minimize the distance between the k-th sub-region of the test image I and the k-th sub-region of the average image. Using the gamma parameter γ, the pixel values of the k-th sub-region of the test image are corrected as follows:

The expressions involve the pixel value of the k-th sub-region of the gamma-corrected image, a distance function dis, a coefficient c, and the pixel value of the k-th sub-region of the average image.
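The correction equations are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the common power-law form G(p; γ) = c · p^(1/γ) on pixel values scaled to [0, 1], takes c = 1, uses the squared Euclidean distance for dis, and searches a hypothetical grid of γ values. None of these choices is confirmed by the source, and the function names are illustrative.

```python
import numpy as np

def gamma_correct(block, gamma, c=1.0):
    """Assumed correction model: c * pixel ** (1 / gamma), pixels in [0, 1]."""
    return c * np.power(block, 1.0 / gamma)

def best_gamma(block, mean_block, candidates):
    """Candidate gamma minimizing the squared Euclidean distance
    (one possible choice for dis) to the average sub-region."""
    costs = [np.sum((gamma_correct(block, g) - mean_block) ** 2) for g in candidates]
    return candidates[int(np.argmin(costs))]

def correct_image(image, mean_image, h=9, w=9, candidates=np.linspace(0.2, 5.0, 49)):
    """Choose and apply a separate gamma for every h x w sub-region."""
    out = np.array(image, dtype=float)
    H, W = out.shape
    for top in range(0, H - h + 1, h):
        for left in range(0, W - w + 1, w):
            blk = (slice(top, top + h), slice(left, left + w))
            g = best_gamma(out[blk], mean_image[blk], candidates)
            out[blk] = gamma_correct(out[blk], g)
    return out

# Synthetic data: the average image is the pixel-wise mean of N = 5 references.
rng = np.random.default_rng(0)
references = rng.uniform(0.2, 0.9, size=(5, 90, 63))
mean_image = references.mean(axis=0)

# Simulated test image: upper half darkened, lower half brightened.
test_image = mean_image.copy()
test_image[:45] **= 2.0
test_image[45:] **= 0.5

corrected = correct_image(test_image, mean_image)
```

In this synthetic case the search recovers γ = 2 for the darkened blocks and γ = 0.5 for the brightened ones, so `corrected` returns to the average image. Real images would only move toward it.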

FIG. 3A shows face images from the Yale A face image database. The Yale A database contains face images acquired under different conditions, such as different facial expressions, different illumination, and occlusion by glasses. FIG. 3B shows the result of gamma-correcting the Yale A face images of FIG. 3A according to Equation 4.
FIG. 4A shows face images from the fourth subset of Yale B, a different face image database. FIG. 4B shows the gamma correction result for the face images of FIG. 4A, and FIG. 4C shows the result of applying SHE to the face images of FIG. 4A.
The corrected images in FIGS. 3B, 4B, and 4C show that the influence of illumination is greatly reduced.
Gamma correction may also be applied to the conventional principal component analysis method or the correlation method.

The correlation method computes the direct Euclidean distance or the normalized Euclidean distance between each reference image and the test image, finds the minimum distance, and thereby obtains the label of the test image, that is, the final class to which the test image belongs.
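A minimal sketch of this nearest-reference rule follows. The text does not define "normalized Euclidean distance", so the sketch assumes one common reading, in which each image vector is scaled to unit length before the distance is taken. The function name, the tiny 2 × 2 images, and the labels are hypothetical.

```python
import numpy as np

def correlation_classify(test_image, reference_images, labels, normalize=False):
    """Return the label of the reference image nearest to the test image."""
    test = np.asarray(test_image, dtype=float).ravel()
    refs = np.asarray(reference_images, dtype=float)
    refs = refs.reshape(refs.shape[0], -1)
    if normalize:
        # Assumed normalization: scale every image vector to unit length.
        test = test / np.linalg.norm(test)
        refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    distances = np.linalg.norm(refs - test, axis=1)
    return labels[int(np.argmin(distances))]

# Three hypothetical 2 x 2 reference images, one per person.
references = np.array([
    [[1.0, 1.0], [1.0, 1.0]],
    [[4.0, 0.0], [4.0, 0.0]],
    [[0.0, 3.0], [0.0, 3.0]],
])
labels = ["person_1", "person_2", "person_3"]

# The second reference seen under dimmer light (all pixels scaled by 0.25).
test = 0.25 * references[1]

direct = correlation_classify(test, references, labels)
normalized = correlation_classify(test, references, labels, normalize=True)
```

In this toy case the direct distance picks `person_1` while the normalized distance picks `person_2`, which illustrates why a global brightness change can mislead the unnormalized comparison.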

Next, MIFE is performed on the test image (step 14).
To explain MIFE simply, a classification method for a single image vector is described first. Assume N training samples belonging to C classes. In the D-dimensional feature space, each sample x_i is a vector x_i = [x_i1, x_i2, ..., x_iD]^T, i = 1, 2, ..., N (where T denotes transposition).
Each sample vector x_i has a class label k = l(x_i), which means that x_i belongs to the k-th class. This can be expressed as follows:

For a test sample z, a clustering criterion that decides, based on the Euclidean or Mahalanobis distance, that z belongs to the l-th class can be obtained from the distances between z and the sample vectors as follows:

Here, the function dis is the Euclidean or Mahalanobis distance between the test sample and the mean vector.
From Equations 5 and 6, the class y'_i(z) to which z belongs is expressed as follows:
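Equations 5 to 7 are not reproduced in this text. One reading of the surrounding description is a nearest-class-mean rule: compute a mean vector per class and assign z to the class whose mean is closest. The sketch below assumes that reading and uses the Euclidean distance. The function names and the toy data are illustrative only.

```python
import numpy as np

def class_means(samples, labels):
    """Mean vector of the training samples in each class."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([samples[labels == k].mean(axis=0) for k in classes])
    return classes, means

def classify(z, classes, means):
    """Assign z to the class whose mean vector is nearest (Euclidean)."""
    distances = np.linalg.norm(means - np.asarray(z, dtype=float), axis=1)
    return classes[int(np.argmin(distances))]

# Toy training set: N = 6 samples, D = 2 dimensions, C = 3 classes.
samples = [[0, 0], [0, 2], [10, 0], [10, 2], [0, 10], [2, 10]]
labels = [0, 0, 1, 1, 2, 2]

classes, means = class_means(samples, labels)
predicted = classify([8, 3], classes, means)
```

Swapping in the Mahalanobis distance would additionally require a covariance estimate per class, which is harder to obtain when few reference images are available.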

The classes distinguished in face recognition are the persons to be recognized. Each person corresponds to one class, and each class has a plurality of reference images showing that person's face under different expressions and surroundings; training teaches the system to recognize the reference images in one class as the same person. Once training is complete, recognition is performed on test images. Training and testing follow the same procedure and differ only in whether the identity of the subject is known in advance. In training, the identity is known, and the recognition algorithm is run repeatedly to reduce the discrepancy between the recognition result and the known identity. In testing, the recognition algorithm compares the test image with the reference images that make up each class and outputs the final recognition result.

The sub-regions described in this embodiment represent a feature space, and the number of sub-regions is the dimension of that feature space. With the sub-regions shown in FIG. 2B, the video vector has 5670 elements in total, there are 70 sub-regions, and the video vector of each sub-region has 81 dimensions.
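The figures quoted above are consistent with 70 non-overlapping sub-regions of 81 elements each (70 × 81 = 5670), and claim 15 gives the sub-region count as int(H/h) × int(W/w). The sketch below shows one way such a division could be coded. The 90 × 63 image size and the 9 × 9 sub-region size are assumptions chosen only to reproduce those totals, and `split_into_subregions` is a hypothetical helper.

```python
def split_into_subregions(image, h, w):
    """Split a 2-D image (list of rows) into non-overlapping h x w blocks.

    Each block is flattened row by row into one vector. Rows or columns
    that do not fill a whole block are dropped, which gives
    int(H / h) * int(W / w) blocks in total.
    """
    H, W = len(image), len(image[0])
    blocks = []
    for top in range(0, (H // h) * h, h):
        for left in range(0, (W // w) * w, w):
            blocks.append([image[r][c]
                           for r in range(top, top + h)
                           for c in range(left, left + w)])
    return blocks

# Assumed sizes: a 90 x 63 image (5670 pixels) and 9 x 9 sub-regions.
H, W, h, w = 90, 63, 9, 9
image = [[r * W + c for c in range(W)] for r in range(H)]
blocks = split_into_subregions(image, h, w)
print(len(blocks), len(blocks[0]))  # 70 81
```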

FIG. 5 is a detailed flowchart of the MIFE in FIG. 1. The MIFE procedure is described in detail below with reference to FIG. 5.
First, the j-th sub-region of the test video is compared with the j-th sub-region of each reference video (step 50) to examine the correspondence between the sub-regions. The j-th sub-region of the test video is then labeled with the class of the reference video whose j-th sub-region corresponds most closely (step 51). If the video vector of the j-th sub-region of the test video I_x is z_jx, the label of the j-th sub-region is determined as follows.

Here, N is the number of reference videos.
If, according to Equation 8, the j-th sub-region belongs to the l-th class, this can be expressed as follows.

As a result, a D-dimensional decision matrix Y is obtained for I_x, and the class to which each sub-region belongs can be determined.
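The labeling step can be sketched in Python as follows. This is only an illustration under stated assumptions: Euclidean distance stands in for the function dis, the decision matrix is stored as one 0/1 row per sub-region, and the names `label_subregions` and `decision_matrix` as well as the toy vectors and person IDs are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def label_subregions(test_blocks, ref_blocks, ref_labels, dis=euclidean):
    """For each sub-region j, return the class of the nearest reference.

    test_blocks[j]   -- vector of the j-th sub-region of the test video
    ref_blocks[k][j] -- vector of the j-th sub-region of reference k
    ref_labels[k]    -- class (person ID) of reference k
    """
    labels = []
    for j, z in enumerate(test_blocks):
        nearest = min(range(len(ref_blocks)),
                      key=lambda k: dis(z, ref_blocks[k][j]))
        labels.append(ref_labels[nearest])
    return labels

def decision_matrix(labels, classes):
    """One row per sub-region; the entry is 1 for the assigned class, else 0."""
    return [[1 if lab == c else 0 for c in classes] for lab in labels]

# Toy data (assumed): N = 3 reference videos, D = 3 sub-regions, 2 persons.
ref_blocks = [
    [[0, 0], [0, 0], [0, 0]],
    [[5, 5], [5, 5], [5, 5]],
    [[1, 1], [1, 1], [9, 9]],
]
ref_labels = ["p1", "p2", "p1"]
test_blocks = [[0.2, 0.1], [4, 5], [8, 9]]

labels = label_subregions(test_blocks, ref_blocks, ref_labels)
Y = decision_matrix(labels, ["p1", "p2"])
print(labels)  # ['p1', 'p2', 'p1']
print(Y)       # [[1, 0], [0, 1], [1, 0]]
```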

FIG. 6 shows the process of classifying each sub-region and the classification result.
Applying gamma correction to the test video 60 yields the video 61. Gamma correction is likewise applied to the N reference videos 62.
Next, the j-th sub-region of the video 61 is compared with the j-th sub-region of each gamma-corrected reference video 63. For example, as shown in FIG. 6, the first sub-region at the upper left of the video 61 is compared with the first sub-region of each reference video 63 (64). From the comparison, the closest class can be found for each sub-region, as indicated by reference numeral 65.
Next, the final class of the test video is determined by majority vote over the sub-regions (step 52). The final class is the class that the decision matrix Y assigns to the largest number of sub-regions, as follows.

According to Equation 10, the first reference video 66 among the reference videos shown in FIG. 6 is selected as the final class. The person (that is, the ID) corresponding to the first reference video 66 is then output as the recognition result for the test video (step 15 in FIG. 1).
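The majority vote of step 52 can be illustrated with a few lines of Python. The sketch assumes the per-sub-region labels are already available as a list; `majority_vote` and the ID strings are hypothetical, and the tie-breaking rule is an arbitrary choice of this sketch, not something the patent specifies.

```python
from collections import Counter

def majority_vote(subregion_labels):
    """Return the class assigned to the largest number of sub-regions.

    Ties go to the class that appears first in the list, an arbitrary
    choice made for this sketch.
    """
    counts = Counter(subregion_labels)
    return counts.most_common(1)[0][0]

# Assumed labels for five sub-regions of one test video.
labels = ["id66", "id66", "id12", "id66", "id30"]
print(majority_vote(labels))  # id66
```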

FIG. 10 is a conceptual block diagram of the face recognition system of this embodiment. The system shown in FIG. 10 may be implemented using the apparatus configuration shown in FIG. 1, or using multiple processors.
As shown in FIG. 10, a reference video is input to the pre-processing unit 101, where it is warped or normalized. Specifically, the pre-processing unit 101 normalizes the reference video based on feature points, which may also be extracted manually by the user. The feature points include the pupils, the center of the mouth, the nose, and so on in the reference video. The x-y coordinates of the pupils, the y coordinate of the mouth, and the other extracted feature points can be warped so that they are positioned appropriately with respect to the reference video.

The luminance adjustment unit 102 divides the normalized reference video into sub-regions and adjusts the luminance of the reference video by applying gamma correction or SHE to each sub-region.
One or more test videos are likewise normalized by the pre-processing unit 103 and luminance-adjusted by the luminance adjustment unit 104.
The MIFE processor 105 runs MIFE on the luminance-adjusted videos, as described with reference to FIGS. 5 and 6, and recognizes the test video as corresponding to one of the reference videos.
The two pre-processing units 101 and 103 may be implemented as a single unit. Likewise, the two luminance adjustment units 102 and 104 may be implemented as a single unit, or as multiple computers that pass the corrected videos to a common MIFE processor 105.
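The per-sub-region gamma correction performed by the luminance adjustment units can be sketched as follows. This is a sketch under several assumptions rather than the patented procedure: pixel values are scaled to [0, 1], the correction takes the form c · v^(1/γ), the gamma parameter is chosen by a simple search over a candidate list so that the corrected sub-region is closest (in Euclidean distance) to the corresponding sub-region of the average video, and all function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def gamma_correct(block, gamma, c=1.0):
    """Apply c * v ** (1 / gamma) to every pixel value v in [0, 1]."""
    return [c * (v ** (1.0 / gamma)) for v in block]

def best_gamma(block, avg_block, candidates):
    """Pick the candidate gamma whose output is closest to the average block."""
    return min(candidates,
               key=lambda g: euclidean(gamma_correct(block, g), avg_block))

def correct_video(blocks, avg_blocks, candidates):
    """Correct each sub-region with its own gamma parameter."""
    return [gamma_correct(b, best_gamma(b, a, candidates))
            for b, a in zip(blocks, avg_blocks)]

# Toy data (assumed): one dark sub-region and the matching average sub-region.
dark = [0.04, 0.09, 0.16]
avg = [0.2, 0.3, 0.4]
candidates = [0.5, 1.0, 2.0, 4.0]
print(best_gamma(dark, avg, candidates))  # 2.0
```

Choosing the parameter separately for each sub-region is what makes the correction adaptive: a shadowed sub-region can be brightened without changing a well-lit one.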

FIG. 7 compares the recognition results of the present invention with those of the prior art. The reference videos are the Yale A images shown in FIG. 3A, and the horizontal axis indicates the number of reference videos per person.
As shown in FIG. 7, the MIFE of the present invention achieves a higher recognition rate than the prior-art principal component analysis (PCA) and correlation methods. In particular, when MIFE is combined with gamma correction, the recognition rate is 100% with four reference videos and about 90% even with a single reference video.

Applying the gamma correction of the present invention to the prior-art PCA and correlation methods also improves their recognition rates compared with applying PCA or the correlation method alone. The recognition rate is highest when MIFE is performed after the gamma correction.

FIGS. 8A to 8D show the first to fourth subsets of the Yale B face image database, respectively. The face images shown were captured for 10 faces under 45 different illumination conditions. The subsets are divided by the angle of the illumination relative to the frontal axis of the imaging device: 0° to 12° for the first subset (FIG. 8A), 12° to 25° for the second subset (FIG. 8B), 25° to 50° for the third subset (FIG. 8C), and 50° to 77° for the fourth subset (FIG. 8D).

The following table shows the recognition results of the prior art and the present invention on the face images shown in FIGS. 8A to 8D. The first subset (FIG. 8A) was used for training, and the subsets of FIGS. 8B to 8D were used for testing.

In Table 2, ICTCAS denotes the results reported in Shiguang Shan et al., "Illumination Normalization for Robust Face Recognition against varying Lighting Condition," IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), pp. 157-164, Nice, France, Oct. 2003. The PCA results were computed using the eigenvectors and eigenvalues of the scatter matrix. "PCA without 1st 3" is the result of applying PCA after discarding the three eigenfaces with the largest eigenvalues.

As Table 2 shows, applying the gamma correction of the present invention lowers the recognition error rate, and the error rate is lowest when MIFE is applied to the gamma-corrected videos.

The face recognition method of the present invention can be applied to memory-constrained chip-based applications, or to memory-constrained ID cards that store the data of a machine readable travel document (MRTD). An MRTD is an international travel document, such as a passport or visa, that contains an image of the holder's eyes and machine-readable data. In such a case, the apparatus configuration shown in FIG. 9 is implemented as part of a larger computer.

Using the information in the MRTD, face recognition can confirm whether the person described in the passport is the same as the person carrying it. The MRTD format is standardized and preferably includes identifying information about the holder, including a photograph or digital image, together with mandatory ID elements in a two-line machine readable zone (MRZ). Because the information recorded in an MRTD is standardized, the MRZ of an MRTD issued by another country in the same format can also be read. With the video recognition method of the present invention, the photograph or digital image can be compared with a video of the person presenting the MRTD to determine whether the two match. The ID can be further verified by comparison with ID information stored in a database, such as a pre-registered face image. The present invention is also applicable to driver's licenses, student ID cards, bank cards, membership cards, and other forms of ID that rely on recognizing physical characteristics.

The present invention can also be embodied as program code recorded on a computer-readable recording medium. Computer-readable recording media include every kind of recording device that stores data readable by a computer system. Examples include ROM (Read Only Memory), RAM (Random Access Memory), CD-ROM, magnetic tape, flexible disks, and optical data storage devices, as well as carrier waves (for example, transmission over the Internet). The program code can also be stored in a distributed manner across computer systems connected by a network, so that the present invention is executed by distributed computers. Functional programs, code, and code segments for embodying the present invention can be readily derived by programmers skilled in the art to which the invention belongs.

Although the present invention has been described using the above embodiment, the embodiment is merely illustrative, and those skilled in the art will understand that various modifications and other equivalent embodiments are possible. The true scope of technical protection of the present invention is therefore defined by the technical idea set out in the claims.

The present invention relates to face recognition and is useful in security applications such as ID verification using automatic face recognition, as in biometric passports, and entrance/exit management systems. Face recognition also helps advance pattern recognition and computer vision.

FIG. 1 is a flowchart of the face recognition method according to the present invention.
FIGS. 2A to 2C show examples of normalization and sub-region division according to an embodiment of the present invention.
FIG. 3A shows face images included in a face database.
FIG. 3B shows the gamma correction results for the face images of FIG. 3A.
FIG. 4A shows face images included in another face image database.
FIG. 4B shows the gamma correction results for the face images of FIG. 4A.
FIG. 4C shows the SHE results for the face images of FIG. 4A.
FIG. 5 is a detailed flowchart of the MIFE of FIG. 1.
FIG. 6 shows the process of classifying each sub-region and the classification result.
FIG. 7 compares the recognition results of the present invention and the prior art.
FIGS. 8A to 8D show the first to fourth subsets of a face image database, respectively.
FIG. 9 is a block diagram of the face recognition system according to the present invention.
FIG. 10 is a block diagram of a system that performs luminance adjustment and MIFE according to the present invention.

Explanation of symbols

90: camera
91: computer
92: database
101, 103: pre-processing unit
102, 104: luminance adjustment unit
105: MIFE processor

Claims

A video recognition system comprising:
A video input device for inputting a first video;
A database for storing a plurality of reference videos;
A comparison unit that divides the first video and the reference videos into a plurality of sub-regions, compares each sub-region of the first video with the corresponding sub-region of each reference video, and determines, based on the comparison results, the reference video having the maximum correlation with the first video; and
A correction unit that compares each sub-region of the first video with the corresponding sub-region of an average video obtained by averaging the plurality of reference videos, and generates a corrected first video from which the influence of illumination and/or occlusion has been removed,
Wherein the comparison unit compares the sub-regions of the corrected first video with the sub-regions of the plurality of reference videos to determine the reference video having the maximum correlation.

The comparison unit includes:
After comparing the sub-region of the first video and the sub-region of the reference video, ID information of a reference video having a sub-region having the largest correlation with the sub-region of the first video is stored, and a predetermined number of After comparing the first video and reference video sub-regions, the stored ID information is searched, and the reference video having the largest number of reference video sub-regions corresponding to the first video sub-region is Determining the reference video having the largest correlation with the first video,
The video recognition system according to claim 1.

The comparison unit:
Compares the i-th sub-region of the first video with the i-th sub-region of each reference video, and identifies the reference video whose i-th sub-region has the maximum correlation with the i-th sub-region of the first video, where i is an integer from 1 to D and D is the number of sub-regions of the first video,
The video recognition system according to claim 1.

The video input device is one of:
A camera that inputs the first video to the comparison unit, a scanner that digitizes the first video and inputs it to the comparison unit, and a memory reader that reads the first video from a memory storing it and outputs it to the comparison unit,
The video recognition system according to claim 1.

When the video input device is the memory reader, the memory is included in an ID card;
The video recognition system according to claim 4 .

The ID card includes a travel document card,
The video recognition system according to claim 5.

The comparison unit includes:
A processor that compares each sub-region of the first video and the sub-region of the reference video, and determines a reference video having the maximum correlation with the first video based on the comparison result;
The video recognition system according to claim 1.

The comparison unit includes:
A plurality of processors that compare at least one sub-region of the first video with a sub-region of the reference video and determine, based on the comparison result, the reference video having the maximum correlation with the first video,
The video recognition system according to claim 1.

The comparison unit and the correction unit are configured by a processor;
The video recognition system according to claim 1 .

The comparison unit is a first processor, and the correction unit is a second processor different from the first processor;
The video recognition system according to claim 1 .

The comparison unit includes:
A function of storing a sub-region of the first video in a database;
The video recognition system according to claim 1 .

The database is:
A recording medium provided in the computer having the comparison unit, or in a computer different from that computer,
The video recognition system according to claim 1.

The first video is:
A video of a non-rigid surface,
The video recognition system according to claim 1.

The non-rigid surface is a face surface;
The video recognition system according to claim 13 .

Each sub-region of the first image has a height h and a width w;
The first image has a height H and a width W;
The number of sub-regions of the first video is int (H / h) × int (W / w);
The video recognition system according to claim 1.

The predetermined number of sub-regions of the first video and the reference video is equal to or less than a total number of sub-regions of the first video;
The video recognition system according to claim 2.

The predetermined number of sub-regions of the first video and the reference video is:
The number of sub-regions at which, based on the sub-region comparisons of the first video and the reference videos made so far, one of the reference videos is determined to have the maximum correlation because no other reference video can statistically attain the maximum correlation with the first video,
The video recognition system according to claim 2.

The comparison unit includes:
Outputting ID information of the reference image having the maximum correlation;
The video recognition system according to claim 1.

The ID information includes a name of a person recorded in the reference video;
The video recognition system according to claim 18 .

The comparison unit:
Compares the j-th sub-regions of the first video and the reference video, calculates the label l of the j-th sub-region according to Equation 1 below, and
Calculates the D-dimensional decision matrix
According to Equation 2 below,
Where z_jx is the j-th sub-region of the first video, x_jk is the j-th sub-region of the reference video, D is the number of sub-regions, and N is the number of reference videos,
The video recognition system according to claim 1.

A feature confirmation unit for confirming characteristics of at least one of the first images in order to normalize the first image to be compared with the reference image;
The comparison unit receives the normalized first video divided into a plurality of sub-regions;
The video recognition system according to claim 1.

Each reference video sub-region is compared with the average video of the reference video sub-region, and in each reference video sub-region, the influence of illumination and / or occlusion is removed and a corrected reference video is generated. A correction unit,
The comparison unit compares the corrected first video sub-region with the corrected reference video sub-region to determine a reference video having the maximum correlation;
The video recognition system according to claim 1 .

The database stores corrected reference videos, each generated by comparing each sub-region of a reference video with the corresponding sub-region of the average video and removing the influence of illumination and/or occlusion from each sub-region of the reference video,
The comparison unit compares the sub-regions of the corrected first video with the sub-regions of the corrected reference videos to determine the reference video having the maximum correlation,
The video recognition system according to claim 1.

The average video is an average based on the number of sub-regions of the reference videos, calculated according to Equation 3 below,
Where N is the number of videos in the learning video set,
The learning video set is the reference videos,
The video recognition system according to claim 1.

A video correction system comprising:
A video input device for inputting a first video;
A database for storing a plurality of reference videos; and
A correction unit that acquires the first video divided into a plurality of sub-regions, compares each sub-region of the first video with the corresponding sub-region of an average video obtained by averaging the plurality of reference videos, and generates corrected sub-regions of the first video, and thereby a corrected first video, from which the influence of illumination and/or occlusion has been removed.

The correction unit:
Further performs adaptive gamma correction on each sub-region of the first video to generate the corrected sub-regions of the first video,
The video correction system according to claim 25.

The correction unit:
Selects the gamma parameter for each sub-region of the first video, based on the following equation, so as to minimize the distance between the k-th sub-region of the first video and the k-th sub-region of the average video,
Where I_k is the k-th sub-region of the first video, I_k0 is the k-th sub-region of the average video, I is the first video, I_0 is the average video, and c is a constant,
The video correction system according to claim 26.

The average video is an average based on the number of sub-regions of the reference videos, calculated according to Equation 3 below,
Where N is the number of videos in the learning video set,
The learning video set is the reference videos,
The video correction system according to claim 27.

The correction unit is
Further performing histogram equalization for each sub-region of the first video,
The video correction system according to claim 27 .

The video input device is one of:
A camera that inputs the first video to the comparison unit, a scanner that digitizes the first video and inputs it to the comparison unit, and a memory reader that holds a memory storing the first video and outputs the first video read from the memory to the comparison unit,
The video correction system according to claim 25.

A recording unit that records the corrected first video on a recording medium as a part of the corrected video database;
The video correction system according to claim 25.

Further comprising a correlation system that includes a comparison unit that receives the corrected first video from the correction unit, compares the first video with the reference videos, and determines, based on the comparison, the reference video having the maximum correlation with the first video,
The video correction system according to claim 25.

The comparison unit performs one of principal component analysis, linear discriminant analysis, and correlation method to determine the maximum correlation;
The video correction system according to claim 32 , wherein:

A video recognition method for determining the correspondence between an acquired video divided into a plurality of sub-regions and reference videos divided into sub-regions corresponding to those sub-regions, the method comprising:
Determining a first correlation, which is the maximum correlation between one sub-region of the acquired video and the corresponding sub-regions of the reference videos;
Determining a second correlation, which is the maximum correlation between another sub-region of the acquired video and the corresponding sub-regions of the reference videos; and
Selecting one of the reference videos based on the first correlation and the second correlation,
Wherein each sub-region of the acquired video is compared with the corresponding sub-region of an average video obtained by averaging the reference videos, a corrected acquired video is generated by removing the influence of illumination and/or occlusion, and the sub-regions of the corrected acquired video are compared with the sub-regions of the reference videos to determine the first and second correlations.

Further comprising determining a maximum correlation between other sub-regions of the acquired video and corresponding sub-regions of the reference video;
The step of selecting includes
Selecting the reference video that has been determined to have the greatest correlation with respect to the sub-region of the acquired video;
The video recognition method according to claim 34 , wherein:

A video correction method for removing the influence of illumination and/or occlusion from an acquired video, the method comprising:
Determining, for each sub-region of the acquired video, a sub-region factor that minimizes the difference between that sub-region and the corresponding sub-region of an average video obtained by averaging a plurality of reference videos; and
Applying each sub-region factor to its sub-region so as to correct the whole of the acquired video.

The sub-region factor is
Including an adaptive gamma correction gamma parameter that minimizes the difference between each sub-region of the acquired video and the sub-region of the average video ;
The video correction method according to claim 36 , wherein:

Further comprising storing the corrected acquired image;
The video correction method according to claim 36 , wherein:

Using the corrected acquired image to determine a correlation between the corrected acquired image and other images;
The video correction method according to claim 36 , wherein:

Using the corrected acquired image comprises:
Determining a correlation between one of the corrected sub-regions of the acquired video and a corresponding sub-region of the other video;
Determining another correlation with the other one of the corrected sub-regions of the acquired video and a corresponding sub-region of the other video;
Determining an overall correspondence between the corrected acquired video and the other video based on the correlation and the other correlation;
The video correction method according to claim 39.

Using the corrected acquired image using principal component analysis or linear discriminant analysis;
The video correction method according to claim 39.

A computer-readable recording medium having recorded thereon a program for causing a computer to execute the video recognition method according to claim 34 .

A computer-readable recording medium having a program recorded thereon for causing the computer to execute the video correction method according to claim 36 .