JP2017211720A

JP2017211720A - Image retrieval device, method, and program

Info

Publication number: JP2017211720A
Application number: JP2016102692A
Authority: JP
Inventors: Takeshi Irie (入江 豪); Isamu Igarashi (五十嵐 勇); Yukito Watanabe (渡邉 之人); Takayuki Kurozumi (黒住 隆行); Tetsuya Kinebuchi (杵渕 哲也)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-05-23
Filing date: 2016-05-23
Publication date: 2017-11-30
Anticipated expiration: 2036-05-23
Also published as: JP6650829B2

Abstract

PROBLEM TO BE SOLVED: To accurately retrieve the same object, robustly against luminance variations of the object, from a query image of a textured object. SOLUTION: A correspondence determination unit 12 aligns a first query image with a second query image and associates pixels of the second query image with one or more pixels of the first query image. An image composition unit 13 selects a pixel of interest from the first query image. When a pixel of the second query image corresponding to the selected pixel of interest exists, the luminance value of the pixel of interest is compared with that of the corresponding pixel. When the luminance value of the corresponding pixel is lower than that of the pixel of interest, the pixel value of the pixel of interest is updated with the pixel value of the corresponding pixel, thereby generating a composite image. A search unit 14 outputs the reference image in the reference image database that is closest to the composite image. SELECTED DRAWING: Figure 2

Description

The present invention relates to an image search apparatus, method, and program, and more particularly to an image search apparatus, method, and program for accurately retrieving the same object from a query image obtained by photographing a textured object.

Object recognition technology has advanced remarkably. Until now, its main applications have been fields in which the recognition target and environment are restricted, such as face and fingerprint authentication and factory automation. Recently, with the spread of compact imaging devices such as smartphones, industrial demand has grown for recognizing objects in freely shot images, that is, images of arbitrary objects taken by ordinary users in arbitrary places and environments. Expectations are especially high for O2O services that link products in the real world with those on the web, and for information guidance and navigation services that recognize landmarks in the real environment and provide information about them.

Object recognition technology for such new applications can take several forms, and one representative form is object retrieval. A typical object retrieval procedure is outlined below. First, the luminance values of each image are analyzed to extract a large number of small regions with a characteristic luminance distribution (called keypoints), and each keypoint is described by the way its luminance varies (this description is called a local feature). Next, the distances between the local features of two different images are measured to establish correspondences between the keypoints of the two images; the more correspondences an image pair has, the more strongly the two images are regarded as showing the same object.
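The matching step of this procedure can be sketched as follows. This is a minimal illustration rather than the method of any cited document: the function names, the toy two-dimensional descriptors, and the distance threshold `max_dist` are all assumptions introduced here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_keypoints(desc_a, desc_b, max_dist=0.5):
    """Pair each descriptor in desc_a with its nearest neighbour in desc_b.

    A pair is kept only when the distance is below max_dist
    (a hypothetical threshold, not a value from the source).
    Returns a list of (index_a, index_b) tuples.
    """
    matches = []
    for i, da in enumerate(desc_a):
        if not desc_b:
            break
        j, dist = min(((j, euclidean(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda t: t[1])
        if dist < max_dist:
            matches.append((i, j))
    return matches

# Toy 2-D descriptors for illustration; real local features have many more dimensions.
image_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_b = [(0.1, 0.0), (1.0, 0.9), (9.0, 9.0)]
matches = match_keypoints(image_a, image_b)
```

The number of surviving pairs, `len(matches)`, plays the role of the correspondence count used to judge whether two images show the same object.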

Suppose that a database of images of the objects to be recognized (reference images) has been built in advance. Object retrieval then identifies the object in a captured query image by searching the database for the reference images that show the same object as the query image.

One of the most distinctive features of object retrieval is that a single image is represented as a set of one or more small keypoints (and the local features that describe them). Images of the same object do not all show it at the same position, orientation (the angle of the small keypoint), or size; the object normally appears differently from image to image. For images shot freely by ordinary users in particular, it is in most cases nearly impossible to know in advance how the object will appear. The feature vectors that describe an image should therefore be invariant to position, orientation, and size.

Global features, which represent an entire image as a single vector, rarely provide the desired invariance. For example, a vector formed by concatenating the color (RGB values) of every pixel is invariant to none of position, orientation, or size. A feature that abstracts part of this information, such as a color histogram, can be invariant to position and orientation but is not invariant to size. Such features are also fragile when part of the object is missing, so accuracy degrades easily.

In object retrieval, by contrast, an image is represented by a set of small keypoints. Because the representation is a set of keypoints, it is invariant to position. In addition, local features that are invariant to orientation and size have been devised for describing keypoints. A representative example is the Scale-Invariant Feature Transform (SIFT) described in Non-Patent Document 1.

As described above, the typical object retrieval procedure represents an image as a set of one or more keypoints and can therefore robustly retrieve images containing the same object regardless of position, orientation, and size.

Patent Document 1: JP-A-2005-70026

Non-Patent Document 1: D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, pp. 91-110, 2004.
Non-Patent Document 2: J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, "Object Retrieval with Large Vocabularies and Fast Spatial Matching", 1470-1477, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.

Object retrieval has a problem, however. Objects in real space are normally placed under a variety of lighting conditions, and captured images of them are naturally affected by the reflection of that light. Objects with smooth surfaces in particular are known to produce specular reflection. Because specular reflection is usually observed as very strong luminance, it often alters or masks the luminance variation inherent to the object. As a result, false correspondences occur during object retrieval and a different object may be retrieved.

Several inventions have been made in view of this problem.

Non-Patent Document 2 discloses an image search method based on geometric verification of keypoints. It rests on the reasonable assumption that, for the same object, the spatial distribution of keypoints is the same apart from changes in shooting viewpoint. Keypoints are first matched between the two images. The matches are then considered as a set, and only those whose spatial arrangement is constrained by a specific linear transformation are regarded as valid; the rest are discarded. Images with more valid correspondences are consequently ranked higher in the search results.
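The filtering step of geometric verification can be sketched as follows, assuming the shared transformation is affine and already known. In practice the transformation is usually estimated from the matches themselves, for example with a RANSAC-style procedure. The function names and the pixel tolerance `tol` are hypothetical and are not taken from Non-Patent Document 2.

```python
def apply_affine(t, p):
    """Apply a 2x3 affine transform t = ((a, b, tx), (c, d, ty)) to point p."""
    (a, b, tx), (c, d, ty) = t
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)

def verify_matches(matches, transform, tol=2.0):
    """Keep only the matches consistent with one shared transform.

    matches: list of ((x, y) in image A, (x, y) in image B).
    tol: hypothetical reprojection tolerance in pixels.
    """
    inliers = []
    for p_a, p_b in matches:
        qx, qy = apply_affine(transform, p_a)
        err = ((qx - p_b[0]) ** 2 + (qy - p_b[1]) ** 2) ** 0.5
        if err <= tol:
            inliers.append((p_a, p_b))
    return inliers

# Image B is image A shifted by (10, 5); the last match is a false one.
shift = ((1.0, 0.0, 10.0), (0.0, 1.0, 5.0))
candidates = [((0, 0), (10, 5)), ((3, 4), (13, 9)), ((7, 1), (40, 40))]
inliers = verify_matches(candidates, shift)
```

Ranking reference images by `len(inliers)` rather than by the raw match count is what moves geometrically consistent images to the top of the results.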

Patent Document 1 discloses an apparatus that removes specular reflection caused by illumination light. The reference image and the illumination light component at the time it was captured are measured in advance (using an image of a white plate). When an object is actually photographed, both a query image showing the object and an image of the white plate are acquired, and the query image is projected onto the space orthogonal to the illumination light component to generate an image from which specular reflection has been removed.

The existing techniques disclosed in Non-Patent Documents 1 and 2 are both image search techniques based on simple keypoint matching, and neither has a component for avoiding false correspondences caused by specular reflection. As noted above, keypoints are determined from luminance variation and local features describe luminance variation, so both are directly affected by specular reflection. The problem is that accuracy deteriorates significantly as a result.

Applying a specular reflection removal method such as the one disclosed in Patent Document 1 makes it possible to convert an image into one free of specular reflection. The technique of Patent Document 1, however, requires a camera that always photographs a white plate whenever a reference image or a query image is captured. The white-plate images must also be stored continually, which increases the number of images required. More generally, existing specular reflection removal methods cannot be used unless the light source or imaging device has special equipment not found on general-purpose cameras, require a very large number of images, or take a long time to process. The problem is that they are therefore not necessarily suitable for image search based on query images shot freely by ordinary users, as described above.

Thus, no technique has so far been invented that can search effectively using freely shot images of objects that exhibit specular reflection.

The present invention has been made to solve the above problems. Its object is to provide an image search apparatus, method, and program capable of retrieving the same object accurately, and robustly against changes in the object's luminance, from a query image obtained by photographing a textured object.

To achieve the above object, an image search apparatus according to a first invention comprises a reference image database storing at least one reference image, accepts at least two query images of the same object photographed from different viewpoints, namely a first query image and at least one second query image, and outputs a reference image in the reference image database that contains the same object as the object shown in the first query image. The apparatus includes: a correspondence determination unit that aligns the first query image with the second query image and associates pixels of the second query image with one or more pixels of the first query image; an image composition unit that selects a pixel of interest from the first query image and, when a corresponding pixel of the second query image exists for the selected pixel of interest, compares the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel and, when the luminance value of the corresponding pixel is lower than that of the pixel of interest, updates the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating a composite image; and a search unit that outputs the reference image in the reference image database that is closest to the composite image.

In the image search apparatus according to the first invention, the correspondence determination unit may associate keypoints of the first query image with keypoints of the second query image on the basis of the keypoints extracted from each image and, from the correspondences between keypoints, obtain a linear transformation matrix for converting the pixels of corresponding keypoints from one of the two query images to the other. The image composition unit may then convert the coordinates of the second query image into the coordinates of the first query image on the basis of the linear transformation matrix, select a pixel of interest from the first query image within the region where the two query images overlap in the converted coordinates, compare the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than that of the pixel of interest, update the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating the composite image.

An image search method according to a second invention is an image search method for an image search apparatus that comprises a reference image database storing at least one reference image, accepts at least two query images of the same object photographed from different viewpoints, namely a first query image and at least one second query image, and outputs a reference image in the reference image database that contains the same object as the object shown in the first query image. The method comprises: a step in which a correspondence determination unit aligns the first query image with the second query image and associates pixels of the second query image with one or more pixels of the first query image; a step in which an image composition unit selects a pixel of interest from the first query image and, when a corresponding pixel of the second query image exists for the selected pixel of interest, compares the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel and, when the luminance value of the corresponding pixel is lower than that of the pixel of interest, updates the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating a composite image; and a step in which a search unit outputs the reference image in the reference image database that is closest to the composite image.

In the image search method according to the second invention, the step performed by the correspondence determination unit may associate keypoints of the first query image with keypoints of the second query image on the basis of the keypoints extracted from each image and, from the correspondences between keypoints, obtain a linear transformation matrix for converting the pixels of corresponding keypoints from one of the two query images to the other. The step performed by the image composition unit may then convert the coordinates of the second query image into the coordinates of the first query image on the basis of the linear transformation matrix, select a pixel of interest from the first query image within the region where the two query images overlap in the converted coordinates, compare the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than that of the pixel of interest, update the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating the composite image.

A program according to a third invention is a program for causing a computer to function as each unit of the image search apparatus according to the first invention.

According to the image search apparatus, method, and program of the present invention, the first query image and the second query image are aligned, and pixels of the second query image are associated with one or more pixels of the first query image. A pixel of interest is selected from the first query image and, when a corresponding pixel of the second query image exists, the luminance values of the two are compared; when the luminance value of the corresponding pixel is lower than that of the pixel of interest, the pixel value of the pixel of interest is updated with the pixel value of the corresponding pixel to generate a composite image. The reference image in the reference image database closest to the composite image is then output. This provides the effect that the same object can be retrieved accurately, and robustly against changes in the object's luminance, from a query image obtained by photographing a textured object.

A diagram explaining the principle of image alignment and image composition according to an embodiment of the present invention.
A block diagram showing the configuration of an image search apparatus according to the embodiment of the present invention.
A flowchart showing the image search processing routine of the image search apparatus according to the embodiment of the present invention.
A diagram explaining the image composition processing according to the embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Principle according to the embodiment of the present invention>

First, the principle underlying the embodiment of the present invention is described with reference to FIG. 1.

The radiance of an object is known to be well modeled, in general, as the sum of a diffuse reflection component, which represents the object's inherent appearance, and a specular reflection component, which arises slightly outside the object's surface. This model is called the dichromatic reflection model. Under perfectly diffuse reflection, for parallel light from a point light source, the diffuse component is observed with constant intensity regardless of viewpoint (direction), whereas the intensity of the specular component changes with the viewpoint. If the light is not parallel, the position at which specular reflection appears also changes with the light source position and the viewpoint. Consequently, when at least two query images with different viewpoints are obtained, the specular reflection regions appear at different positions and with different intensities on the object in the two images, as in image 1 and image 2 of FIG. 1.
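In symbols, the model described above can be written as follows. The notation is introduced here for illustration and does not appear in the source: $\mathbf{x}$ is a point on the object, $\mathbf{v}$ is the viewing direction, $L_d$ is the diffuse component, and $L_s$ is the specular component.

```latex
L(\mathbf{x}, \mathbf{v}) = L_d(\mathbf{x}) + L_s(\mathbf{x}, \mathbf{v}),
\qquad L_s(\mathbf{x}, \mathbf{v}) \ge 0
```

Under the perfectly diffuse assumption, only the second term depends on $\mathbf{v}$, which is why two viewpoints observe the same underlying appearance but different highlights.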

Now consider aligning image 1 and image 2, that is, transforming one of them (here, image 2) so that the objects in the two images overlap exactly. Two different images can be aligned by, for example, the keypoint-based alignment described in Non-Patent Document 2. Specular reflection regions often have uniformly high luminance, and because luminance varies little inside such a region, keypoints are unlikely to be detected there. When keypoint-based alignment is applied to the two images, alignment is therefore driven mainly by keypoints detected in the diffuse reflection regions (that is, in the object's inherent appearance), as shown in FIG. 1. The broken lines spanning image 1 and image 2 in FIG. 1 connect corresponding keypoints.

Suppose image 2 is transformed and aligned to image 1 on the basis of this alignment result. For each pixel of the aligned composite image there is then a free choice between the pixel value of image 1 and the (corresponding) pixel value of image 2, and the value with the weaker specular reflection is preferred. Under the dichromatic reflection model, the radiance of an image is the sum of the diffuse and specular reflection components, so luminance is higher where a specular component is present. Adopting, for each pixel, whichever of image 1 and image 2 has the lower luminance therefore yields image 3, in which the specular reflection component is suppressed.
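The selection rule can be stated compactly as follows. The symbols are introduced here for illustration only: $Y_1(p)$ and $Y_2(p)$ are the luminance values of image 1 and of the aligned image 2 at pixel $p$, $D(p)$ is the shared diffuse component, $S_k(p) \ge 0$ is the specular component in image $k$, and $\tilde{I}_2$ is image 2 after alignment.

```latex
Y_k(p) = D(p) + S_k(p), \quad k \in \{1, 2\}
\;\;\Longrightarrow\;\;
\min\bigl(Y_1(p), Y_2(p)\bigr) = D(p) + \min\bigl(S_1(p), S_2(p)\bigr)

I_3(p) =
\begin{cases}
\tilde{I}_2(p) & \text{if } Y_2(p) < Y_1(p) \\
I_1(p) & \text{otherwise}
\end{cases}
```

Because the diffuse term is common to both images, taking the darker pixel keeps $D(p)$ intact and retains only the smaller of the two specular terms.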

Using image 3 composed in this way as the query image makes it possible to issue a query free of the ambiguity caused by specular reflection, so a more accurate search can be performed.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

<<Overall configuration>>

FIG. 2 is a block diagram showing an example of the configuration of an image search apparatus 11 according to the embodiment of the present invention. The image search apparatus 11 shown in FIG. 2 includes a correspondence determination unit 12, an image composition unit 13, and a search unit 14.

The image search apparatus 11 is connected to a reference image database 15 via communication means so that the two can exchange information, and is configured so that the features of any image can be registered in and read from the reference image database 15. The reference image database 15 can be built on, for example, the file system of an ordinary general-purpose computer. Each image is given an identifier that uniquely identifies it (for example, a serial-number ID or a unique image file name). In image search, each image is generally represented by some kind of feature, and the files describing these features are also stored in association with the identifier of the corresponding image. The database may instead be implemented with an RDBMS (Relational Database Management System) or the like. It may additionally hold metadata such as items describing the content of an image (title, summary text, keywords, and so on) or items concerning its format (data size, thumbnail size, and so on), but these are not essential to practicing the present invention.

The reference image database 15 is a database storing at least one reference image that contains an object. It may be located inside or outside the image search apparatus 11, and any known communication means may be used. In this embodiment the database is external, and the communication means is a connection over the Internet using TCP/IP.

Each unit of the image search apparatus 11 and the reference image database 15 may be implemented by a computer, server, or the like equipped with a processor, a storage device, and so on, with the processing of each unit executed by a program. The program is stored in a storage device of the image search apparatus 11; it can also be recorded on a recording medium such as a magnetic disk, optical disk, or semiconductor memory, or provided over a network. None of the components needs to be implemented on a single computer or server; they may be distributed across multiple computers connected by a network.

<<Processing units>>

The processing units of the image search apparatus 11 in this embodiment are described below. The description assumes that two images of the same object photographed from different viewpoints, a first query image 16 and a second query image 17, have been input. As described later, the apparatus is also applicable when two or more second query images 17 are input, in which case a plurality of second query images 17 are supplied.

Given the first query image 16 and the second query image 17 from outside, the correspondence determination unit 12 aligns the two images, associates pixels of the second query image 17 with one or more pixels of the first query image 16, and outputs the result to the image composition unit 13. Specifically, it associates keypoints of the first query image 16 with keypoints of the second query image 17 on the basis of the keypoints extracted from each image and, from the correspondences between keypoints, obtains a linear transformation matrix that converts keypoint pixels of the second query image 17 into the corresponding keypoint pixels of the first query image 16. The linear transformation matrix may instead convert keypoint pixels of the first query image 16 into the corresponding keypoint pixels of the second query image 17.
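One way to obtain such a matrix from keypoint correspondences is a least-squares fit, sketched below under the assumption that the transformation is affine. The source says only "linear transformation matrix", so the affine form, the function names, and the sample points are illustrative choices, not the patented procedure.

```python
def solve3(m, r):
    """Solve the 3x3 system m * s = r by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, r)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(3):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def estimate_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst points.

    Returns ((a, b, tx), (c, d, ty)) such that
    dst ~= (a*x + b*y + tx, c*x + d*y + ty). Needs >= 3 non-collinear pairs.
    """
    m = [[0.0] * 3 for _ in range(3)]
    ru, rv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        h = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += h[i] * h[j]   # accumulate the normal equations
            ru[i] += h[i] * u
            rv[i] += h[i] * v
    return tuple(solve3(m, ru)), tuple(solve3(m, rv))

# Keypoints of the second query image and their partners in the first one.
pts2 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
pts1 = [(5.0, -2.0), (7.0, -2.0), (5.0, 0.0), (9.0, 4.0)]  # x' = 2x + 5, y' = 2y - 2
row_x, row_y = estimate_affine(pts2, pts1)
```

Swapping the two arguments gives the transformation in the opposite direction, which corresponds to the alternative mentioned at the end of the paragraph above.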

When three or more query images are input, keypoint-based alignment is performed between one of the query images (the first query image 16 in this embodiment) and every second query image 17, and the second query image with the largest number of corresponding keypoint pairs is adopted.

The image composition unit 13 selects a pixel of interest from the first query image 16. If a corresponding pixel of the second query image 17 exists for the selected pixel of interest, the unit compares the luminance value of the pixel of interest with that of the corresponding pixel; if the luminance value of the corresponding pixel is lower, the pixel value of the pixel of interest is updated with the pixel value of the corresponding pixel. The composite image generated in this way is output to the search unit 14. Here, the coordinates of the second query image 17 are converted into the coordinates of the first query image 16 using the linear transformation matrix obtained by the correspondence determination unit 12, so that the two query images are stitched together in a common coordinate system; pixels of interest are selected within the region where the first query image 16 and the second query image 17 overlap in the converted coordinates.

The search unit 14 queries the reference image database 15 using the generated composite image as a new query image, and outputs, as the search result 18, the reference image in the reference image database 15 that is closest to the composite image.

<< Processing overview >>

Next, the processing of the image search apparatus 11 in the present embodiment is described. FIG. 3 is a flowchart showing the flow of processing.

First, in step S301, when two or more query images are given from the outside, one is taken as the first query image 16 and another as the second query image 17. The correspondence determination unit 12 performs keypoint-based alignment with respect to the first query image 16, associates a pixel of the second query image 17 with each pixel of the first query image 16, and outputs the correspondence result to the image composition unit 13.

Next, in step S302, when the luminance value of a pixel of the first query image 16 is higher than the luminance value of the corresponding pixel of the second query image 17 (that is, when the corresponding pixel is darker), the image composition unit 13 replaces the value of that pixel of the first query image 16 with the pixel value of the corresponding pixel of the second query image 17. The composite image generated in this way is output to the search unit 14.

Next, in step S303, the search unit 14 queries the reference image database 15 using the composite image as the query image, and outputs the matched reference image as the search result.

Through the above processing, a reference image that includes the same object can be retrieved for the input query images.
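
The three steps above can be sketched as a small driver function. This is only an illustration of the data flow of steps S301 to S303; the function name `retrieve` and the three stage callables `align`, `compose`, and `search` are hypothetical placeholders supplied by the caller, not names from this document.

```python
def retrieve(first, second, align, compose, search):
    """Run S301-S303 with caller-supplied stages (all callables are hypothetical)."""
    # S301: align the two query images and obtain pixel correspondences
    correspondence = align(first, second)
    # S302: build the composite image from both query images
    composite = compose(first, second, correspondence)
    # S303: query the reference image database with the composite image
    return search(composite)
```

Keeping the stages as parameters mirrors the unit structure of the apparatus (correspondence determination unit 12, image composition unit 13, search unit 14), so each stage can be swapped independently.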

<< Details of each process >>

An example of the detailed processing of each step in the present embodiment is described below.

[Correspondence determination processing]

First, the processing in which the correspondence determination unit 12 obtains the correspondence between two query images is described.

The correspondence between query images is obtained, for example, by a method based on keypoint matching as described in Non-Patent Document 1 or Non-Patent Document 2. Other known keypoint matching methods may also be used.

Here, a keypoint extracted from the first query image 16 is denoted Q_i, and a keypoint extracted from the second query image 17 is denoted R_j. Each keypoint is described by a feature vector. Any feature vector may be used, but a local feature such as SIFT described in Non-Patent Document 1 is preferable.

Let v_i be the feature vector describing keypoint Q_i, and let w_j be the feature vector describing keypoint R_j. The feature distance dist(Q_i, R_j) between the two keypoints is then computed from these feature vectors.
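
The distance formula itself is not reproduced in this text. As a minimal sketch, the code below assumes the Euclidean (L2) distance between v_i and w_j, which is a common choice for SIFT-like descriptors; the function name `feature_distance` is hypothetical.

```python
import math

def feature_distance(v, w):
    """Euclidean (L2) distance between two descriptors (assumed metric)."""
    if len(v) != len(w):
        raise ValueError("descriptors must have the same dimensionality")
    # Sum of squared component differences, then the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))
```

Any other metric could be substituted here as long as smaller values mean more similar keypoints, since the later steps only compare distances.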

Next, based on the feature distances between keypoints, it is determined whether each pair of keypoints corresponds. Consider a keypoint R_j, and suppose that the keypoint nearest to it is Q_k and the next nearest is Q_l. R_j and Q_k are judged to correspond when the nearest distance dist(Q_k, R_j) is sufficiently small relative to the second-nearest distance dist(Q_l, R_j); the criterion is governed by the parameter T.

Here, T is a parameter determined in advance and may take any value satisfying 0 < T ≤ 1. For example, T = 0.8 may be used.
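
The acceptance condition is not reproduced in this text. The sketch below assumes the usual ratio test, dist(Q_k, R_j) < T · dist(Q_l, R_j), together with Euclidean descriptor distance; both are assumptions, and the function name `ratio_test_matches` is hypothetical.

```python
import math

def ratio_test_matches(first_desc, second_desc, T=0.8):
    """Return {j: k}: keypoint R_j of the second image matched to Q_k of the first.

    Assumes the criterion dist(Q_k, R_j) < T * dist(Q_l, R_j).
    """
    matches = {}
    for j, w in enumerate(second_desc):
        # Distances from R_j to every Q_i, nearest first
        ranked = sorted((math.dist(v, w), i) for i, v in enumerate(first_desc))
        if len(ranked) < 2:
            continue  # the test needs a nearest and a second-nearest keypoint
        (d_best, k), (d_second, _) = ranked[0], ranked[1]
        if d_best < T * d_second:
            matches[j] = k
    return matches
```

With T close to 1 almost every nearest neighbour is accepted; smaller values keep only matches that are clearly better than the runner-up.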

By performing the above calculation for all keypoint pairs, the corresponding keypoints can be obtained.

Note that correspondences obtained in this way may overlap. That is, when matching is performed from the R_j side, several keypoints R_j of the second query image may correspond to a single keypoint Q_k. Conversely, when matching is performed from the Q_k side, several keypoints Q_k may correspond to a single keypoint R_j. Intuitively, when the object is the same, it is unlikely that one keypoint on the object corresponds to several keypoints on a different view of that object. Post-processing may therefore be introduced to disallow overlapping correspondences. For example, after all correspondences have been obtained by the above method, the keypoints of the first query image 16 that correspond to multiple keypoints of the second query image 17 are enumerated. Then, among the keypoints of the second query image 17 that correspond to such a keypoint of the first query image 16, only the one with the smallest distance is regarded as a valid correspondence, and the other pairs are rejected. With this processing, the correspondences are constrained to be one-to-one. In this way, keypoints can be matched between the two query images.
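
The duplicate-removal post-processing described above can be sketched as follows. The data layout is an assumption made for illustration: `matches` maps each second-image keypoint index j to its matched first-image index k, and `distances` holds the feature distance for each (k, j) pair; the function name `enforce_one_to_one` is hypothetical.

```python
def enforce_one_to_one(matches, distances):
    """Keep, for every first-image keypoint k, only its nearest second-image keypoint j.

    matches:   {j: k} as produced by the matching step (may map several j to one k)
    distances: {(k, j): feature distance}
    """
    best = {}  # k -> (distance, j) of the closest j seen so far
    for j, k in matches.items():
        d = distances[(k, j)]
        if k not in best or d < best[k][0]:
            best[k] = (d, j)
    # Rebuild the {j: k} mapping from the surviving pairs
    return {j: k for k, (_, j) in best.items()}
```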

Next, using the keypoint correspondences obtained above, alignment is performed by associating each pixel of interest in the first query image 16 with its corresponding pixel in the second query image 17.

If the photographed object is rigid, the object in the query image and the object in the reference image differ only in the viewpoint from which they were photographed, and under realistic assumptions this viewpoint change can be modeled by a linear transformation. In other words, for keypoints that lie on the same object, the coordinates of the keypoints of the first query image 16 and those of the second query image 17 are related by a linear transformation. If the transformation is assumed to be affine, it is uniquely determined by three pairs of corresponding keypoints on the object; if it is assumed to be projective, it is uniquely determined by four such pairs.
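
The counts of three and four pairs follow from counting unknowns. As a sketch in homogeneous coordinates (the entry names a_{11}, ..., h_{32} are notation introduced here for illustration), with (x, y) a keypoint in the second query image and (x', y') the corresponding keypoint in the first:

```latex
% Affine case: 6 unknowns; each keypoint pair gives 2 equations,
% so 3 non-collinear pairs determine the matrix.
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & t_x \\
a_{21} & a_{22} & t_y \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}

% Projective case: 9 entries defined up to a common scale, i.e. 8 unknowns,
% so 4 pairs (no three collinear) determine the matrix.
\lambda
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad \lambda \neq 0
```

Both maps are linear in homogeneous coordinates, which is why a single 3×3 matrix covers either assumption.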

In practice, however, not all keypoints of either query image lie on the object, so the linear transformation must be estimated while correctly sampling keypoint pairs that do lie on the object. Fortunately, effective methods for estimating a linear transformation under such conditions are known. For example, the RANSAC algorithm described in Reference 1 and the LO-RANSAC algorithm described in Reference 2 are suitable.

[Reference 1] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Comm. ACM, vol. 24, no. 6, pp. 381-395, 1981.

[Reference 2] O. Chum, J. Matas, and S. Obdrzalek, “Enhancing RANSAC by generalized model optimization,” Proceedings of Asian Conference on Computer Vision, pp. 812-817, 2004.
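
The following is a minimal sketch of plain RANSAC under the affine assumption, not the LO-RANSAC variant and not necessarily the configuration used in this embodiment. The helper names (`det3`, `fit_affine`, `apply_h`, `ransac_affine`), the iteration count, the squared-distance threshold, and the fixed random seed are all assumptions chosen for illustration. Each input pair is ((x, y) in the second query image, (x', y') in the first).

```python
import random

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_affine(pairs):
    """Fit a 3x3 affine matrix to exactly three pairs using Cramer's rule."""
    M = [[x, y, 1.0] for (x, y), _ in pairs]
    D = det3(M)
    if abs(D) < 1e-9:
        return None  # the three source points are (nearly) collinear
    rows = []
    for axis in (0, 1):  # solve once for x', once for y'
        rhs = [dst[axis] for _, dst in pairs]
        coeffs = []
        for col in range(3):
            Mc = [row[:] for row in M]
            for r in range(3):
                Mc[r][col] = rhs[r]
            coeffs.append(det3(Mc) / D)
        rows.append(coeffs)
    rows.append([0.0, 0.0, 1.0])
    return rows

def apply_h(H, p):
    """Apply a 3x3 matrix to point p in homogeneous coordinates."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def ransac_affine(pairs, iters=200, thresh_sq=9.0, seed=0):
    """Return (H, inliers) for the sampled model with the most inliers."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iters):
        H = fit_affine(rng.sample(pairs, 3))  # minimal sample: 3 pairs
        if H is None:
            continue
        inliers = []
        for src, dst in pairs:
            ex, ey = apply_h(H, src)
            if (ex - dst[0]) ** 2 + (ey - dst[1]) ** 2 <= thresh_sq:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

For a projective model the same loop applies with four-pair samples and a homography solver in place of `fit_affine`.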

In this way, a linear transformation matrix (of size 3×3) is obtained that converts keypoint pixels on the object in the second query image 17 into the corresponding keypoint pixels of interest on the object in the first query image 16. Using this matrix, any pixel of one query image can be mapped to the corresponding pixel of the other, provided the pixel lies on the object.

As described above, the correspondence between two different query images can be determined in the form of a linear transformation matrix.

Note that the obtained linear transformation matrix also makes it possible to count more accurately how many of the corresponding keypoint pairs lie on the object. If the coordinates of a keypoint of the second query image 17 are denoted (x, y), the estimated coordinates (x_e, y_e) of the corresponding keypoint of the first query image 16 are obtained by applying H to (x, y), where H is the linear transformation matrix obtained above. The true coordinates of the keypoint of the first query image 16 are known; denote them (x', y'). If (x, y) and (x', y') are a pair of keypoints that both lie on the object, (x', y') should be estimated accurately by (x_e, y_e). Therefore, whether a correspondence is one between keypoints extracted from the object is judged by whether the distance (error) between (x', y') and (x_e, y_e) is within a certain threshold. The threshold may be any value, but a small value such as 1, 9, 36, or 64 may be used. In particular, when three or more query images are input, it is preferable to select the pair of images for which the number of keypoint pairs remaining after this check is largest and to continue the subsequent processing with that pair.
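
A minimal sketch of this check follows. It assumes that H is applied in homogeneous coordinates, (x_e, y_e, 1)ᵀ ∝ H (x, y, 1)ᵀ, and that the threshold is compared against the squared pixel distance (the example values 1, 9, 36, 64 are perfect squares, but the text does not state this). The function names are hypothetical.

```python
def project(H, x, y):
    """Map (x, y) in the second query image to the estimate (x_e, y_e) in the first."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    x_e = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    y_e = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return x_e, y_e

def count_inliers(H, pairs, threshold=9.0):
    """Count pairs ((x, y), (x', y')) whose squared error is within the threshold."""
    n = 0
    for (x, y), (xp, yp) in pairs:
        x_e, y_e = project(H, x, y)
        if (x_e - xp) ** 2 + (y_e - yp) ** 2 <= threshold:
            n += 1
    return n

def pick_best_second_image(candidates, threshold=9.0):
    """candidates: one (H, pairs) tuple per second query image.

    Returns the index of the candidate with the most remaining keypoint pairs.
    """
    return max(range(len(candidates)),
               key=lambda i: count_inliers(candidates[i][0], candidates[i][1], threshold))
```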

[Image composition processing]

Next, the processing in which the image composition unit 13 generates a composite image from the two query images, based on the correspondence obtained by the correspondence determination unit 12, is described.

First, using the linear transformation matrix obtained above, the second query image 17 is converted into the coordinates of the first query image 16, and the two images are stitched together. As in the example shown in FIG. 4, the two images can then be overlaid at the position of the object. If the stitched result is regarded as a single image, each pixel in the region where the two images overlap has two candidate values: the pixel value of the original first query image 16 and the pixel value of the original second query image 17. It is preferable to adopt the pixel value from the image in which specular reflection is absent or weaker.

In the embodiment of the present invention, a pixel value satisfying this condition is selected on the basis of the dichromatic reflection model. The dichromatic reflection model expresses the assumption that the radiance of an image can be represented as the sum of a diffuse reflection component and a specular reflection component. For the same perfectly diffuse object under the same light source, the diffuse reflection component at a given position can be assumed to take almost the same value regardless of viewpoint. Consequently, if a large luminance difference arises between the two query images, it can be attributed mostly to the specular reflection component. Therefore, for each pixel under consideration, the pixel value with the lower luminance among the two query images is adopted.
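
The selection rule can be sketched as follows. Several details are assumptions made for illustration: images are lists of rows of (r, g, b) tuples, luminance is computed with Rec. 601 weights, the matrix `H_12` maps first-image coordinates to second-image coordinates (the direction the description allows as an alternative), and the corresponding pixel is taken by nearest-neighbour rounding. The function names are hypothetical.

```python
def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 weights (assumed choice)

def compose_min_luminance(first, second, H_12):
    """Return a copy of `first` where darker corresponding pixels of `second` win.

    H_12: 3x3 matrix mapping first-image (x, y) to second-image coordinates.
    """
    h2, w2 = len(second), len(second[0])
    out = [list(row) for row in first]
    for y, row in enumerate(first):
        for x, pix in enumerate(row):
            w = H_12[2][0] * x + H_12[2][1] * y + H_12[2][2]
            if w == 0:
                continue  # point maps to infinity; no corresponding pixel
            u = round((H_12[0][0] * x + H_12[0][1] * y + H_12[0][2]) / w)
            v = round((H_12[1][0] * x + H_12[1][1] * y + H_12[1][2]) / w)
            if 0 <= u < w2 and 0 <= v < h2:  # only inside the overlap region
                cand = second[v][u]
                if luminance(cand) < luminance(pix):
                    out[y][x] = cand  # darker pixel: weaker specular component
    return out
```

Pixels of the first image with no counterpart in the second image are left unchanged, which matches the rule that updates happen only where a corresponding pixel exists.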

In this way, a composite image in which the specular reflection component is suppressed, as shown in FIG. 4, can be obtained.

[Search unit]

Finally, the search unit 14 queries the reference image database 15 using the composite image generated by the image composition unit 13 as the query image, and retrieves only reference images that include the same object as the query image.

Many known methods exist for executing such a search. Preferably, a method based on object retrieval is adopted, which can retrieve images of the same object regardless of how the object appears (position, pose, and size). For example, the methods of Reference 3 and Reference 4 are suitable.

[Reference 3] Japanese Patent Application Laid-Open No. 2016-18444

[Reference 4] G. Tolias, Y. Avrithis, and H. Jégou, “Image search with selective match kernels: aggregation across single and multiple images,” International Journal of Computer Vision, vol. 116, pp. 247-261, 2016.

As described above, the image search device according to the embodiment of the present invention performs alignment between the first query image 16 and the second query image 17 and associates pixels of the second query image 17 with one or more pixels of the first query image 16. It then selects a pixel of interest from the first query image 16 and, if a corresponding pixel of the second query image 17 exists for the selected pixel of interest, compares the luminance value of the pixel of interest with that of the corresponding pixel. If the luminance value of the corresponding pixel is lower than that of the pixel of interest, the pixel value of the pixel of interest is updated with the pixel value of the corresponding pixel, and a composite image is thereby generated. By outputting the reference image in the reference image database that is closest to the composite image, the same object can be retrieved accurately from query images of a textured object, robustly against luminance changes of the object.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

DESCRIPTION OF SYMBOLS
11 Image search apparatus
12 Correspondence determination unit
13 Image composition unit
14 Search unit
15 Reference image database
16 First query image
17 Second query image

Claims

An image search device that includes a reference image database storing at least one reference image, receives at least two query images obtained by photographing the same object from different viewpoints, namely a first query image and at least one second query image, and outputs a reference image in the reference image database that includes the same object as the object shown in the first query image, the image search device comprising:
a correspondence determination unit that performs alignment between the first query image and the second query image and associates pixels of the second query image with one or more pixels of the first query image;
an image composition unit that selects a pixel of interest from the first query image and, when a corresponding pixel of the second query image exists for the selected pixel of interest, compares the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than the luminance value of the pixel of interest, updates the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating a composite image; and
a search unit that outputs the reference image in the reference image database that is closest to the composite image.

The correspondence determination unit associates keypoints of the first query image with keypoints of the second query image based on the keypoints extracted from the first query image and the keypoints extracted from the second query image, and, based on the correspondence between the keypoints, obtains a linear transformation matrix for converting one of the corresponding keypoint pixels of the first query image and keypoint pixels of the second query image into the other, and
the image composition unit converts the coordinates of the second query image into the coordinates of the first query image based on the linear transformation matrix, selects the pixel of interest from the first query image within the region where the first query image and the second query image overlap in the converted coordinates, compares the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than the luminance value of the pixel of interest, updates the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating the composite image, in the image search device.

An image search method in an image search device that includes a reference image database storing at least one reference image, receives at least two query images obtained by photographing the same object from different viewpoints, namely a first query image and at least one second query image, and outputs a reference image in the reference image database that includes the same object as the object shown in the first query image, the image search method comprising:
a step in which a correspondence determination unit performs alignment between the first query image and the second query image and associates pixels of the second query image with one or more pixels of the first query image;
a step in which an image composition unit selects a pixel of interest from the first query image and, when a corresponding pixel of the second query image exists for the selected pixel of interest, compares the luminance value of the selected pixel of interest with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than the luminance value of the pixel of interest, updates the pixel value of the pixel of interest with the pixel value of the corresponding pixel, thereby generating a composite image; and
a step in which a search unit outputs the reference image in the reference image database that is closest to the composite image.

In the step of associating by the correspondence determination unit, keypoints of the first query image are associated with keypoints of the second query image based on the keypoints extracted from the first query image and the keypoints extracted from the second query image, and, based on the correspondence between the keypoints, a linear transformation matrix is obtained for converting one of the corresponding keypoint pixels of the first query image and keypoint pixels of the second query image into the other, and
in the step of generating by the image composition unit, the coordinates of the second query image are converted into the coordinates of the first query image based on the linear transformation matrix, the pixel of interest is selected from the first query image within the region where the first query image and the second query image overlap in the converted coordinates, the luminance value of the selected pixel of interest is compared with the luminance value of the corresponding pixel of the second query image, and, when the luminance value of the corresponding pixel is lower than the luminance value of the pixel of interest, the pixel value of the pixel of interest is updated with the pixel value of the corresponding pixel, thereby generating the composite image, in the image search method according to claim 3.

A program for causing a computer to function as each unit of the image search device according to claim 1 or claim 2.