JP2009536499A

JP2009536499A - System and method for reconstructing a three-dimensional object from a two-dimensional image

Info

Publication number: JP2009536499A
Application number: JP2009509539A
Authority: JP
Inventors: Nijim, Yousef Wasef; Izzat, Izzat Hekmat
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2006-05-05
Filing date: 2006-10-25
Publication date: 2009-10-08
Also published as: CN101432776B; WO2007130122A3; CA2650557A1; CN101432776A; CA2650557C; WO2007130122A2; EP2016559A2

Abstract

Systems and methods are provided for three-dimensional (3D) acquisition and modeling of a scene using two-dimensional (2D) images. The system and method acquire first and second images of a scene; apply a smoothing function to the first image (202) to make feature points of objects in the scene, e.g., corners and edges of the objects, more visible; apply at least two feature detection functions to the first image to detect feature points of objects in the first image (204, 208); combine the outputs of the at least two feature detection functions to select object feature points to be tracked (210); apply a smoothing function to the second image (206); apply a tracking function to the second image to track the selected object feature points (214); and reconstruct a 3D model of the scene from an output of the tracking function (218).

Description

The present invention relates generally to 3D object modeling, and more particularly to a system and method for acquiring three-dimensional (3D) information from two-dimensional (2D) images using hybrid feature detection and tracking that includes smoothing functions.
This application claims the benefit under 35 U.S.C. §119 of U.S. Provisional Application 60/798,087, filed May 5, 2006.

When a scene is filmed, the resulting video sequence contains implicit information about the three-dimensional (3D) geometry of the scene. While this implicit information suffices for adequate human perception, many applications require the exact geometry of the 3D scene. One category of such applications involves sophisticated data processing techniques, for example generating new views of the scene or reconstructing 3D geometry for industrial inspection.

Recovering 3D information has been an active research area for some time. The literature contains numerous techniques that either capture 3D information directly, for example with a laser range finder, or recover it from one or more two-dimensional (2D) images, for example with stereo or structure-from-motion techniques. In general, 3D acquisition techniques can be classified as active or passive approaches, single-view or multi-view approaches, and geometric or photometric methods.

Passive approaches acquire 3D geometry from images or videos taken under regular lighting conditions. The 3D geometry is computed using geometric or photometric features extracted from the images and videos. Active approaches use special light sources, such as lasers, structured light, or infrared light. They compute the geometry from the response of the objects and the scene to the special light projected onto their surfaces.

Single-view approaches recover 3D geometry using multiple images taken from a single camera viewpoint. Examples include structure from motion and depth from defocus.

Multi-view approaches recover 3D geometry from multiple images taken from multiple camera viewpoints, resulting from object motion, or taken with different light source positions. Stereo matching is an example of multi-view 3D recovery: pixels in the left and right images of a stereo pair are matched to obtain depth information for those pixels.

Geometric methods recover 3D geometry by detecting geometric features such as corners, edges, lines, or contours in one or more images. The relationships among the extracted corners, edges, lines, or contours are used to infer the 3D coordinates of the pixels in the images. Structure From Motion (SFM) is a technique that attempts to reconstruct the 3D structure of a scene and of moving objects from a sequence of images taken by a camera moving within the scene or by a stationary camera. Although many agree that SFM is fundamentally a nonlinear problem, several attempts have been made to represent it linearly, which offers mathematical elegance as well as direct solution methods. Nonlinear techniques, on the other hand, require iterative optimization and must contend with local minima; however, they offer good numerical accuracy and flexibility. The advantage of SFM over stereo matching is that only one camera is needed. Feature-based approaches can be made more effective by tracking techniques, which exploit the past history of the features' motion to predict disparities in the next frame.

Second, given the spatial and temporal proximity of two consecutive frames, the correspondence problem can also be cast as the problem of estimating the apparent motion of the image brightness pattern, called optical flow. Several algorithms use SFM, and most of them are based on reconstructing 3D geometry from 2D images. Some assume known correspondence values, while others use statistical approaches to reconstruct without correspondence.

The methods described above have been studied extensively for decades. However, no single technique performs well in all situations, and most past methods focus on 3D reconstruction under laboratory conditions, which are relatively easy. In real-world scenes, subjects move, lighting is complex, and the depth range is large. The techniques identified above have difficulty handling these real-world conditions.

This disclosure provides systems and methods for three-dimensional (3D) acquisition and modeling of a scene using two-dimensional (2D) images. The systems and methods include acquiring at least two images of a scene and applying a smoothing function that makes features more visible, followed by a hybrid scheme of feature selection and tracking to recover 3D information. First, a smoothing function is applied to an image, followed by feature point selection that finds the features in the image. At least two feature point detection functions are employed to cover a wider range of good feature points in the first image. A smoothing function is then applied to the second image, followed by a tracking function that tracks the detected feature points in the second image. The results of feature detection/selection and tracking are combined to obtain a complete 3D model. One target application of this work is 3D reconstruction of film sets. The resulting 3D models can be used for visualization during filming or for post-processing. Other applications, including but not limited to gaming and 3D TV, can also benefit from this approach.

According to one aspect of this disclosure, a three-dimensional acquisition process is provided. The process includes acquiring first and second images of a scene; applying at least two feature detection functions to the first image to detect feature points of objects in the image; combining the outputs of the at least two feature detection functions to select object feature points to be tracked; applying a tracking function to the second image to track the selected object feature points; and reconstructing a three-dimensional model of the scene from an output of the tracking function. The process further includes applying a smoothing function to the first image before applying the at least two feature detection functions, to make the object feature points in the first image more visible, wherein the feature points are corners, edges, or lines of objects in the image.

In another aspect of this disclosure, a system for acquiring three-dimensional (3D) information from two-dimensional (2D) images is provided. The system includes a post-processor configured to reconstruct a 3D model of a scene from at least two images. The post-processor includes a feature point detector configured to detect feature points in an image, the feature point detector including at least two feature detection functions, which are applied to a first image of the at least two images. The system further includes a feature point tracker configured to track selected feature points between the at least two images, and a depth map generator configured to generate a depth map between the at least two images from the tracked feature points. The post-processor creates the 3D model from the depth map. The post-processor further includes a smoothing function filter configured to make the object feature points in the first image more visible.

In a further aspect of this disclosure, a computer-readable program storage device is provided that embodies a program of instructions executable by a computer to perform method steps for modeling a three-dimensional (3D) scene from two-dimensional (2D) images. The method includes acquiring first and second images of a scene; applying a smoothing function to the first image; applying at least two feature detection functions to the smoothed first image to detect feature points of objects in the image; combining the outputs of the at least two feature detection functions to select object feature points to be tracked; applying a smoothing function to the second image; applying a tracking function to the second image to track the selected object feature points; and reconstructing a three-dimensional model of the scene from an output of the tracking function.

These and other aspects, features, and advantages of the present invention will be described in, or become apparent from, the following detailed description of the preferred embodiments, which is to be read in conjunction with the accompanying drawings.

In the drawings, like reference numerals denote like elements throughout the views. The drawings are for purposes of illustrating the concepts of the invention and are not necessarily the only possible configuration for illustrating the invention.

It should be understood that the elements shown may be implemented in various forms of hardware, software, or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory, and input/output interfaces.

This description illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope.

All examples and conditional language recited herein are intended for pedagogical purposes, to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors, and are to be construed as being without limitation to such specifically recited examples and conditions.

Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, such equivalents are intended to include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, any flowcharts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes that may be substantially represented in computer-readable media and so executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

The functions of the various elements shown may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.

Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function, including, for example, a) a combination of circuit elements that performs that function, or b) software in any form, including firmware, microcode, or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

The techniques disclosed in the present invention address the problem of recovering the 3D geometry of objects and scenes. Recovering the geometry of real-world scenes is challenging because of subject movement, large depth discontinuities between foreground and background, and complicated lighting and brightness conditions. The current methods used in feature point selection and tracking to estimate the depth map of an image or to reconstruct a 3D representation do not perform very well by themselves. Reconstruction of 3D from 2D images is used, but the results are limited and the depth maps are not very accurate. Some techniques for 3D acquisition, such as laser scanning, are unacceptable in many situations, for example because of the presence of human subjects.

Systems and methods for recovering the three-dimensional (3D) geometry of objects and scenes are provided. The system and method of the present invention provide an enhancement to the Structure From Motion (SFM) approach, using a hybrid approach to recover 3D features. The technique is motivated by the lack of a single method capable of reliably locating features in large environments. It starts by applying different smoothing functions, such as Poisson or Laplacian transforms, to the images before feature point detection/selection and tracking. This type of smoothing filter helps make the features in an image more visible for detection than the commonly used Gaussian function. Multiple feature detectors are then applied to one image to obtain good features. Once the two feature detectors have been applied and good features obtained, those features are tracked easily through several images using a tracking method.

Referring now to the drawings, exemplary system components according to an embodiment of this disclosure are shown in FIG. 1. A scanning device 103 may be provided for scanning film prints 104, e.g., camera-original film negatives, into a digital format, e.g., Cineon format or Society of Motion Picture and Television Engineers (SMPTE) Digital Picture Exchange (DPX) files.

The scanning device 103 may comprise, e.g., a telecine or any device that generates a video output from film, such as an ArriLocPro™ with video output. Alternatively, files from the post-production process or digital cinema 106 (e.g., files already in computer-readable form) can be used directly. Potential sources of computer-readable files are AVID™ editors, DPX files, D5 tapes, etc.

The scanned film prints are input to a post-processing device 102, e.g., a computer. The computer may be implemented on any of various known computer platforms having hardware such as one or more central processing units (CPUs), memory 110 such as random access memory (RAM) and/or read-only memory (ROM), and input/output (I/O) user interface(s) 112 such as a keyboard, a cursor control device (e.g., a mouse or joystick), and a display device. The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may be part of the microinstruction code or part of a software application program (or a combination thereof) executed via the operating system. In one embodiment, the software application program is implemented on a program storage device and may be uploaded to and executed by a suitable computer, such as the post-processing device 102. In addition, various other peripheral devices may be connected to the computer platform by various interfaces and bus structures, such as a parallel port, a serial port, or a universal serial bus (USB). Other peripheral devices may include an additional storage device 124 and a printer 128. The printer 128 may be used to print a revised version of the film 126, in which a scene may have been altered or replaced using 3D-modeled objects as a result of the techniques described below.

Alternatively, files/film prints already in computer-readable form 106 (e.g., digital cinema, which may be stored on an external hard drive 124) may be input directly into the computer 102. Note that the term "film" as used herein may refer to either film prints or digital cinema.

The software program includes a three-dimensional (3D) reconstruction module 114 stored in the memory 110. The 3D reconstruction module 114 includes a smoothing function filter 116 that makes the features of objects in an image more visible for detection. The 3D reconstruction module 114 also includes a feature point detector 118 for detecting feature points in an image. The feature point detector 118 includes at least two different feature point detection functions, e.g., algorithms, for detecting or selecting feature points. A feature point tracker 120 is provided for tracking selected feature points through a plurality of consecutive images via a tracking function or algorithm. A depth map generator 122 is provided for generating a depth map from the tracked feature points.

FIG. 2 is a flow diagram of an exemplary method for reconstructing three-dimensional (3D) objects from two-dimensional (2D) images according to an aspect of the present invention.

Referring to FIG. 2, the post-processing device 102 obtains a digital master video file in a computer-readable format. The digital video file may be acquired by capturing a temporal sequence of video images with a digital video camera. Alternatively, the video sequence may be captured by a conventional film-type camera; in this scenario, the film is scanned via the scanning device 103, and the process proceeds to step 202. The camera acquires 2D images while either an object in the scene or the camera moves, thereby capturing multiple viewpoints of the scene.

Whether the film is scanned or already in digital format, the digital file of the film includes indications or information on the locations of the frames (e.g., time code, frame number, time from the start of the film, etc.). Each frame of the digital video includes one image, e.g., I₁, I₂, …, Iₙ.

In step 202, the smoothing function filter 116 is applied to image I₁. Preferably, the smoothing function filter 116 is a Poisson or Laplacian transform, which helps make the features of objects in the image more visible for detection than the Gaussian function commonly used in the art. It should be understood that other smoothing function filters may be employed.
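The text does not define exactly what a "Poisson or Laplacian transform" smoothing filter computes. The Python sketch below assumes one common reading: Laplacian smoothing in the sense of iterative neighbour averaging, i.e. repeated explicit discrete heat-diffusion steps I ← I + λ∇²I. The function name `laplacian_smooth` and the parameters `lam` and `iterations` are illustrative assumptions, not part of the disclosure.

```python
import numpy as np


def laplacian_smooth(img, lam=0.2, iterations=5):
    """Hypothetical Laplacian (diffusion) smoothing of a 2D grayscale image.

    Each iteration adds lam times the 4-neighbour discrete Laplacian,
    which pulls every pixel toward the mean of its neighbours.
    """
    if not 0.0 < lam <= 0.25:
        # The explicit 4-neighbour diffusion update is only stable for lam <= 0.25.
        raise ValueError("lam must be in (0, 0.25]")
    out = np.asarray(img, dtype=np.float64).copy()
    for _ in range(iterations):
        # Replicate border pixels so that boundary pixels see zero flux.
        p = np.pad(out, 1, mode="edge")
        lap = (p[:-2, 1:-1] + p[2:, 1:-1]      # up and down neighbours
               + p[1:-1, :-2] + p[1:-1, 2:]    # left and right neighbours
               - 4.0 * out)
        out += lam * lap
    return out
```

Because the border is replicated, the Laplacian sums to zero over the image, so the smoothing reduces noise without shifting the mean intensity; a constant image is returned unchanged.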

Image I₁ is then processed by a first feature point detector in step 204. Feature points are salient features of an image, such as corners, edges, and lines, where there is a large amount of image intensity contrast. Feature points are chosen because they are easily identifiable and can be tracked robustly. The feature point detector 118 may use a Kitchen-Rosenfeld corner detection operator C, as is well known in the art. This operator evaluates the degree of "cornerness" of the image at a given pixel location. A "corner" is an image feature characterized by the intersection of two directions of image intensity gradient maxima, for example at a 90-degree angle. To extract feature points, the Kitchen-Rosenfeld operator is applied at each valid pixel position of image I₁. The higher the value of the operator C at a particular pixel, the higher its degree of cornerness; the pixel position (x, y) in image I₁ is a feature point if C at (x, y) is greater than at all other pixel positions in a neighborhood of (x, y). The neighborhood may be a 5×5 matrix centered on the pixel position (x, y). To ensure robustness, the selected feature points must have a degree of cornerness greater than a threshold, such as T_c = 10. The output of the feature point detector 118 is a set of feature points {F₁} in image I₁, where each F₁ corresponds to a "feature" pixel position in image I₁. Many other feature point detectors can be employed, including but not limited to SIFT (Scale-Invariant Feature Transform), SUSAN (Smallest Univalue Segment Assimilating Nucleus), the Sobel edge operator, and the Canny edge detector.
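A minimal NumPy sketch of the cornerness test described above follows. It assumes the standard Kitchen-Rosenfeld measure C = |Ixx·Iy² + Iyy·Ix² - 2·Ixy·Ix·Iy| / (Ix² + Iy²), with derivatives taken by central differences (`np.gradient`); the 5×5 neighbourhood and the threshold T_c = 10 come from the text. The function names are hypothetical, and a suitable threshold depends on the intensity range and on any prior smoothing.

```python
import numpy as np


def kitchen_rosenfeld(img, eps=1e-9):
    """Absolute Kitchen-Rosenfeld cornerness at every pixel (assumed form)."""
    I = np.asarray(img, dtype=np.float64)
    Iy, Ix = np.gradient(I)        # derivatives along rows (y) and columns (x)
    Ixy, Ixx = np.gradient(Ix)     # d(Ix)/dy and d(Ix)/dx
    Iyy, _ = np.gradient(Iy)       # d(Iy)/dy
    num = Ixx * Iy**2 + Iyy * Ix**2 - 2.0 * Ixy * Ix * Iy
    den = Ix**2 + Iy**2
    return np.abs(num) / (den + eps)  # eps avoids 0/0 in flat regions


def detect_corners(img, threshold=10.0, win=5):
    """Return (x, y) positions whose cornerness exceeds `threshold` and is a
    strict maximum within the win x win neighbourhood centred on them."""
    C = kitchen_rosenfeld(img)
    r = win // 2
    H, W = C.shape
    points = []
    for y in range(r, H - r):          # only positions with a full window
        for x in range(r, W - r):
            c = C[y, x]
            if c <= threshold:
                continue
            patch = C[y - r:y + r + 1, x - r:x + r + 1]
            # strict maximum: no other pixel in the window reaches c
            if c >= patch.max() and np.count_nonzero(patch == c) == 1:
                points.append((x, y))
    return points
```

For a synthetic 40×40 image containing a bright 20×20 square, only the square's four corner pixels pass both the threshold and the strict 5×5 maximum test; pixels along the straight edges score zero because their level curves have no curvature.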

In step 206, image I₁ is input to the smoothing function filter 116, and a second, different feature point detector is applied to the image (step 208). The feature points detected in steps 204 and 208 are then combined, and duplicate selected feature points are eliminated (step 210). It should be understood that the smoothing function filter applied in step 206 is the same filter applied in step 202; however, in other embodiments, a different smoothing function filter may be used in each of steps 202 and 206.
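Step 210 combines the two detectors' outputs and removes duplicates, but the text does not say how duplicates are identified. The sketch below assumes that two points are duplicates when they lie within a hypothetical `min_dist` pixels of each other (exact coordinate matches when `min_dist` is 0), and keeps the first occurrence.

```python
def merge_feature_points(*point_sets, min_dist=0.0):
    """Combine (x, y) feature points from several detectors, dropping any
    point within `min_dist` pixels of a point already kept (hypothetical rule)."""
    merged = []
    limit = min_dist * min_dist
    for points in point_sets:
        for p in points:
            is_new = all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 > limit
                         for q in merged)
            if is_new:
                merged.append(p)
    return merged
```

A tolerance slightly above zero lets two detectors that localize the same corner one pixel apart contribute a single point rather than two.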

It is to be understood that employing a hybrid approach to feature point detection allows a larger number of feature points to be detected. FIG. 3A illustrates a scene with detected feature points represented by small rectangles; the scene in FIG. 3A was processed with a single feature point detector. In contrast, the scene in FIG. 3B was processed with the hybrid point detector approach of the present invention, which detects a significantly larger number of feature points.

After the detected feature points are selected, the second image I₂ is smoothed using the same smoothing function filter that was used on the first image I₁ (step 212). The good feature points selected in the first image I₁ are then tracked in the second image I₂ (step 214). Given a set of feature points in image I₁, the feature point tracker 120 tracks them into the next image I₂ of the scene shot by finding their closest matches.
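The text only says that the tracker finds each feature's closest match in I₂. One common way to realize this, assumed here, is exhaustive block matching: a small patch around each feature in I₁ is compared with every patch inside a search window in I₂, and the position with the smallest sum of squared differences (SSD) is kept. The patch half-size and search radius are hypothetical parameters; a pyramidal Lucas-Kanade (KLT) tracker is a frequent alternative.

```python
import numpy as np


def track_points(img1, img2, points, half=3, search=5):
    """Track (row, col) points from img1 into img2 by exhaustive SSD block matching.

    Returns one (row, col) per input point, or None when the template patch
    would fall outside img1 or no candidate patch fits inside img2.
    """
    img1 = np.asarray(img1, dtype=np.float64)
    img2 = np.asarray(img2, dtype=np.float64)
    h, w = img2.shape
    tracked = []
    for r, c in points:
        if r - half < 0 or c - half < 0 or r + half >= img1.shape[0] or c + half >= img1.shape[1]:
            tracked.append(None)
            continue
        tmpl = img1[r - half:r + half + 1, c - half:c + half + 1]
        best, best_pos = np.inf, None
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr, cc = r + dr, c + dc
                if rr - half < 0 or cc - half < 0 or rr + half >= h or cc + half >= w:
                    continue
                cand = img2[rr - half:rr + half + 1, cc - half:cc + half + 1]
                ssd = float(np.sum((cand - tmpl) ** 2))
                if ssd < best:          # keep the closest match seen so far
                    best, best_pos = ssd, (rr, cc)
        tracked.append(best_pos)
    return tracked
```

If I₂ is I₁ shifted three pixels to the right, a feature at (20, 20) is tracked to (20, 23), whose patch matches exactly.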

As noted above, in other embodiments the smoothing function filter applied in step 212 may differ from the filters applied in steps 202 and 206. Furthermore, although steps 202 through 212 have been described sequentially, in certain embodiments the smoothing function filters may be applied simultaneously via parallel processing or hardware.
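Because smoothing I₁ and smoothing I₂ are independent operations, they can run concurrently. The sketch below uses Python's `ThreadPoolExecutor` and a 3×3 mean filter purely as a placeholder kernel: the claims name Poisson or Laplacian transforms as examples of the smoothing filter, and no kernel is specified in this passage. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_smooth(img, size=3):
    """Mean filter with edge replication; a placeholder for smoothing filter 116."""
    img = np.asarray(img, dtype=np.float64)
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def smooth_all(images, size=3):
    """Apply the same smoothing filter to several images concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda im: box_smooth(im, size), images))
```

For large images or video, a process pool or a GPU implementation may be a better fit than threads.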

Once the feature points have been tracked, disparity information is calculated for each tracked feature as the difference between its horizontal pixel positions l₁ and l₂ in the two images. Disparity is inversely related to depth through a scaling factor that depends on the camera calibration parameters. At step 216, the camera calibration parameters are obtained and used by the depth map generator 122 to generate a depth map of the object or scene between the two images. The camera parameters include, but are not limited to, the focal length of the camera and the distance between the two camera shots. The camera parameters may be entered into the system 100 manually via the user interface 112 or estimated by a camera calibration algorithm. Using the camera parameters, depth is estimated at the feature points. The resulting depth map is sparse, having depth values only at the detected features.

A depth map is a two-dimensional array of values that mathematically represents a surface in space, where the rows and columns of the array correspond to the x and y positions on the surface, and each array element is a measurement of the depth or distance from a given point or camera position to the surface. A depth map can be viewed as a grayscale image of an object in which depth information replaces the intensity information, or pixels, at each point on the object's surface. Accordingly, surface points are also referred to as pixels in 3D graphics construction techniques, and the two terms are used interchangeably in this disclosure. Because disparity is inversely proportional to depth up to a scaling factor, it can be used directly to build a 3D scene model for most applications. This removes the need to compute the camera parameters and simplifies the computation.
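For a rectified stereo pair, the standard pinhole relation is Z = f·B/d, where f is the focal length in pixels, B is the baseline (the distance between the two camera positions) and d is the disparity. The sketch below, with hypothetical function and parameter names, fills a sparse depth map only at tracked feature positions. It assumes I₁ is the left view of a rectified pair, so that d = l₁ - l₂ is positive.

```python
import numpy as np


def sparse_depth_map(shape, pts1, pts2, focal_px, baseline):
    """Sparse depth map from matched (row, col) feature positions.

    Assumes a rectified pair in which I1 is the left view, so the disparity
    d = col1 - col2 is positive and depth Z = focal_px * baseline / d.
    Pixels without a tracked feature (or with d <= 0) stay NaN.
    """
    depth = np.full(shape, np.nan)
    for (r1, c1), (_, c2) in zip(pts1, pts2):
        d = c1 - c2                      # horizontal disparity l1 - l2
        if d > 0:
            depth[r1, c1] = focal_px * baseline / d
    return depth
```

With f = 100 px, B = 0.5 and a disparity of 2 px, the stored depth is 25.0. When calibration is unavailable, storing 1/d instead of f·B/d yields depth only up to the unknown scale factor, which is the shortcut described above.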

Given the set of feature points present in the image pair I₁ and I₂ and a depth estimate at each feature point, and assuming the feature points are selected so that they lie relatively close to one another and span the entire image, the depth map generator 122 forms a 3D mesh structure by interconnecting the feature points, with the feature points at the vertices of the resulting polygons. The closer the feature points are to one another, the denser the resulting 3D mesh structure. Because the depth at each vertex of the 3D structure is known, the depth at points inside each polygon can be estimated, and in this way the depth is estimated at every pixel position of the image. This may be done by planar interpolation. A robust and fast method of generating the 3D mesh structure is Delaunay triangulation. The feature points are connected to form a set of triangles whose vertices lie at the feature point positions. Using the depth associated with each feature point and its corresponding vertex, a "depth plane" is fitted to each individual triangle, from which the depth of every point within that triangle is determined.
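The depth-plane step can be sketched as follows. The triangle list is taken as an input so that the example needs only NumPy; in practice it could come from a Delaunay triangulation such as `scipy.spatial.Delaunay(points).simplices`. Function and argument names are hypothetical, and degenerate (collinear) triangles are not handled.

```python
import numpy as np


def interpolate_depth(shape, verts_xy, depths, triangles, eps=1e-9):
    """Dense depth by fitting a plane z = a*x + b*y + c to each triangle.

    verts_xy: (N, 2) feature positions as (x, y) = (col, row).
    depths: (N,) depth at each feature. triangles: (M, 3) vertex indices.
    Pixels outside every triangle stay NaN.
    """
    verts_xy = np.asarray(verts_xy, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    dense = np.full(shape, np.nan)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    for tri in np.asarray(triangles):
        p = verts_xy[tri]
        # Fit the "depth plane" through the triangle's three vertices.
        A = np.column_stack([p[:, 0], p[:, 1], np.ones(3)])
        a, b, c = np.linalg.solve(A, depths[tri])
        # Barycentric coordinates (u, w) decide which pixels lie inside.
        v0, v1 = p[1] - p[0], p[2] - p[0]
        det = v0[0] * v1[1] - v1[0] * v0[1]
        qx, qy = xs - p[0, 0], ys - p[0, 1]
        u = (qx * v1[1] - v1[0] * qy) / det
        w = (v0[0] * qy - qx * v0[1]) / det
        inside = (u >= -eps) & (w >= -eps) & (u + w <= 1 + eps)
        dense[inside] = a * xs[inside] + b * ys[inside] + c
    return dense
```

Fitting a plane through three vertices is equivalent to linear barycentric interpolation, so `scipy.interpolate.LinearNDInterpolator`, which triangulates and interpolates in one call, is a compact alternative.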

A complete 3D model of the object can be reconstructed by combining the triangular mesh obtained from the Delaunay algorithm with texture information from image I₁ (step 218). The texture information is the 2D intensity image. The complete 3D model contains depth and intensity values at the image pixels. The resulting combined image may be visualized using conventional visualization tools, such as the ScanAlyze software developed at Stanford University, Stanford, California.
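No file format is named here for the reconstructed model. As one assumed possibility, the sketch below writes the feature-point mesh as an ASCII PLY file, a format read by many 3D viewers: depth is stored as z, and the intensity of I₁ at each vertex is stored as a grey per-vertex color. Per-vertex color is a simplification of full texture mapping, and the function name is hypothetical.

```python
import numpy as np


def mesh_to_ply(verts_xy, depths, triangles, image):
    """Serialize a depth mesh as ASCII PLY, coloring each vertex with the
    intensity of image I1 at that vertex (a simple per-vertex texture)."""
    image = np.asarray(image)
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(verts_xy)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        f"element face {len(triangles)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    for (x, y), z in zip(verts_xy, depths):
        # Sample I1 at the vertex (row = y, col = x) and replicate grey to RGB.
        g = int(np.clip(image[int(round(y)), int(round(x))], 0, 255))
        lines.append(f"{x:.6f} {y:.6f} {z:.6f} {g} {g} {g}")
    for a, b, c in triangles:
        lines.append(f"3 {a} {b} {c}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to disk as-is, for example as the digital file in which the reconstructed model is kept.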

The reconstructed 3D model of a particular object or scene may be rendered for viewing on a display device, or saved in a digital file 130 separate from the file containing the images. The 3D reconstruction digital file 130 may be stored in the storage device 124 for later retrieval, for example during a film editing stage in which the modeled object is inserted into a scene where it was not previously present.

The system and method of the present invention use multiple feature point detectors and combine their results to improve the number and quality of the detected feature points. In contrast to a single feature detector, combining different feature point detectors improves the discovery of good feature points to track. Once "good" results have been obtained from multiple feature point detectors (i.e., using more than one feature point detector), the feature points are easier to track in the second image, and a good depth map is easier to produce than when a single feature point detector is used.

While embodiments incorporating the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings. Having described preferred embodiments of a system and method for three-dimensional (3D) acquisition and modeling of scenes (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as outlined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired to be protected by patent is set forth in the appended claims.

FIG. 1 is an exemplary illustration of a system for three-dimensional (3D) information acquisition according to an aspect of the present invention.
FIG. 2 is a flowchart of an exemplary method for reconstructing three-dimensional (3D) objects from two-dimensional (2D) images according to an aspect of the present invention.
FIG. 3A illustrates a scene processed with one feature point detection function.
FIG. 3B illustrates the scene shown in FIG. 3A processed with a hybrid detection function.

Claims

Obtaining a first image and a second image of a scene;
Applying at least two feature detection functions to the first image to detect feature points of the object in the first image;
Combining the outputs of the at least two feature detection functions to select feature points of the object to be tracked;
Applying a tracking function to the second image to track feature points of the selected object;
Reconstructing a three-dimensional model of the scene from the output of the tracking function;
A three-dimensional acquisition process.

Further comprising applying a smoothing function to the first image, prior to applying the at least two feature detection functions, to make the feature points of objects in the first image more visible,
The three-dimensional acquisition process according to claim 1.

The feature points are corners, edges or lines of objects in the image,
The three-dimensional acquisition process according to claim 2.

Further comprising applying the same smoothing function to the second image prior to applying the tracking function;
The three-dimensional acquisition process according to claim 2.

Further comprising applying a first smoothing function to the first image before applying a first feature detection function of the at least two feature detection functions, and applying a second smoothing function to the first image before applying a second feature detection function of the at least two feature detection functions;
The first and second smoothing functions make the feature points of the object in the first image more visible;
The three-dimensional acquisition process according to claim 1.

The combining step further includes the step of removing duplicate feature points detected by the at least two feature detection functions;
The three-dimensional acquisition process according to claim 1.

The reconstructing step further includes generating a depth map of the feature points of the selected object between the first image and the second image.
The three-dimensional acquisition process according to claim 1.

The reconstructing step further includes generating a three-dimensional mesh structure from the feature points of the selected object and the depth map.
The three-dimensional acquisition process according to claim 7.

The step of generating the three-dimensional mesh structure is performed by a triangulation function.
The three-dimensional acquisition process according to claim 8.

The reconstructing step further comprises combining the mesh structure with texture information from the first image to complete a three-dimensional model;
The three-dimensional acquisition process according to claim 8.

A system for obtaining three-dimensional (3D) information from a two-dimensional image,
The system has a post-processing device configured to reconstruct a three-dimensional model of the scene from at least two images,
The post-processing device includes:
Feature point detection means configured to detect feature points in an image and including at least two feature detection functions, the at least two feature detection functions being applied to a first image of the at least two images;
Feature point tracking means configured to track selected feature points between the at least two images;
Depth map generating means configured to generate a depth map between at least two images from the tracked feature points;
The post-processing device generates the 3D model from the depth map;
A system characterized by that.

The post-processing device further includes a smoothing function filter configured to make object feature points in the first image more visible.
The system of claim 11.

The smoothing function filter is a Poisson transform or a Laplacian transform.
The system of claim 12.

The feature point detection means is configured to combine the feature points detected by the at least two feature detection functions and to remove duplicate detected feature points.
The system of claim 12.

The post-processing device is further configured to generate a three-dimensional mesh structure from the selected feature points and the depth map.
The system of claim 12.

The post-processing device is further configured to combine the mesh structure with texture information from the first image to complete a 3D model.
The system of claim 15.

Further comprising a display device for rendering the 3D model;
The system of claim 16.

A computer-readable recording medium for realizing a program comprising instructions executable by a computer to execute a method of modeling a three-dimensional scene from a two-dimensional image,
The method is
Obtaining a first image and a second image of a scene;
Applying a smoothing function to the first image;
Applying at least two feature detection functions to the smoothed first image to detect feature points of the object in the first image;
Combining the outputs of the at least two feature detection functions to select feature points of the object to be tracked;
Applying a smoothing function to the second image;
Applying a tracking function to the second image to track feature points of the selected object;
Reconstructing a three-dimensional model of the scene from the output of the tracking function;
A recording medium characterized by the above.

The reconstruction step further includes generating a depth map of feature points of the selected object between the first image and the second image.
The recording medium according to claim 18.

The reconstructing step further includes generating a three-dimensional mesh structure from the feature points of the selected object and the depth map.
The recording medium according to claim 19.

The reconstruction step further comprises the step of combining the mesh structure with texture information from the first image to complete the three-dimensional model.
The recording medium according to claim 20.