JP7245766B2 - 3D model generation method and apparatus

Info

Publication number: JP7245766B2
Application number: JP2019231270A
Authority: JP
Inventors: 良亮渡邊
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2023-03-24
Anticipated expiration: 2039-12-23
Also published as: JP2021099671A

Description

The present invention relates to a 3D model generation method and apparatus, and more particularly to a method and apparatus that generate a 3D model of a subject from the video of a plurality of cameras at high speed and with high quality.

The visual hull method (visual volume intersection method) disclosed in Non-Patent Document 1 is a widely known approach for generating a 3D model of a subject from multiple camera videos. It projects binary silhouette images, each obtained by extracting only the subject region from a camera image, into 3D space and generates the 3D model by keeping only the region where all the projections intersect.

The smallest unit of a 3D model generated by the visual hull method is called a voxel. A voxel is a small cube that holds a fixed value, and it is the regular-grid unit used to represent volumetric data discretely.

Non-Patent Document 2 discloses the use of the visual hull method within free-viewpoint video technology and similar applications. Free-viewpoint video technology reconstructs 3D space from the video of multiple cameras and allows viewing from angles where no camera is placed; real-time performance is important when the target is sports video or similar content. However, when a 3D model is generated with the ordinary voxel-based visual hull method over a vast area such as a stadium, the computation time becomes enormous.

To address this problem, Non-Patent Document 3 discloses a coarse-to-fine voxel model generation algorithm that accelerates the visual hull method. When generating a 3D voxel model, it first generates a model with a coarse unit voxel size Ma and obtains a 3D bounding box for each cluster of voxels, treating each cluster as one object. It then models the inside of each 3D bounding box by the visual hull method with a fine unit voxel size Mb (< Ma), which greatly reduces the processing time. Non-Patent Document 4 discloses a technique that dilates voxels to prevent missing voxels.

[Non-Patent Document 1] Laurentini, A., "The visual hull concept for silhouette based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 150-162 (1994).
[Non-Patent Document 2] J. Kilner, J. Starck, A. Hilton and O. Grau, "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events," Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), Montreal, QC, 2007, pp. 177-184.
[Non-Patent Document 3] J. Chen, R. Watanabe, K. Nonaka, T. Konno, H. Sankoh, S. Naito, "A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes," 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), WeAT17.2 (2019).
[Non-Patent Document 4] C. Prock, Andrew & Dyer, Charles, "Towards Real-Time Voxel Coloring," 1970.
[Non-Patent Document 5] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, Vol. 2 (1999).
[Non-Patent Document 6] Chen, J., Nonaka, K., Sankoh, H., Watanabe, R., Sabirin, H., & Naito, S., "Efficient Parallel Connected Component Labeling with a Coarse-to-Fine Strategy," IEEE Access, 2008, 6, 55731-55740.

The coarse-to-fine voxel model generation algorithm disclosed in Non-Patent Document 3 greatly reduces the processing time of the conventional visual hull method, but it can degrade the quality of the 3D model.

FIG. 8 illustrates the cause of this quality degradation. In Non-Patent Document 3, when a 3D bounding box is estimated from coarse voxels, the overlap between the visual hull intersection area and the coarse voxel grid is tested, and only the coarse voxel grid cells that overlap the intersection area are regarded as the region where the voxel model exists.

However, because the overlap test uses the center coordinates of each voxel grid cell (marked with "・" in the figure) as its reference, some regions that belong to the intersection area can be treated as lying outside the 3D bounding box. These missed regions are not modeled in the subsequent fine voxel generation, which covers only the inside of the 3D bounding box, so they cause defects in the output 3D model, particularly near its surface.

In this respect, Non-Patent Document 4 dilates each voxel, so part of the intersection area that falls outside the 3D bounding box in Non-Patent Document 3 can be brought inside it.

However, with Non-Patent Document 4, for a small object such as a baseball, the voxel size can become 0 when the unit voxel size of the coarse voxels is large. In that case no model is generated at the coarse voxel stage, and the object itself may disappear.

In addition, Non-Patent Document 3 reports experiments on volleyball and judo sequences, but the processing time grows as the target region becomes larger, for example an entire soccer stadium. Further acceleration is therefore needed to achieve real-time 3D model generation in large spaces.

Conversely, spare processing capacity allows a finer unit voxel size, and a finer unit voxel size refines the shape of the final model. Further accelerating the algorithm of Non-Patent Document 3 is therefore important for the practical use of free-viewpoint video.

An object of the present invention is to solve the above technical problems and to provide a 3D model generation method and apparatus that, in coarse-to-fine voxel model generation, prevent defects in the 3D model and the disappearance of small objects, thereby enabling fast, high-quality 3D model generation.

To achieve the above object, the present invention provides a 3D model generation apparatus that generates a 3D model of a subject from multi-view video and is characterized by the following configuration.

(1) The apparatus comprises: means for acquiring a silhouette image for each viewpoint from the multi-view video; means for dilating the contour of each silhouette image to generate a dilated silhouette image; means for calculating a low-resolution voxel model MD_L whose unit voxel size is a first size M₁ by the visual hull method using the dilated silhouette images; means for calculating, for the region of the low-resolution voxel model MD_L, a high-resolution voxel model MD_H whose unit voxel size is a second size M₂ smaller than the first size M₁ by the visual hull method using the silhouette images; and means for outputting a 3D model of the subject based on the high-resolution voxel model MD_H.

(2) The apparatus further comprises: means for eroding the contour of each silhouette image to generate an eroded silhouette image; and means for calculating a low-resolution voxel model MD_L2 whose voxel size is a third size M₃ larger than the second size M₂ by the visual hull method using the eroded silhouette images. The means for calculating the high-resolution voxel model calculates the high-resolution voxel model MD_H for the near-surface portion that is included in the region of the low-resolution voxel model MD_L1 (the model calculated from the dilated silhouette images) but not in the region of the low-resolution voxel model MD_L2.

(3) The dilation means adaptively changes the dilation amount according to the size and shape of the silhouette.

(4) The erosion means adaptively changes the erosion amount according to the size and shape of the silhouette.

The present invention achieves the following effects.

(1) The high-resolution voxel model for the 3D model is calculated for the region of the low-resolution voxel model obtained by the visual hull method using the dilated silhouette images. Defects near the surface of the output 3D model can therefore be suppressed without losing small subjects.

(2) When the high-resolution voxel model is calculated for the region of the low-resolution voxel model, restricting the calculation to the voxel grid cells in which the low-resolution voxel model is generated, rather than to the entire inside of its 3D bounding box, limits the calculation range of the high-resolution voxel model and speeds up processing.

(3) The high-resolution voxel model for the 3D model is calculated only for the near-surface portion that is included in the region of the low-resolution voxel model calculated from the dilated silhouette images but not in the region of the low-resolution voxel model calculated from the eroded silhouette images. The calculation range of the high-resolution voxel model can therefore be limited to the near-surface portion of the 3D model, which speeds up processing.

(4) The dilation amount used to generate the dilated silhouette image and the erosion amount used to generate the eroded silhouette image are variable according to the size and shape of the silhouette in the silhouette image. This suppresses both the quality loss caused by excessive or insufficient dilation and any increase in computation that brings no quality improvement.

FIG. 1 is a functional block diagram of a 3D model generation device according to a first embodiment of the present invention.
FIG. 2 shows an example of dilation of a silhouette image.
FIG. 3 shows an example of a 3D bounding box.
FIG. 4 shows how the visual hull intersection area is enlarged by using dilated silhouette images.
FIG. 5 is a functional block diagram of a 3D model generation device according to a second embodiment of the present invention.
FIG. 6 schematically shows the 3D model generation method of the second embodiment.
FIG. 7 shows how the present invention reduces defects in the near-surface portion of a 3D model.
FIG. 8 illustrates the technical problem of the coarse-to-fine voxel model generation algorithm.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main part of a 3D model generation device 1 according to the first embodiment of the present invention. The generation of 3D models of subjects in a baseball broadcast is used as the example.

The 3D model generation device 1 can be configured by installing an application (program) that implements each function on a general-purpose computer or server. Alternatively, it can be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or software.

The silhouette image acquisition unit 101 acquires, frame by frame, the silhouette images A used by the visual hull method from the video of a plurality of cameras 2 that capture a plurality of subjects from different viewpoints (multi-view video). To form a 3D model by the visual hull method, it is desirable to acquire silhouette images A from three or more cameras 2.

Each silhouette image A is acquired as a binary mask image in which the subject to be modeled is white and everything else is black. Such a silhouette image A can be obtained with the background subtraction method disclosed in Non-Patent Document 5.

The silhouette image processing unit 102 includes at least a dilation unit 102a that dilates the contour of each silhouette image A to generate a dilated silhouette image A₁. It outputs the dilated silhouette image A₁ together with a silhouette image processed by another method or with the unprocessed silhouette image A₀ (= A). In this embodiment, the dilated silhouette image A₁ and the unprocessed silhouette image A₀ are output for each viewpoint.

FIG. 2 shows an example of dilation by the dilation unit 102a of the silhouette image processing unit 102. The dilated silhouette image A₁ is generated by uniformly extending the contour of the silhouette image A₀ outward by several pixels. In this embodiment, the dilated silhouette image A₁ is generated by expanding each pixel of the silhouette image A₀ to a size of 5×5 pixels.
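The 5×5 expansion described above corresponds to a binary dilation with a square structuring element of radius 2. The following is a minimal sketch in plain Python, not the implementation of the dilation unit 102a; the function name `dilate` and the list-of-lists mask representation are assumptions made for illustration.

```python
def dilate(mask, radius=2):
    """Binary dilation with a (2*radius+1)-square window (5x5 when radius=2)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Spread every foreground pixel over its neighbourhood,
                # clipping the window at the image border.
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


# A single foreground pixel in a 7x7 image grows into a 5x5 block.
a0 = [[0] * 7 for _ in range(7)]
a0[3][3] = 1
a1 = dilate(a0, radius=2)
```

In practice an image-processing library would normally be used for this step; the loop form is shown only to make the operation explicit.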

The dilation amount applied by the dilation unit 102a may be the same for all silhouette images. However, when the silhouette images of the same subject in the multi-view video are compared, it has been found empirically that defects are eliminated more reliably if a silhouette image in which the subject appears large within the angle of view, that is, one with a large silhouette, is dilated more than a silhouette image with a small silhouette. The dilation amount of each silhouette image may therefore be changed adaptively for each subject according to the silhouette size, or according to the distance from the camera to the subject, which can stand in for the silhouette size.
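The text does not give a formula for the adaptive dilation amount. As one hypothetical rule, assumed here purely for illustration, the dilation radius could grow with the square root of the silhouette area and be clamped to a fixed range. The function name `adaptive_radius` and the parameters `scale`, `r_min`, and `r_max` are not from the source.

```python
import math


def adaptive_radius(mask, scale=0.05, r_min=1, r_max=8):
    """Hypothetical rule: radius proportional to sqrt(silhouette area), clamped."""
    area = sum(1 for row in mask for v in row if v)
    if area == 0:
        return 0  # nothing to dilate
    return max(r_min, min(r_max, round(scale * math.sqrt(area))))


small = [[1] * 10 for _ in range(10)]      # 10x10 silhouette
large = [[1] * 100 for _ in range(100)]    # 100x100 silhouette
empty = [[0] * 10 for _ in range(10)]
r_small = adaptive_radius(small)
r_large = adaptive_radius(large)
r_empty = adaptive_radius(empty)
```

With these assumed parameters the larger silhouette receives the larger radius, which matches the tendency described above; a distance-based rule could be substituted in the same place.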

However, in an environment such as a baseball broadcast, where most subjects are people such as players and umpires and differ little in size, the dilation amount of every silhouette image may simply be changed adaptively based only on its silhouette size or on the camera-to-subject distance, regardless of whether the silhouette images belong to the multi-view video of the same person.

Increasing the dilation amount used to generate the dilated silhouette image A₁ reduces defects and can be expected to improve quality, but it also increases the number of coarse voxels to be processed, which may shrink the region where generation of the high-resolution voxel model can be skipped. The dilation amount should therefore be kept to the minimum that sufficiently suppresses defects.

The low-resolution voxel model calculation unit 103 forms a dilated visual hull by the visual hull method using the dilated silhouette images A₁, in a three-dimensional space in which a voxel grid with a relatively large unit voxel size, the first size M₁, is arranged. In this embodiment, the unit voxel size is the edge length of a voxel grid cell.

The low-resolution voxel model calculation unit 103 further computes connected components of this visual hull from the adjacency of its voxels and regards each connected region as the model of one subject, thereby calculating low-resolution voxel models MD_L whose unit voxel size is the first size M₁. Any existing method can be used to label the connected regions; for example, the labeling method disclosed in Non-Patent Document 6 computes the connected components efficiently.
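The grouping of occupied voxels into per-subject models can be sketched with a plain breadth-first search. This is not the parallel labeling method of Non-Patent Document 6; it is a simple sequential stand-in, and the choice of 6-connectivity (face neighbours only) and the representation of voxels as integer `(x, y, z)` tuples are assumptions.

```python
from collections import deque

# Face-adjacent neighbours (6-connectivity); an assumption, since the text
# only says that voxel adjacency is used.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_components(occupied):
    """Map each occupied voxel index to a component label (0, 1, 2, ...)."""
    labels = {}
    next_label = 0
    for seed in sorted(occupied):
        if seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                n = (x + dx, y + dy, z + dz)
                if n in occupied and n not in labels:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1
    return labels


# Two touching voxels form one subject; an isolated voxel forms another.
occupied = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
labels = label_components(occupied)
```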

In this embodiment, the first size M₁ is set to 5 cm. A voxel grid with a unit voxel size of 5 cm is arranged over the target range of 3D model generation (here, the entire baseball field), and the visual hull method determines for each voxel grid cell whether it forms part of the 3D model. The visual hull method obtains, as the visual hull VH(I), the common part of the viewing frustums formed when n silhouette images are projected into 3D world coordinates, as expressed by the following equation (1):

$$ VH(I) = \bigcap_{i \in I} V_i \qquad (1) $$

In equation (1), I is the set of silhouette images and V_i is the viewing frustum computed from the silhouette image of the i-th camera. Normally the part common to all n silhouette images is modeled, but the number of silhouette images required may be changed, for example by modeling the part common to n-1 images. Reducing the required number makes it possible to recover the 3D model even when the subject is missing from some silhouette images, but it may introduce side effects such as increased noise.
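The intersection test, including the relaxed variant that accepts agreement among fewer than n views, can be sketched as follows. To keep the example self-contained, the cameras are replaced by three axis-aligned orthographic projections; real cameras would use calibrated perspective projection. The names `visual_hull`, `min_views`, and `square` are hypothetical.

```python
def visual_hull(grid_size, silhouettes, projections, min_views=None):
    """Keep voxels whose projection is foreground in at least min_views views."""
    if min_views is None:
        min_views = len(silhouettes)  # strict intersection over all views
    hull = set()
    for x in range(grid_size):
        for y in range(grid_size):
            for z in range(grid_size):
                hits = 0
                for sil, project in zip(silhouettes, projections):
                    u, v = project(x, y, z)
                    if sil[v][u]:
                        hits += 1
                if hits >= min_views:
                    hull.add((x, y, z))
    return hull


# Toy orthographic "cameras" looking along the x, y and z axes (assumption).
projections = [
    lambda x, y, z: (y, z),
    lambda x, y, z: (x, z),
    lambda x, y, z: (x, y),
]


def square(n, lo, hi):
    """n x n mask with a filled square covering indices lo..hi-1."""
    return [[1 if lo <= u < hi and lo <= v < hi else 0 for u in range(n)]
            for v in range(n)]


n = 4
sq = square(n, 1, 3)
empty = [[0] * n for _ in range(n)]

full = visual_hull(n, [sq, sq, sq], projections)
strict = visual_hull(n, [sq, sq, empty], projections)
relaxed = visual_hull(n, [sq, sq, empty], projections, min_views=2)
```

In this toy setup the subject vanishes under the strict rule when one silhouette is empty, while requiring only two of the three views recovers it, which illustrates the trade-off described above.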

As shown in FIG. 3, the 3D bounding box generation unit 104 generates a 3D bounding box BB enclosing each low-resolution voxel model MD_L. In this embodiment, as shown in FIG. 4, the visual hull intersection area E₁ is generated from the dilated silhouette images A₁, so it is larger than the intersection area E₂ obtained with the conventional technique (FIG. 8).
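Given labeled voxels, an axis-aligned bounding box per model follows from the per-axis minimum and maximum indices. The sketch below assumes the labels are held in a dictionary from integer voxel index to component label; that representation and the function name `bounding_boxes` are illustrative choices, not taken from the source.

```python
def bounding_boxes(labels):
    """Return {label: (min_corner, max_corner)} in voxel-index coordinates."""
    boxes = {}
    for (x, y, z), lab in labels.items():
        if lab not in boxes:
            boxes[lab] = [[x, y, z], [x, y, z]]
        else:
            lo, hi = boxes[lab]
            for k, c in enumerate((x, y, z)):
                lo[k] = min(lo[k], c)
                hi[k] = max(hi[k], c)
    return {lab: (tuple(lo), tuple(hi)) for lab, (lo, hi) in boxes.items()}


labels = {(0, 0, 0): 0, (2, 1, 0): 0, (5, 5, 5): 1}
boxes = bounding_boxes(labels)
```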

As a result, the two coarse voxel grid cells BB₁ and BB₂, which the conventional technique judged to lie outside the 3D bounding box BB, are added to its inside, and fine voxels are generated in the previously missed regions as well.
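The effect can be illustrated in one dimension under assumed numbers: a subject spanning 0 cm to 6 cm, coarse cells of 5 cm, and a center-based overlap test. Without dilation the second cell is rejected even though the subject reaches into it; with a margin of 2 cm it is accepted. The values and the function name `coarse_cells` are hypothetical.

```python
def coarse_cells(lo, hi, cell, n_cells):
    """Indices of cells whose centre lies inside the interval [lo, hi]."""
    return [i for i in range(n_cells) if lo <= (i + 0.5) * cell <= hi]


subject = (0.0, 6.0)   # extent of the subject in cm (assumed)
cell = 5.0             # coarse cell size in cm
margin = 2.0           # dilation margin in cm (assumed)

# Cell centres are at 2.5, 7.5, 12.5 and 17.5 cm.
plain = coarse_cells(subject[0], subject[1], cell, 4)
dilated = coarse_cells(subject[0] - margin, subject[1] + margin, cell, 4)
```

Here `plain` contains only cell 0, so the part of the subject between 5 cm and 6 cm would never be refined, whereas `dilated` also contains cell 1.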

The high-resolution voxel model calculation unit 105 arranges a voxel grid with a relatively small unit voxel size, the second size M₂ (M₂ < M₁), either only in the narrow region inside each 3D bounding box BB generated by the 3D bounding box generation unit 104 or only in the voxel grid cells in which the low-resolution voxel model MD_L is generated. It then generates the high-resolution voxel model MD_H by the visual hull method using the unprocessed silhouette images A₀.
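The refinement step can be sketched as subdividing each retained coarse cell and testing only the resulting fine voxel centers. Two simplifications are assumed here: the ratio M₁/M₂ is an integer (the 5 cm and 2 cm sizes mentioned for FIG. 7 would need a different grid layout), and the silhouette test against A₀ is replaced by a generic predicate `inside`. The function name `refine` is hypothetical.

```python
def refine(coarse_voxels, m1, ratio, inside):
    """Subdivide each coarse voxel into ratio**3 fine voxels and keep the
    centres accepted by `inside` (a stand-in for the silhouette test on A0)."""
    m2 = m1 / ratio  # fine unit voxel size
    fine = []
    for (i, j, k) in coarse_voxels:
        for a in range(ratio):
            for b in range(ratio):
                for c in range(ratio):
                    center = ((i * ratio + a + 0.5) * m2,
                              (j * ratio + b + 0.5) * m2,
                              (k * ratio + c + 0.5) * m2)
                    if inside(*center):
                        fine.append(center)
    return fine


# One 5 cm coarse voxel split into 1 cm voxels; the toy "subject" is the
# half-space x < 3 cm, so 3 of the 5 slices along x survive.
fine = refine([(0, 0, 0)], m1=5.0, ratio=5, inside=lambda x, y, z: x < 3.0)
everything = refine([(0, 0, 0)], m1=5.0, ratio=5, inside=lambda x, y, z: True)
```

Because only the coarse cells passed in are subdivided, the cost of the fine pass scales with the number of retained coarse cells rather than with the whole scene volume.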

The 3D model output unit 106 outputs a 3D model based on the high-resolution voxel model MD_H obtained by the high-resolution voxel model calculation unit 105. The high-resolution voxel model MD_H is volume data formed by a large number of voxels, but 3D model data is often more convenient to handle as a polygon model. The unit may therefore include a function that converts the voxel model into a polygon model, for example by the marching cubes method, and outputs the 3D model as a polygon model.

According to this embodiment, the high-resolution voxel model for the 3D model is calculated for the region of the low-resolution voxel model obtained by the visual hull method using the dilated silhouette images. Defects near the surface of the output 3D model can therefore be suppressed without losing small subjects.

FIG. 5 is a block diagram showing the configuration of the main part of a 3D model generation device 1 according to the second embodiment of the present invention, and FIG. 6 schematically shows the 3D model generation procedure of the second embodiment. In FIGS. 5 and 6, reference signs identical to those used above denote identical or equivalent parts, and their description is omitted.

In this embodiment, the silhouette image processing unit 102 includes, in addition to the dilation unit 102a, at least an erosion unit 102b that erodes the contour of each silhouette image A to generate an eroded silhouette image A₂. It outputs the dilated silhouette image A₁ and the eroded silhouette image A₂ together with a silhouette image processed by another method or with the unprocessed silhouette image A₀ (= A). In this embodiment, the dilated silhouette image A₁, the eroded silhouette image A₂, and the unprocessed silhouette image A₀ are output for each viewpoint.
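Erosion is the counterpart of dilation: a pixel stays foreground only if its whole neighbourhood is foreground. The following plain-Python sketch assumes a square window and treats pixels outside the image as background; the function name `erode` and these conventions are illustrative and not taken from the source.

```python
def erode(mask, radius=1):
    """Binary erosion with a (2*radius+1)-square window.
    Pixels outside the image count as background (assumption)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = all(
                0 <= yy < h and 0 <= xx < w and mask[yy][xx]
                for yy in range(y - radius, y + radius + 1)
                for xx in range(x - radius, x + radius + 1)
            )
            out[y][x] = 1 if keep else 0
    return out


# A 7x7 block inside a 9x9 image shrinks to 5x5 when eroded by one pixel.
a0 = [[1 if 1 <= x <= 7 and 1 <= y <= 7 else 0 for x in range(9)]
      for y in range(9)]
a2 = erode(a0, radius=1)
```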

The second low-resolution voxel model calculation unit 107 arranges a voxel grid whose unit voxel size is a third size M₃ larger than the second size M₂ (M₃ > M₂), either in the region inside the 3D bounding box BB that the 3D bounding box generation unit 104 generated from the dilated silhouette images A₁ or only in the voxel grid cells in which the first low-resolution voxel model MD_L1 calculated by the first low-resolution voxel model calculation unit 103 is generated. It then generates a second low-resolution voxel model MD_L2 by the visual hull method using the eroded silhouette images A₂. The first size M₁ and the third size M₃ may be the same or different.

This processing resembles shrinking the resulting 3D model itself to produce a 3D model one step smaller. The voxels of the second low-resolution voxel model therefore lie only in the interior of the 3D model, and high-resolution voxel model calculation can be regarded as unnecessary for them.

The high-resolution voxel model calculation unit 105 arranges a voxel grid whose unit voxel size is the second size M₂ (M₂ < M₁, M₂ < M₃) only in the near-surface portion of the 3D model, that is, in the voxel grid cells in which the first low-resolution voxel model MD_L1 is generated (or the region inside its 3D bounding box BB), excluding the voxel grid cells in which the second low-resolution voxel model MD_L2 is generated. It then generates the high-resolution voxel model MD_H by the visual hull method using the unprocessed silhouette images A₀.
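The near-surface region is a set difference between the two low-resolution models. The sketch below assumes that M₁ = M₃, so both models share one coarse grid, and uses toy cubes in place of real models; the helper `cube` and the subdivision ratio of 5 are hypothetical.

```python
def cube(lo, hi):
    """All integer voxel indices with lo <= x, y, z < hi."""
    return {(x, y, z)
            for x in range(lo, hi)
            for y in range(lo, hi)
            for z in range(lo, hi)}


md_l1 = cube(0, 5)       # model from dilated silhouettes: 5x5x5 coarse voxels
md_l2 = cube(1, 4)       # model from eroded silhouettes: inner 3x3x3 voxels
shell = md_l1 - md_l2    # near-surface coarse voxels that still need refining

ratio = 5                                 # assumed M1/M2 subdivision ratio
fine_tests_all = len(md_l1) * ratio ** 3  # fine voxels tested without MD_L2
fine_tests_shell = len(shell) * ratio ** 3
```

For a subject this small most coarse voxels are near the surface, so the saving is modest; it grows with the size of the interior, which is consistent with the second embodiment paying off mainly for large, simply shaped subjects.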

According to this embodiment, the fine voxel grid cells that require visual hull calculation are limited to the contour region of the 3D model. Their number is smaller than in the first embodiment, so processing can be faster.

Note that although the second embodiment limits the calculation region of the visual hull method with the fine voxel grid to the near-surface portion of the 3D model, it requires additional processing time to dilate and erode the contour of each silhouette image A to generate the dilated silhouette image A₁ and the eroded silhouette image A₂. The processing time is therefore not always shorter than in the first embodiment. For example, in a scene containing many small or elongated objects, the processing time can be longer than in the first embodiment.

It is therefore desirable to choose between the embodiments according to the scene: the second embodiment for scenes with many large subjects of simple shape, and the first embodiment for scenes with many small or elongated objects.

The erosion amount used to generate the eroded silhouette image A₂ is not limited to a fixed value either. Like the dilation amount used to generate the dilated silhouette image A₁, it may be changed adaptively according to the input video or the unprocessed silhouette image A.

FIG. 7 shows a result for the first embodiment. The model was generated with a low-resolution voxel size of 5 cm and a high-resolution voxel size of 2 cm, and each pixel of the contour portion of the silhouette image (the white portion where the subject exists) was expanded to a size of 5×5 pixels. A model without defects is obtained.

Reference signs: 1: 3D model generation device; 2: camera; 101: silhouette image acquisition unit; 102: silhouette image processing unit; 102a: dilation unit; 102b: erosion unit; 103: (first) low-resolution voxel model calculation unit; 104: 3D bounding box generation unit; 105: high-resolution voxel model calculation unit; 106: 3D model output unit; 107: second low-resolution voxel model calculation unit; A, A₀: silhouette image (unprocessed); A₁: dilated silhouette image; A₂: eroded silhouette image; M₁: first size of the unit voxel; M₂: second size of the unit voxel; M₃: third size of the unit voxel

Claims

1. A 3D model generation device that generates a 3D model of a subject from multi-view images, comprising:
means for acquiring a silhouette image for each viewpoint from multi-view video;
expansion processing means for expanding a contour of the silhouette image to generate an expanded silhouette image;
low-resolution voxel model calculation means for calculating a low-resolution voxel model having a first unit voxel size by a visual volume intersection method using the expanded silhouette image;
high-resolution voxel model calculation means for calculating, for the region of the low-resolution voxel model, a high-resolution voxel model having a second unit voxel size smaller than the first size by a visual volume intersection method using the silhouette image; and
means for outputting a 3D model of the subject based on the high-resolution voxel model.

2. The 3D model generation device according to claim 1, further comprising means for generating a 3D bounding box for each of the low-resolution voxel models,
wherein the high-resolution voxel model calculation means calculates the high-resolution voxel model of the second size within the 3D bounding box.

3. The 3D model generation device according to claim 2, further comprising:
reduction processing means for generating a reduced silhouette image by reducing the outline of the silhouette image; and
second low-resolution voxel model calculation means for calculating a low-resolution voxel model of a third size, the voxel size of which is larger than the second size, by a visual volume intersection method using the reduced silhouette image,
wherein the high-resolution voxel model calculation means calculates the high-resolution voxel model for a near-surface portion that is included in the region of the low-resolution voxel model of the first size but is not included in the region of the low-resolution voxel model of the third size.

4. The 3D model generation device according to claim 3, wherein the first size and the third size of the voxel size are the same.

5. The 3D model generation device according to claim 1, wherein the expansion processing means adaptively changes the expansion amount for each silhouette image according to the size of the silhouette.

6. The 3D model generation device according to claim 5, wherein the expansion processing means adaptively changes the expansion amount for each silhouette image according to the silhouette size in each of the multi-view images of the same subject.

7. The 3D model generation device according to claim 5 or 6, wherein the expansion processing means makes the expansion amount for a silhouette image having a relatively large silhouette size larger than the expansion amount for a silhouette image having a relatively small silhouette size.

8. The 3D model generation device according to claim 1, wherein the expansion processing means adaptively changes the expansion amount according to the shape of the silhouette image.

9. A 3D model generation method in which a computer generates a 3D model of a subject from multi-view images, the method comprising:
acquiring a silhouette image for each viewpoint from multi-view video;
expanding the outline of the silhouette image to generate an expanded silhouette image;
calculating a low-resolution voxel model having a first unit voxel size by a visual volume intersection method using the expanded silhouette image;
calculating, for the region of the low-resolution voxel model, a high-resolution voxel model having a second unit voxel size smaller than the first size by a visual volume intersection method using the silhouette image; and
outputting a 3D model of the subject based on the high-resolution voxel model.

10. The 3D model generation method according to claim 9, further comprising:
acquiring a reduced silhouette image by reducing the outline of the silhouette image; and
calculating a low-resolution voxel model of a third size, the voxel size of which is larger than the second size, by a visual volume intersection method using the reduced silhouette image,
wherein the high-resolution voxel model is calculated for a near-surface portion that is included in the region of the low-resolution voxel model of the first size but is not included in the region of the low-resolution voxel model of the third size.