New Insights into Computer Vision and Graphics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2025 | Viewed by 6976

Special Issue Editor


Guest Editor
Department of Information Engineering, China University of Geosciences, Wuhan 430075, China
Interests: computer vision; deep learning; image and video understanding

Special Issue Information

Dear Colleagues,

Application trends, device technologies, and the blurring of boundaries between disciplines are propelling information technology forward, posing new challenges for the study of visual-computing-based interactive graphics processing technology. This Special Issue therefore intends to present new ideas and experimental findings in the field of computer vision and graphics, spanning design, services, theory, and applications.

Computer vision and graphics focus on the computational processing and application of visual data. Relevant areas include, but are not limited to, robotics, medical imaging, security and surveillance, gaming and entertainment, education and training, art and design, and environmental monitoring. Topics of interest include high-speed processing techniques and real-time performance, the development and refinement of deep learning techniques for computer vision and graphics applications, and explainable AI techniques that improve the transparency and interpretability of AI models.

This Special Issue will publish high-quality, original research papers in overlapping fields, including the following:

  • Image processing/analysis;
  • Computer vision theory and application;
  • Video and audio encoding;
  • Motion detection and tracking;
  • Reconstruction and representation;
  • Facial and hand gesture recognition;
  • Rendering techniques;
  • Matching, inference, and recognition;
  • Geometric modeling;
  • 3D vision;
  • Graph-based learning and applications.

Dr. Yuanyuan Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing/analysis
  • computer vision theory and application
  • video and audio encoding
  • motion detection and tracking
  • reconstruction and representation
  • facial and hand gesture recognition
  • rendering techniques
  • matching, inference, and recognition
  • geometric modeling
  • 3D vision
  • graph-based learning and applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

21 pages, 5326 KiB  
Article
6-DoF Pose Estimation from Single RGB Image and CAD Model Retrieval Using Feature Similarity Measurement
by Sieun Park, Won-Je Jeong, Mayura Manawadu and Soon-Yong Park
Appl. Sci. 2025, 15(3), 1501; https://doi.org/10.3390/app15031501 - 1 Feb 2025
Viewed by 542
Abstract
This study presents six degrees of freedom (6-DoF) pose estimation of an object from a single RGB image and retrieval of the matching CAD model by measuring the similarity between RGB and CAD rendering images. The 6-DoF pose estimation of an RGB object is one of the important techniques in 3D computer vision. However, in addition to 6-DoF pose estimation, retrieval and alignment of the matching CAD model with the RGB object should be performed for various industrial applications such as eXtended Reality (XR), Augmented Reality (AR), robotic pick-and-place, and so on. This paper addresses the 6-DoF pose estimation and CAD model retrieval problems simultaneously and quantitatively analyzes how much the 6-DoF pose estimation affects the CAD model retrieval performance. This study consists of two main steps. The first step is 6-DoF pose estimation based on the PoseContrast network. We enhance the structure of PoseContrast by adding variance uncertainty weight and feature attention modules. The second step is the retrieval of the matching CAD model by an image similarity measurement between the CAD rendering and the RGB object. In our experiments, we used 2000 RGB images collected from Google and Bing search engines and 100 CAD models from ShapeNetCore. The Pascal3D+ dataset is used to train the pose estimation network, and DELF features are used for the similarity measurement. Comprehensive ablation studies of the proposed network show the quantitative performance analysis with respect to the baseline model. Experimental results show that the pose estimation performance has a positive correlation with the CAD retrieval performance.
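As a rough, illustrative sketch of the retrieval step described above (not the authors' code), the snippet below ranks CAD renderings, assumed to be rendered at an estimated pose, against the query RGB image using cosine similarity between deep global descriptors. A torchvision ResNet-50 embedding is used here as a stand-in for the DELF features reported in the paper, and the file paths and the `retrieve_cad` helper are hypothetical.

```python
# Sketch: rank CAD renderings against an RGB query by deep-feature similarity.
# ResNet-50 global features are a stand-in for DELF; paths are placeholders.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Feature extractor: ResNet-50 with the classification head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval().to(device)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Return an L2-normalized global descriptor for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    feat = backbone(img).squeeze(0)
    return feat / feat.norm()

def retrieve_cad(query_rgb: str, rendering_paths: list[str]) -> list[tuple[str, float]]:
    """Rank CAD renderings (rendered at the estimated 6-DoF pose) by cosine similarity."""
    q = embed(query_rgb)
    scored = [(p, float(torch.dot(q, embed(p)))) for p in rendering_paths]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Hypothetical usage: renderings would come from projecting candidate ShapeNetCore
# models with the pose predicted by the (enhanced) PoseContrast network.
# best_model, score = retrieve_cad("query.jpg", ["cad_000.png", "cad_001.png"])[0]
```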
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
Figures

  • Figure 1: The architecture of the proposed 6-DoF pose estimation and CAD retrieval method.
  • Figure 2: Representation of 3-DoF rotation from the reference to the camera coordinate systems.
  • Figure 3: For CAD model retrieval, 10 CAD object categories with 10 subcategories are used.
  • Figure 4: Example of 20 RGB images collected for a clock CAD model.
  • Figure 5: Architecture of our proposed pose estimation network.
  • Figure 6: An example of the quaternion-based loss function (α = 1, GT index = 0). The x-axis represents the bin index, and the y-axis represents the loss value.
  • Figure 7: Rendering of 10 CAD subcategories using the pose estimation result of a bench object.
  • Figure 8: Projection result of a car CAD model.
  • Figure 9: Samples of CAD renderings and RGB images. From the left: renderings from PoseContrast, the proposed method, and the RGB object. The qualitative comparison is (a) Better, (b) SS, (c) Worse, and (d) Bad.
  • Figure 10: Samples of CAD renderings and RGB images. From the left: renderings from the proposed method, PosefromShape, and the RGB object. The qualitative comparison is (a) Better, (b) SS, (c) Worse, and (d) Bad.
21 pages, 7424 KiB  
Article
Neural Network Ensemble to Detect Dicentric Chromosomes in Metaphase Images
by Ignacio Atencia-Jiménez, Adayabalam S. Balajee, Miguel J. Ruiz-Gómez, Francisco Sendra-Portero, Alegría Montoro and Miguel A. Molina-Cabello
Appl. Sci. 2024, 14(22), 10440; https://doi.org/10.3390/app142210440 - 13 Nov 2024
Viewed by 1111
Abstract
The Dicentric Chromosome Assay (DCA) is widely used in biological dosimetry, where the number of dicentric chromosomes induced by ionizing radiation (IR) exposure is quantified to estimate the absorbed radiation dose an individual has received. Dicentric chromosome scoring is a laborious and time-consuming process that is performed manually in most cytogenetic biodosimetry laboratories. Further, dicentric chromosome scoring constitutes a bottleneck when several hundred samples need to be analyzed for dose estimation in the aftermath of large-scale radiological/nuclear incident(s). Recently, much interest has focused on automating dicentric chromosome scoring using Artificial Intelligence (AI) tools to reduce analysis time and improve the accuracy of dicentric chromosome detection. Our study aims to detect dicentric chromosomes in metaphase plate images using an ensemble of artificial neural network detectors suitable for datasets that present a low number of samples (in this work, only 50 images). In our approach, the input image is first processed by several operators, each producing a transformed image. Then, each transformed image is transferred to a specific detector trained with a training set processed by the same operator that transformed the image. Following this, the detectors provide their predictions about the detected chromosomes. Finally, all predictions are combined using a consensus function. Regarding the operators used, images were binarized separately by applying Otsu and Spline techniques, while morphological opening and closing filters with different sizes were used to eliminate noise, isolate specific components, and enhance the structures of interest (chromosomes) within the image. Consensus-based decisions are typically more precise than those made by individual networks, as the consensus method can rectify certain misclassifications, assuming that individual network results are correct. The results indicate that our methodology worked satisfactorily in detecting a majority of chromosomes, with remarkable classification performance even with the low number of training samples utilized. AI-based dicentric chromosome detection will be beneficial for rapid triage by improving the detection of dicentric chromosomes and thereby the dose prediction accuracy.
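To make the operator-plus-consensus idea above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): it builds several named preprocessing operators, Otsu binarization followed by morphological opening or closing with different kernel sizes, and merges per-chromosome detector votes with a simple consensus threshold. The Spline-based thresholding and the detectors themselves are omitted, and the `make_operators`/`consensus` names are illustrative.

```python
# Sketch of preprocessing operators and a consensus vote for an ensemble of detectors.
import cv2

def make_operators(kernel_sizes=(2, 3, 5)):
    """Build named operators: Otsu binarization followed by morphological
    opening or closing with several structuring-element sizes."""
    ops = {}
    for k in kernel_sizes:
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k, k))
        for name, morph in (("open", cv2.MORPH_OPEN), ("close", cv2.MORPH_CLOSE)):
            def op(img, kernel=kernel, morph=morph):
                gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                _, binary = cv2.threshold(gray, 0, 255,
                                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
                return cv2.morphologyEx(binary, morph, kernel)
            ops[f"otsu_{name}_{k}x{k}"] = op
    return ops

def consensus(votes: list[bool], threshold: float = 0.3) -> bool:
    """Label a chromosome as dicentric if at least `threshold` of the detectors agree."""
    return (sum(votes) / len(votes)) >= threshold if votes else False

# Hypothetical usage: one transformed copy of the metaphase image per operator,
# each fed to its own trained detector (not shown here).
# img = cv2.imread("metaphase_2Gy-023.png")
# variants = {name: op(img) for name, op in make_operators().items()}
```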
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
Figures

  • Figure 1: Schema of the overall proposal's operation. The input image is preprocessed using multiple operators. Each preprocessed image is supplied independently to a deep neural network (DNN) model. Predictions released by each model are finally merged by ensembling and evaluated with a consensus function, producing the final output prediction.
  • Figure 2: Example of two images from the dataset, where normal chromosomes and chromosomes with two centromeres (dicentric) are presented.
  • Figure 3: Histogram of image 2Gy-023 with Spline curve and thresholds obtained from local maxima (black peak on the left and white peak on the right).
  • Figure 4: Histogram of images 2Gy-068 and 2Gy-360 with Spline curve and thresholds obtained from local maxima (black peak on the left and white peak on the right). Note that three peaks are presented.
  • Figure 5: Preprocessing techniques applied to an example image. First row: original image. Second row: comparison between filter sizes for image 2Gy-023. In this example, the thresholding technique (Spline) and the type of morphological filter (closing) have been kept constant. Third row: comparison between opening and closing morphological filters for image 2Gy-023. In this example, the thresholding technique (Spline) and the filter size (3 × 3) were kept constant. Fourth and fifth rows: comparison between filter sizes for image 2Gy-023. In this example, the thresholding technique (Spline) and the type of morphological filter (closing) have been kept constant.
  • Figure 6: Model performance for the 2 × 2 Spline closing method. It can be noted that the loss function decreases over the epochs, whereas the recall, precision, and mAP50 metrics increase.
  • Figure 7: Prediction for the chromosomes of image 2Gy-329 for the experiment where the Spline thresholding technique with a 2 × 2 closing filter was employed. In green: predictions as 'non-dicentric'; in red: predictions as 'dicentric'.
  • Figure 8: Composition of 2Gy-329 image predictions for a dicentric chromosome. In green: predictions as 'non-dicentric'; in red: predictions as 'dicentric'.
  • Figure 9: Composition of 2Gy-329 image predictions for a non-dicentric chromosome. In green: predictions as 'non-dicentric'; in red: predictions as 'dicentric'.
  • Figure 10: Schema of the ensemble operation. In a first phase, it is shown whether each ground-truth chromosome is detected by the model. Then, in a second phase, for those chromosomes that were detected, it is shown whether the model classifies them correctly as dicentric or non-dicentric.
  • Figure 11: Detection performance (spatial accuracy) achieved by the baseline method (no operation), the best methods for each combination of binarization technique and morphological filter (nc5, no2, sc2, and so3), and the best ensemble methods (consensus 0.1, 0.2, and 0.3).
  • Figure 12: Composition of 2Gy-023 image predictions for a dicentric chromosome. In green: predictions as 'non-dicentric'; in red: predictions as 'dicentric'.
18 pages, 1493 KiB  
Article
Hypergraph Position Attention Convolution Networks for 3D Point Cloud Segmentation
by Yanpeng Rong, Liping Nong, Zichen Liang, Zhuocheng Huang, Jie Peng and Yiping Huang
Appl. Sci. 2024, 14(8), 3526; https://doi.org/10.3390/app14083526 - 22 Apr 2024
Viewed by 1711
Abstract
Point cloud segmentation, as the basis for 3D scene understanding and analysis, has made significant progress in recent years. Graph-based modeling and learning methods have played an important role in point cloud segmentation. However, due to the inherent complexity of point cloud data, it is difficult to capture higher-order and complex features of 3D data using graph learning methods. In addition, how to quickly and efficiently extract important features from point clouds also poses a great challenge to current research. To address these challenges, we propose a new framework, called hypergraph position attention convolution networks (HGPAT), for point cloud segmentation. Firstly, we use a hypergraph to model the higher-order relationships among points in the cloud. Secondly, in order to effectively learn the feature information of point cloud data, a hyperedge position attention convolution module is proposed, which utilizes the hyperedge–hyperedge propagation pattern to extract and aggregate more important features. Finally, we design a ResNet-like module to reduce the computational complexity of the network and improve its efficiency. We have conducted point cloud segmentation experiments on the ShapeNet Part and S3DIS datasets, and the experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art ones.
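As a rough illustration of one ingredient mentioned in the abstract, the sketch below models a point cloud as a hypergraph by grouping each point with its k nearest neighbours and recording the result in an incidence matrix H (vertices × hyperedges). This is only a generic k-NN construction under stated assumptions; the exact hypergraph construction and the position attention convolution used by HGPAT are not reproduced here.

```python
# Sketch: build a k-NN hypergraph incidence matrix for a point cloud.
import numpy as np
from scipy.spatial import cKDTree

def knn_hypergraph_incidence(points: np.ndarray, k: int = 8) -> np.ndarray:
    """points: (N, 3) array. Returns H of shape (N, N), where hyperedge j
    contains point j and its k nearest neighbours."""
    tree = cKDTree(points)
    # Query k+1 neighbours because the closest neighbour of a point is itself.
    _, idx = tree.query(points, k=k + 1)
    n = points.shape[0]
    H = np.zeros((n, n), dtype=np.float32)
    for j, neighbours in enumerate(idx):
        H[neighbours, j] = 1.0
    return H

# Example with synthetic data: 1024 random points, each hyperedge groups a point
# with its 8 nearest neighbours (so roughly 9 vertices per hyperedge).
pts = np.random.rand(1024, 3).astype(np.float32)
H = knn_hypergraph_incidence(pts, k=8)
print(H.shape, H.sum(axis=0).mean())
```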
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
Figures

  • Figure 1: Point cloud hypergraph construction process.
  • Figure 2: Point cloud segmentation framework based on hypergraph position attention convolution networks. (a) denotes the ResNet-like module, (b) denotes the hypergraph position attention convolution module, and (c) denotes the skip connection module.
  • Figure 3: Accuracy of different methods.
  • Figure 4: Visualization of the semantic segmentation results on Area 5 of the S3DIS dataset.
  • Figure 5: mIoU changes on the validation set.
  • Figure 6: The accuracy change on the test set when D is 16 and 32.
  • Figure 7: Influence of the number of sampling points.
13 pages, 12039 KiB  
Article
Camera Path Generation for Triangular Mesh Using Toroidal Patches
by Jinyoung Choi, Kangmin Kim, Seongil Kim, Minseok Kim, Taekgwan Nam and Youngjin Park
Appl. Sci. 2024, 14(2), 490; https://doi.org/10.3390/app14020490 - 5 Jan 2024
Viewed by 1265
Abstract
Triangular meshes are a principal data structure in computer graphics, serving as the foundation for many 3D models. To effectively utilize these 3D models across diverse industries, it is important to understand the model's overall shape and geometric features thoroughly. In this work, we introduce a novel method for generating camera paths that emphasize the model's local geometric characteristics. This method uses a toroidal patch-based spatial data structure that approximates the mesh's faces within a predetermined tolerance ϵ and encapsulates their geometric intricacies. This facilitates the determination of the camera position and gaze path, ensuring the mesh's key characteristics are captured. During path construction, we create a bounding cylinder for the mesh, project the mesh's faces and associated toroidal patches onto the cylinder's lateral surface, and sequentially select grids of the cylinder containing the highest number of toroidal patches as we traverse the lateral surface. The centers of the selected grids are used as control points for a periodic B-spline curve, which serves as our foundational path. After the initial curve is generated, we derive the camera position and gaze paths from it by applying scaling factors to ensure a uniform camera amplitude. We applied our method to ten triangular mesh models, demonstrating its effectiveness and adaptability across various mesh configurations.
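As a small, assumption-laden sketch of the final stage described above, the snippet below fits a closed (periodic) cubic B-spline through a set of control points, the role played in the paper by the centres of the selected cylinder grids, using SciPy. The synthetic control points on a cylinder are placeholders, not data from the paper, and the camera position/gaze scaling step is omitted.

```python
# Sketch: periodic cubic B-spline path through ordered control points.
import numpy as np
from scipy.interpolate import splprep, splev

def periodic_bspline_path(control_points: np.ndarray, samples: int = 200) -> np.ndarray:
    """control_points: (M, 3) ordered points around the bounding cylinder.
    Returns (samples, 3) points on a closed cubic B-spline."""
    # Close the loop explicitly, then fit with per=True; s=0 interpolates exactly.
    closed = np.vstack([control_points, control_points[:1]])
    tck, _ = splprep(closed.T, k=3, per=True, s=0)
    u = np.linspace(0.0, 1.0, samples, endpoint=False)
    return np.stack(splev(u, tck), axis=1)

# Example: placeholder control points lying roughly on a cylinder of radius 2.
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
ctrl = np.stack([2 * np.cos(theta), 2 * np.sin(theta), 0.5 * np.sin(3 * theta)], axis=1)
path = periodic_bspline_path(ctrl)
print(path.shape)  # (200, 3)
```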
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
Figures

  • Figure 1: Camera path generation examples in 3D modeling software: (a) process for camera path in Blender 3.5, (b) process for camera path in 3ds Max 2024.
  • Figure 2: C_{h,r}(u, v) for various triangular mesh models. In sequence from left to right and then top to bottom: Bear, Rocker, Horse, Kitten, Bunny, Haris, Venus, Popiersie, Armadillo, and Pieta.
  • Figure 3: Example of the control point selection process: (a) grids with their weights, (b) selected grids with maximum weight, and (c) selected control points for generating the periodic B-spline curve.
  • Figure 4: Camera path visualization for the Rocker model. The camera gaze path (shown in blue) and the camera position path (shown in green) are generated based on the initial curve (shown in red).
  • Figure 5: Camera path generation workflow: (a) smoothing result of the input mesh, (b) faces colored according to the included toroidal patch, (c) faces colored according to their weight, (d) faces colored according to the corresponding cylinder grid's weight, and (e) camera gaze path generation result.
  • Figure 6: Snapshot of each model with the camera at the starting point of the generated camera path.
  • Figure 7: Ten selected triangular mesh models with bounding cylinder and camera gaze path: (a) Bear, (b) Rocker, (c) Horse, (d) Kitten, (e) Bunny, (f) Haris, (g) Venus, (h) Popiersie, (i) Armadillo, and (j) Pieta.
  • Figure 8: Visualization of our camera paths for the models. The camera gaze path (shown in blue) and the camera position path (shown in green) are generated based on the initial third-order periodic B-spline curve (shown in red).
28 pages, 4448 KiB  
Article
ED2IF2-Net: Learning Disentangled Deformed Implicit Fields and Enhanced Displacement Fields from Single Images Using Pyramid Vision Transformer
by Xiaoqiang Zhu, Xinsheng Yao, Junjie Zhang, Mengyao Zhu, Lihua You, Xiaosong Yang, Jianjun Zhang, He Zhao and Dan Zeng
Appl. Sci. 2023, 13(13), 7577; https://doi.org/10.3390/app13137577 - 27 Jun 2023
Cited by 1 | Viewed by 1456
Abstract
Substantial research has addressed single-view 3D reconstruction, and the majority of state-of-the-art implicit methods employ CNNs as the backbone network. On the other hand, transformers have shown remarkable performance in many vision tasks. However, it is still unknown whether transformers are suitable for single-view implicit 3D reconstruction. In this paper, we propose the first end-to-end single-view 3D reconstruction network based on the Pyramid Vision Transformer (PVT), called ED2IF2-Net, which disentangles the reconstruction of an implicit field into the reconstruction of topological structures and the recovery of surface details to achieve high-fidelity shape reconstruction. ED2IF2-Net uses a Pyramid Vision Transformer encoder to extract multi-scale hierarchical local features and a global vector of the input single image, which are fed into three separate decoders. A coarse shape decoder reconstructs a coarse implicit field based on the global vector, a deformation decoder iteratively refines the coarse implicit field using the pixel-aligned local features to obtain a deformed implicit field through multiple implicit field deformation blocks (IFDBs), and a surface detail decoder predicts an enhanced displacement field using the local features with hybrid attention modules (HAMs). The final output is a fusion of the deformed implicit field and the enhanced displacement field, with four loss terms applied to reconstruct the coarse implicit field, structure details through a novel deformation loss, the overall shape after fusion, and surface details via a Laplacian loss. The quantitative results obtained from the ShapeNet dataset validate the exceptional performance of ED2IF2-Net. Notably, ED2IF2-Net-L stands out as the top-performing variant, achieving the best mean IoU, CD, EMD, ECD-3D, and ECD-2D scores of 61.1, 7.26, 2.51, 6.08, and 1.84, respectively. The extensive experimental evaluations consistently demonstrate the state-of-the-art capabilities of ED2IF2-Net in terms of reconstructing topological structures and recovering surface details, all while maintaining competitive inference time.
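The data flow described above, a coarse field refined into a deformed implicit field and then fused with a displacement field under a multi-term loss, can be sketched schematically as follows. The decoders here are tiny stand-in MLPs rather than the actual PVT-based modules, and the loss is only a toy analogue of the four-term objective (the Laplacian term is omitted), so this is illustrative rather than a reproduction of the paper's architecture.

```python
# Schematic sketch: coarse field -> deformed field -> fusion with a displacement field.
import torch
import torch.nn as nn

def small_mlp(in_dim: int) -> nn.Sequential:
    """A tiny point-wise MLP standing in for a real decoder."""
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 1))

class ToyDisentangledImplicitNet(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        in_dim = 3 + feat_dim
        self.coarse_dec = small_mlp(in_dim)   # coarse implicit field (from a global vector)
        self.deform_dec = small_mlp(in_dim)   # residual refinement -> deformed implicit field
        self.detail_dec = small_mlp(in_dim)   # enhanced displacement field (surface details)

    def forward(self, query_pts: torch.Tensor, feat: torch.Tensor):
        # query_pts: (B, N, 3) query points; feat: (B, feat_dim) image-feature stand-in.
        f = feat.unsqueeze(1).expand(-1, query_pts.shape[1], -1)
        x = torch.cat([query_pts, f], dim=-1)
        coarse = self.coarse_dec(x)
        deformed = coarse + self.deform_dec(x)   # refine the coarse field
        fused = deformed + self.detail_dec(x)    # fuse with the displacement field
        return coarse, deformed, fused

def combined_loss(coarse, deformed, fused, gt_sdf):
    """Toy analogue of the multi-term objective: coarse-field, structure, and
    overall-shape terms (the Laplacian surface-detail term is omitted)."""
    l1 = nn.functional.l1_loss
    return l1(coarse, gt_sdf) + l1(deformed, gt_sdf) + l1(fused, gt_sdf)

# Example with random tensors only, to show the shapes involved.
net = ToyDisentangledImplicitNet()
pts, feat = torch.rand(2, 512, 3), torch.rand(2, 64)
coarse, deformed, fused = net(pts, feat)
loss = combined_loss(coarse, deformed, fused, torch.rand(2, 512, 1))
```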
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
Figures

  • Figure 1: The overall pipeline of the proposed ED2IF2-Net, where P is a 3D query point and π(·) represents the operation of projecting a 3D spatial query point to an image. PVT Enc means Pyramid Vision Transformer encoder. Coa Dec, Def Dec, and Sur Dec denote Coarse Shape Decoder, Deformation Decoder, and Surface Detail Decoder, respectively. ED2IF2-Net first extracts a global vector and the local features of the input image via a Pyramid Vision Transformer encoder. The global vector is used in a coarse shape decoder to predict a coarse implicit field, which is then iteratively refined by a deformation decoder to obtain a deformed implicit field with finer structure details using multiple implicit field deformation blocks (IFDBs). A surface detail decoder with hybrid attention modules (HAMs) uses local features to recover an enhanced displacement field. The final output of ED2IF2-Net is a fusion of the deformed implicit field and the enhanced displacement field. Four combined loss terms are applied to reconstruct the coarse implicit field, structure details, overall shape, and surface details.
  • Figure 2: An illustrative description of our disentanglement. ED2IF2-Net disentangles the ground-truth SDF of the chair into a deformed implicit field and an enhanced displacement field (visible surface), where the deformed implicit field is obtained by refining the coarse implicit field of the object. The red arrows in the deformed implicit field represent the deformation function f from the coarse implicit field (green part) to the deformed implicit field (containing most of the topological structures of the object).
  • Figure 3: Architecture of the deformation decoder, where s_j and c_j represent the intermediate implicit field and the state code of the j-th IFDB output, respectively.
  • Figure 4: Illustration of the j-th IFDB, where Concat means concatenation operation.
  • Figure 5: Architecture of the surface detail decoder.
  • Figure 6: Qualitative comparison of various methods for single-view 3D reconstruction on ShapeNet.
  • Figure 7: Visualization of the qualitative ablation studies of ED2IF2-Net-T. It is best viewed magnified on the screen.
  • Figure 8: Examples of reconstruction from online images through ED2IF2-Net.
  • Figure 9: Two examples of surface detail transfer using ED2IF2-Net, where the backrest details of the source chair are transferred.
  • Figure 10: Examples of pasting a logo using ED2IF2-Net.
  • Figure A1: Performance comparison of models with different batch_size values; other settings remain fixed. Evaluation metrics include IoU, CD, EMD, ECD-3D, and ECD-2D.
  • Figure A2: Performance of models with different learning rates, with batch_size set to 16 and other settings kept fixed. Evaluation metrics are IoU, CD, EMD, ECD-3D, and ECD-2D.