Ma et al., 2024 - Google Patents

When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models

Document ID: 3066496559366068505
Author: Ma X; Bhalgat Y; Smart B; Chen S; Li X; Ding J; Gu J; Chen D; Peng S; Bian J; Torr P; Pollefeys M; Nießner M; Reid I; Chang A; Laina I; Prisacariu V
Publication year: 2024
Publication venue: arXiv preprint arXiv:2405.10255

Snippet

As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation

Similar Documents

Publication	Publication Date	Title
Petrovich et al.	2022	TEMOS: Generating diverse human motions from textual descriptions
Miao et al.	2023	Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real
Xu et al.	2022	Deep learning for free-hand sketch: A survey
Firoozi et al.	2023	Foundation models in robotics: Applications, challenges, and the future
Zhan et al.	2022	Multimodal image synthesis and editing: A survey
Huang et al.	2022	Multi-view transformer for 3D visual grounding
Zhan et al.	2023	RSVG: Exploring data and models for visual grounding on remote sensing data
US10825227B2 (en)	2020-11-03	Artificial intelligence for generating structured descriptions of scenes
Francis et al.	2022	Core challenges in embodied vision-language planning
Ma et al.	2024	When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models
Park et al.	2023	Visual language navigation: A survey and open challenges
Guo et al.	2022	Facial expressions recognition with multi-region divided attention networks for smart education cloud applications
Mi et al.	2019	Object affordance based multimodal fusion for natural human-robot interaction
Wu et al.	2023	MPCT: Multiscale point cloud transformer with a residual network
Shi et al.	2023	Intelligent layout generation based on deep generative models: A comprehensive survey
Liu et al.	2024	A survey on text-guided 3D visual grounding: elements, recent advances, and future directions
Huang et al.	2023	Applications of large scale foundation models for autonomous driving
Fan et al.	2024	A vision-language-guided robotic action planning approach for ambiguity mitigation in human–robot collaborative manufacturing
Yang et al.	2024	Language-aware vision transformer for referring segmentation
Fu et al.	2024	Exploring the interplay between video generation and world models in autonomous driving: A survey
Yuan et al.	2021	A survey of recent 3D scene analysis and processing methods
Lin et al.	2023	Advances in embodied navigation using large language models: A survey
Ma et al.	2024	Attentional bias for hands: Cascade dual‐decoder transformer for sign language production
Mu	2022	Pose Estimation‐Assisted Dance Tracking System Based on Convolutional Neural Network
Ma et al.	2022	Dance action generation model based on recurrent neural network