Ma et al., 2024 - Google Patents
When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models
- Document ID
- 3066496559366068505
- Author
- Ma X
- Bhalgat Y
- Smart B
- Chen S
- Li X
- Ding J
- Gu J
- Chen D
- Peng S
- Bian J
- Torr P
- Pollefeys M
- Nießner M
- Reid I
- Chang A
- Laina I
- Prisacariu V
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2405.10255
Snippet
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Petrovich et al. | | TEMOS: Generating diverse human motions from textual descriptions |
Miao et al. | | Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real |
Xu et al. | | Deep learning for free-hand sketch: A survey |
Firoozi et al. | | Foundation models in robotics: Applications, challenges, and the future |
Zhan et al. | | Multimodal image synthesis and editing: A survey |
Huang et al. | | Multi-view transformer for 3D visual grounding |
Zhan et al. | | RSVG: Exploring data and models for visual grounding on remote sensing data |
US10825227B2 (en) | | Artificial intelligence for generating structured descriptions of scenes |
Francis et al. | | Core challenges in embodied vision-language planning |
Ma et al. | | When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models |
Park et al. | | Visual language navigation: A survey and open challenges |
Guo et al. | | Facial expressions recognition with multi-region divided attention networks for smart education cloud applications |
Mi et al. | | Object affordance based multimodal fusion for natural human-robot interaction |
Wu et al. | | MPCT: Multiscale point cloud transformer with a residual network |
Shi et al. | | Intelligent layout generation based on deep generative models: A comprehensive survey |
Liu et al. | | A survey on text-guided 3D visual grounding: elements, recent advances, and future directions |
Huang et al. | | Applications of large scale foundation models for autonomous driving |
Fan et al. | | A vision-language-guided robotic action planning approach for ambiguity mitigation in human–robot collaborative manufacturing |
Yang et al. | | Language-aware vision transformer for referring segmentation |
Fu et al. | | Exploring the interplay between video generation and world models in autonomous driving: A survey |
Yuan et al. | | A survey of recent 3D scene analysis and processing methods |
Lin et al. | | Advances in embodied navigation using large language models: A survey |
Ma et al. | | Attentional bias for hands: Cascade dual‐decoder transformer for sign language production |
Mu | | Pose Estimation‐Assisted Dance Tracking System Based on Convolutional Neural Network |
Ma et al. | | Dance action generation model based on recurrent neural network |