
Ma et al., 2024 - Google Patents

When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models

Document ID
3066496559366068505
Author
Ma X
Bhalgat Y
Smart B
Chen S
Li X
Ding J
Gu J
Chen D
Peng S
Bian J
Torr P
Pollefeys M
Nießner M
Reid I
Chang A
Laina I
Prisacariu V
Publication year
2024
Publication venue
arXiv preprint arXiv:2405.10255

Snippet

As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the …

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING; COUNTING
        • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
          • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
            • G06K9/62 Methods or arrangements for recognition using electronic means
              • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
              • G06K9/6267 Classification techniques
                • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
            • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
              • G06K9/46 Extraction of features or characteristics of the image
        • G06F ELECTRICAL DIGITAL DATA PROCESSING
          • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
            • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
            • G06F17/20 Handling natural language data
            • G06F17/50 Computer-aided design
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
        • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N99/00 Subject matter not provided for in other groups of this subclass
            • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
          • G06N5/00 Computer systems utilising knowledge based models
          • G06N3/00 Computer systems based on biological models
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T7/00 Image analysis
          • G06T13/00 Animation
        • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management

Similar Documents

Petrovich et al. TEMOS: Generating diverse human motions from textual descriptions
Miao et al. Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real
Xu et al. Deep learning for free-hand sketch: A survey
Firoozi et al. Foundation models in robotics: Applications, challenges, and the future
Zhan et al. Multimodal image synthesis and editing: A survey
Huang et al. Multi-view transformer for 3D visual grounding
Zhan et al. RSVG: Exploring data and models for visual grounding on remote sensing data
US10825227B2 (en) Artificial intelligence for generating structured descriptions of scenes
Francis et al. Core challenges in embodied vision-language planning
Ma et al. When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models
Park et al. Visual language navigation: A survey and open challenges
Guo et al. Facial expressions recognition with multi-region divided attention networks for smart education cloud applications
Mi et al. Object affordance based multimodal fusion for natural human-robot interaction
Wu et al. MPCT: Multiscale point cloud transformer with a residual network
Shi et al. Intelligent layout generation based on deep generative models: A comprehensive survey
Liu et al. A survey on text-guided 3D visual grounding: elements, recent advances, and future directions
Huang et al. Applications of large scale foundation models for autonomous driving
Fan et al. A vision-language-guided robotic action planning approach for ambiguity mitigation in human–robot collaborative manufacturing
Yang et al. Language-aware vision transformer for referring segmentation
Fu et al. Exploring the interplay between video generation and world models in autonomous driving: A survey
Yuan et al. A survey of recent 3D scene analysis and processing methods
Lin et al. Advances in embodied navigation using large language models: A survey
Ma et al. Attentional bias for hands: Cascade dual‐decoder transformer for sign language production
Mu Pose Estimation‐Assisted Dance Tracking System Based on Convolutional Neural Network
Ma et al. Dance action generation model based on recurrent neural network