Ma et al., 2024 - Google Patents

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

Document ID: 7506900857947491707
Author: Ma J; Li X; Wang Z; Zhang X; Yan S; Chen Y; Zhang Y; Jin M; Jiang L; Liang Y; Yang C; Lin D
Publication year: 2024
Publication venue: Proceedings of the 61st ACM/IEEE Design Automation Conference

Snippet

As deep learning empowers various fields, many domain-specific non-neural-network operators have been proposed to improve the accuracy of deep learning models. Researchers often use the imperative programming paradigm (PyTorch) to express these …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
- G06F8/434—Pointers; Aliasing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
- G06F8/437—Type checking
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/51—Source to source
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/35—Model driven
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions

Similar Documents

Publication	Publication Date	Title
Vasilache et al.	2019	The next 700 accelerated layers: From mathematical expressions of network computation graphs to accelerated GPU kernels, automatically
Ardalani et al.	2015	Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
Wu et al.	2022	Autotuning PolyBench benchmarks with LLVM Clang/Polly loop optimization pragmas using Bayesian optimization
Radoi et al.	2014	Translating imperative code to MapReduce
Oancea et al.	2012	Logical inference techniques for loop parallelization
Chatarasi et al.	2016	An extended polyhedral model for SPMD programs and its use in static data race detection
Doerfert et al.	2015	Polly's polyhedral scheduling in the presence of reductions
De Carvalho et al.	2021	Kernelfarer: replacing native-code idioms with high-performance library calls
Henriksen et al.	2014	Size slicing: a hybrid approach to size inference in Futhark
Cheramangalath et al.	2015	Falcon: A graph manipulation language for heterogeneous systems
Fonseca et al.	2016	Automatic parallelization: Executing sequential programs on a task-based parallel runtime
Sampaio et al.	2014	Divergence analysis
Henriksen et al.	2014	Bounds checking: An instance of hybrid analysis
Potter et al.	2015	Kernel composition in SYCL
Kruse et al.	2018	DeLICM: scalar dependence removal at zero memory cost
Kataev et al.	2021	Additional parallelization of existing MPI programs using SAPFOR
Barua et al.	2020	Ompmemopt: Optimized memory movement for heterogeneous computing
Ma et al.	2024	A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
Wang et al.	2015	Automatic scoping of task clauses for the OpenMP tasking model
Shen et al.	2023	A machine learning method to variable classification in OpenMP
Hoefler et al.	2014	Automatic complexity analysis of explicitly parallel programs
Peccerillo et al.	2019	Task-DAG support in single-source PHAST library: Enabling flexible assignment of tasks to CPUs and GPUs in heterogeneous architectures
Shashidhar et al.	2016	Lighthouse: An automatic code generator for graph algorithms on GPUs
Hall et al.	2024	Scheduling languages: A past, present, and future taxonomy
Lou et al.	2024	Automatic Static Analysis-Guided Optimization of CUDA Kernels