A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
Ma et al., 2024
- Document ID
- 7506900857947491707
- Author
- Ma J
- Li X
- Wang Z
- Zhang X
- Yan S
- Chen Y
- Zhang Y
- Jin M
- Jiang L
- Liang Y
- Yang C
- Lin D
- Publication year
- 2024
- Publication venue
- Proceedings of the 61st ACM/IEEE Design Automation Conference
Snippet
As deep learning empowers various fields, many domain-specific non-neural-network operators have been proposed to improve the accuracy of deep learning models. Researchers often use the imperative programming paradigm (PyTorch) to express these …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
- G06F8/434—Pointers; Aliasing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
- G06F8/437—Type checking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/51—Source to source
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/31—Programming languages or programming paradigms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/35—Model driven
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3612—Software analysis for verifying properties of programs by runtime analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Vasilache et al. | The next 700 accelerated layers: From mathematical expressions of network computation graphs to accelerated gpu kernels, automatically | |
| Ardalani et al. | Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance | |
| Wu et al. | Autotuning polybench benchmarks with llvm clang/polly loop optimization pragmas using bayesian optimization | |
| Radoi et al. | Translating imperative code to MapReduce | |
| Oancea et al. | Logical inference techniques for loop parallelization | |
| Chatarasi et al. | An extended polyhedral model for SPMD programs and its use in static data race detection | |
| Doerfert et al. | Polly's polyhedral scheduling in the presence of reductions | |
| De Carvalho et al. | Kernelfarer: replacing native-code idioms with high-performance library calls | |
| Henriksen et al. | Size slicing: a hybrid approach to size inference in Futhark | |
| Cheramangalath et al. | Falcon: A graph manipulation language for heterogeneous systems | |
| Fonseca et al. | Automatic parallelization: Executing sequential programs on a task-based parallel runtime | |
| Sampaio et al. | Divergence analysis | |
| Henriksen et al. | Bounds checking: An instance of hybrid analysis | |
| Potter et al. | Kernel composition in SYCL | |
| Kruse et al. | DeLICM: scalar dependence removal at zero memory cost | |
| Kataev et al. | Additional parallelization of existing MPI programs using SAPFOR | |
| Barua et al. | Ompmemopt: Optimized memory movement for heterogeneous computing | |
| Ma et al. | A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning | |
| Wang et al. | Automatic scoping of task clauses for the OpenMP tasking model | |
| Shen et al. | A machine learning method to variable classification in OpenMP | |
| Hoefler et al. | Automatic complexity analysis of explicitly parallel programs | |
| Peccerillo et al. | Task-dag support in single-source PHAST library: Enabling flexible assignment of tasks to cpus and gpus in heterogeneous architectures | |
| Shashidhar et al. | Lighthouse: An automatic code generator for graph algorithms on gpus | |
| Hall et al. | Scheduling languages: A past, present, and future taxonomy | |
| Lou et al. | Automatic Static Analysis-Guided Optimization of CUDA Kernels |