[go: up one dir, main page]

Ma et al., 2024 - Google Patents

A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning

Ma et al., 2024

View PDF
Document ID
7506900857947491707
Author
Ma J
Li X
Wang Z
Zhang X
Yan S
Chen Y
Zhang Y
Jin M
Jiang L
Liang Y
Yang C
Lin D
Publication year
Publication venue
Proceedings of the 61st ACM/IEEE Design Automation Conference

External Links

Snippet

As deep learning empowers various fields, many domain-specific non-neural network operators have been proposed to improve the accuracy of deep learning models. Researchers often use the imperative programming diagram (PyTorch) to express these …
Continue reading at dl.acm.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • G06F8/4441Reducing the execution time required by the program code
    • G06F8/4442Reducing the number of cache misses; Data prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/43Checking; Contextual analysis
    • G06F8/433Dependency analysis; Data or control flow analysis
    • G06F8/434Pointers; Aliasing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/451Code distribution
    • G06F8/452Loops
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456Parallelism detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/445Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/41Compilation
    • G06F8/43Checking; Contextual analysis
    • G06F8/436Semantic checking
    • G06F8/437Type checking
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformations of program code
    • G06F8/51Source to source
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/31Programming languages or programming paradigms
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/75Structural analysis for program understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/35Model driven
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3604Software analysis for verifying properties of programs
    • G06F11/3612Software analysis for verifying properties of programs by runtime analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/02Knowledge representation
    • G06N5/022Knowledge engineering, knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/04Inference methods or devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions

Similar Documents

Publication Publication Date Title
Vasilache et al. The next 700 accelerated layers: From mathematical expressions of network computation graphs to accelerated gpu kernels, automatically
Ardalani et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
Wu et al. Autotuning polybench benchmarks with llvm clang/polly loop optimization pragmas using bayesian optimization
Radoi et al. Translating imperative code to MapReduce
Oancea et al. Logical inference techniques for loop parallelization
Chatarasi et al. An extended polyhedral model for SPMD programs and its use in static data race detection
Doerfert et al. Polly's polyhedral scheduling in the presence of reductions
De Carvalho et al. Kernelfarer: replacing native-code idioms with high-performance library calls
Henriksen et al. Size slicing: a hybrid approach to size inference in Futhark
Cheramangalath et al. Falcon: A graph manipulation language for heterogeneous systems
Fonseca et al. Automatic parallelization: Executing sequential programs on a task-based parallel runtime
Sampaio et al. Divergence analysis
Henriksen et al. Bounds checking: An instance of hybrid analysis
Potter et al. Kernel composition in SYCL
Kruse et al. DeLICM: scalar dependence removal at zero memory cost
Kataev et al. Additional parallelization of existing MPI programs using SAPFOR
Barua et al. Ompmemopt: Optimized memory movement for heterogeneous computing
Ma et al. A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
Wang et al. Automatic scoping of task clauses for the OpenMP tasking model
Shen et al. A machine learning method to variable classification in OpenMP
Hoefler et al. Automatic complexity analysis of explicitly parallel programs
Peccerillo et al. Task-dag support in single-source PHAST library: Enabling flexible assignment of tasks to cpus and gpus in heterogeneous architectures
Shashidhar et al. Lighthouse: An automatic code generator for graph algorithms on gpus
Hall et al. Scheduling languages: A past, present, and future taxonomy
Lou et al. Automatic Static Analysis-Guided Optimization of CUDA Kernels