Stars
A collection of articles about PhD life, written in Japanese 🇯🇵
The book "Performance Analysis and Tuning on Modern CPUs"
Itoyori: A distributed multi-threading runtime system for global-view fork-join task parallelism
int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
Synchronize your working directory efficiently to a remote place without committing the changes.
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
stdgpu: Efficient STL-like Data Structures on the GPU
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
Dear ImGui: Bloat-free Graphical User Interface for C++ with minimal dependencies
Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Test suite for probing the numerical behavior of NVIDIA tensor cores
Important concepts in numerical linear algebra and related areas
A massively-parallel, block-sparse tensor framework written in C++
⚡ Dark powered Vim/Neovim plugin manager
Crow is a very fast and easy-to-use C++ micro web framework (inspired by Python Flask)
Binary Neural Network Framework for FPGA (Differentiable LUT)
Embedded language for high-performance array computations