# HPC-AI-Optimization-Lab 1.0.0

High-Performance CUDA Kernels for AI/ML Workloads
A CUDA kernel lab for AI workloads, organized as focused modules for elementwise ops, reductions, GEMM, convolution, attention, quantization, and experimental newer-CUDA paths.
- `src/common/`: shared CUDA utilities such as tensor wrappers, timers, launch helpers, and reduction primitives
- `src/01_elementwise/` to `src/07_cuda13_features/`: numbered kernel modules covering elementwise ops, reductions, GEMM, convolution, attention, quantization, and experimental newer-CUDA features
- `tests/`: GoogleTest + RapidCheck coverage across kernel modules
- `examples/`: shipped CUDA and Python examples
- `python/`: nanobind bindings plus benchmark scripts
- `docs/`: optimization notes and Python binding docs

The core kernel modules span `src/01_elementwise` to `src/06_quantization`. The following modules currently exist as educational or compatibility-oriented paths rather than production-grade implementations:
- `src/04_convolution/conv_winograd.cu`: currently falls back to the validated implicit-GEMM convolution path
- `src/07_cuda13_features/tma.cu`: currently uses a regular kernel copy fallback
- `src/07_cuda13_features/cluster.cu`: currently uses a portable block-reduction fallback
- `src/07_cuda13_features/fp8_gemm.cu`: currently demonstrates scaled float behavior rather than a true Hopper FP8 kernel

The current Python extension is named `hpc_ai_opt` and exposes low-level submodules such as `elementwise`, `reduction`, and `gemm`.
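The implicit-GEMM path that `conv_winograd.cu` falls back to lowers convolution to a single matrix multiply via im2col. A minimal NumPy reference sketch of that lowering (illustrative only; function names here are not the repo's CUDA API):

```python
import numpy as np

def im2col(x, kh, kw):
    """Gather every kh x kw receptive field of an NCHW tensor into columns."""
    n, c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.zeros((n, c, kh, kw, oh, ow), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, :, i, j] = x[:, :, i:i + oh, j:j + ow]
    return cols.reshape(n, c * kh * kw, oh * ow)

def conv2d_implicit_gemm(x, w):
    """'Valid' convolution as one batched GEMM (NCHW input, OIHW weights)."""
    o, _, kh, kw = w.shape
    cols = im2col(x, kh, kw)          # (n, c*kh*kw, oh*ow)
    wmat = w.reshape(o, -1)           # (o, c*kh*kw)
    out = wmat @ cols                 # matmul broadcasts to (n, o, oh*ow)
    n, _, h, wd = x.shape
    return out.reshape(n, o, h - kh + 1, wd - kw + 1)
```

The GPU kernel performs the same lowering implicitly (indexing input tiles on the fly instead of materializing `cols`), which is why a dense GEMM validates it.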
The bindings are intentionally thin:
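A thin binding mostly just marshals arrays into a kernel and back. A hypothetical call pattern (the `hpc_ai_opt.elementwise.add` name and signature are assumed, and a NumPy fallback is included so the sketch runs without the compiled extension):

```python
import numpy as np

try:
    # hypothetical import of the compiled nanobind extension
    import hpc_ai_opt

    def vector_add(a, b):
        # assumed binding name/signature; the real module may differ
        return hpc_ai_opt.elementwise.add(a, b)
except ImportError:
    def vector_add(a, b):
        # NumPy fallback mirroring what the elementwise CUDA kernel computes
        return np.asarray(a) + np.asarray(b)

out = vector_add(np.ones(4, dtype=np.float32), np.arange(4, dtype=np.float32))
```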
Notes:
- The kernels in `src/07_cuda13_features/` are not evidence of full Hopper/Blackwell feature coverage.
- `flash_attention` currently supports `float` with `head_dim == 64` in the shipped implementation.
- The default GitHub Actions workflow is intentionally lightweight and currently validates:
It does not currently provide full native CUDA build-and-test coverage on GitHub-hosted runners. For native verification, run the local CMake + CTest flow shown above on a machine with a working CUDA toolchain and GPU.
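For host-side sanity checks of `flash_attention` (which, as noted, ships with `float` and `head_dim == 64`), a plain NumPy scaled-dot-product attention can serve as the oracle. This is a sketch of the reference math, not the repo's test harness:

```python
import numpy as np

def attention_ref(q, k, v):
    """Scaled-dot-product attention: softmax(q @ k^T / sqrt(d)) @ v."""
    d = q.shape[-1]                            # head_dim, e.g. 64
    s = q @ k.swapaxes(-1, -2) / np.sqrt(d)    # (..., seq_q, seq_k)
    s = s - s.max(axis=-1, keepdims=True)      # numerically stable softmax
    p = np.exp(s)
    p = p / p.sum(axis=-1, keepdims=True)
    return p @ v                               # (..., seq_q, head_dim)
```

A fused flash-attention kernel computes the same result in tiles without materializing the full `(seq_q, seq_k)` score matrix, so agreement with this reference (within float tolerance) is the correctness criterion.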
Documentation:

- `docs/README.md`
- `docs/python/index.rst`
- `docs/01_gemm_optimization.md`
- `docs/04_flash_attention.md`