
CUDA SGEMM ENGINEERING NOTEBOOK

SGEMM Optimization Lab

A bilingual CUDA SGEMM case study built for two outcomes: solid learning depth and strong interview storytelling. Every optimization step is tied to correctness constraints, benchmark evidence, and explicit validation boundaries.

cuBLAS-verified · OpenSpec-governed · EN / ZH mirrored
Kernel Ladder
5
naive -> tiled -> bank-free -> double-buffer -> WMMA
Correctness Oracle
cuBLAS
separate tolerances for FP32 and Tensor Core paths
Validation Boundary
CI + GPU
hosted CI for build health, local GPU for runtime and performance
Public Surfaces
EN / 中文
mirrored pages for tutorial, interview, and references
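
The "separate tolerances" idea in the Correctness Oracle card can be sketched on the host side as below. This is a minimal illustration, not the project's checker: the `Tolerance` struct, the `matches_reference` helper, and both budget values are hypothetical, and it assumes the Tensor Core path rounds its inputs to FP16, which is why its budget is looser.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tolerance budget; the values below are illustrative only.
struct Tolerance {
    float abs_tol;  // absolute floor, guards entries close to zero
    float rel_tol;  // relative bound, scaled by the reference magnitude
};

// Assumed budgets: tight for pure FP32, looser for FP16-input Tensor Core math.
constexpr Tolerance kFp32Tolerance{1e-5f, 1e-4f};
constexpr Tolerance kTensorCoreTolerance{1e-2f, 1e-2f};

// Returns true when every element of `result` lies within
// abs_tol + rel_tol * |reference| of the reference output (e.g. from cuBLAS).
inline bool matches_reference(const std::vector<float>& result,
                              const std::vector<float>& reference,
                              Tolerance tol) {
    assert(result.size() == reference.size());
    for (std::size_t i = 0; i < result.size(); ++i) {
        const float err = std::fabs(result[i] - reference[i]);
        const float bound = tol.abs_tol + tol.rel_tol * std::fabs(reference[i]);
        if (!(err <= bound)) return false;  // written this way so NaN also fails
    }
    return true;
}
```

An output that passes under `kTensorCoreTolerance` can still fail under `kFp32Tolerance`, so a kernel is only ever judged against the budget of its own numeric path.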
Benchmark Scope
End-to-end and compute-only WMMA are reported separately.
Numerical Policy
FP32 and Tensor Core paths use different tolerance budgets by design.
Engineering Contract
Unified launcher signature keeps kernels swappable and testable.
Governance
OpenSpec keeps docs, process, and implementation intent aligned.
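
To make the "unified launcher signature" contract concrete, here is a minimal host-only sketch. The `SgemmLauncher` alias, the two CPU stand-in kernels, and `kernel_registry` are hypothetical names chosen for illustration; the real CUDA launchers would presumably take device pointers and possibly a stream. It shows why a shared signature keeps stages swappable: one harness can drive every kernel and compare the outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical unified signature: row-major C (MxN) = A (MxK) * B (KxN).
using SgemmLauncher = void (*)(const float* A, const float* B, float* C,
                               int M, int N, int K);

// CPU stand-in for one kernel stage (i-j-k loop order).
void sgemm_reference(const float* A, const float* B, float* C,
                     int M, int N, int K) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Second stand-in with a different loop order (i-k-j) but the same signature.
void sgemm_reordered(const float* A, const float* B, float* C,
                     int M, int N, int K) {
    for (int i = 0; i < M * N; ++i) C[i] = 0.0f;
    for (int i = 0; i < M; ++i)
        for (int k = 0; k < K; ++k)
            for (int j = 0; j < N; ++j)
                C[i * N + j] += A[i * K + k] * B[k * N + j];
}

struct KernelEntry {
    const char* name;
    SgemmLauncher launch;
};

// Every stage is registered behind the same function-pointer type.
inline std::vector<KernelEntry> kernel_registry() {
    return {{"reference", sgemm_reference}, {"reordered", sgemm_reordered}};
}

// Runs each registered launcher on identical inputs; returns one output per kernel.
inline std::vector<std::vector<float>> run_all(const std::vector<float>& A,
                                               const std::vector<float>& B,
                                               int M, int N, int K) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    std::vector<std::vector<float>> outputs;
    for (const KernelEntry& entry : kernel_registry()) {
        std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
        entry.launch(A.data(), B.data(), C.data(), M, N, K);
        outputs.push_back(std::move(C));
    }
    return outputs;
}
```

Adding a new stage under this scheme means adding one registry entry; the test and benchmark loops stay unchanged.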

Why this repository is worth attention

Learning Depth
Progressive
Each kernel stage teaches one specific performance concept.
Evidence Model
Traceable
Speedup claims are attached to correctness checks and scope labels.
Interview Utility
Practical
The project can be explained as a clear engineering decision chain.
Community Value
Reusable
Includes playbooks, references, and architecture-aware tuning guidance.

Project map in one diagram

Choose your route

Build and run quickly

Get from clone to benchmark execution with clear local-vs-CI expectations.

Learn the optimization ladder

Understand what each stage changes in memory behavior and performance profile.

Prepare interview narrative

Use a concise storyline from architecture choices to measurable outcomes.

Validate technical lineage

Trace implementation choices to official docs, papers, and high-quality repos.

Knowledge hub

Command cockpit

bash
# Build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Validate
ctest --test-dir build
openspec validate --all

# Benchmark
./build/bin/sgemm_benchmark -a
./build/bin/sgemm_benchmark --dims 256 384 640
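
For reading benchmark output, the usual SGEMM throughput arithmetic is sketched below. The helper names are hypothetical and this is not taken from `sgemm_benchmark` itself; it assumes the three `--dims` values are the M, N, and K extents in some order, which does not affect the FLOP count because the product is symmetric.

```cpp
#include <cassert>
#include <cstdint>

// FLOPs for C = A * B with A (MxK) and B (KxN):
// one multiply and one add per inner-product term.
inline std::int64_t sgemm_flops(std::int64_t M, std::int64_t N, std::int64_t K) {
    assert(M > 0 && N > 0 && K > 0);
    return 2 * M * N * K;
}

// Converts a measured wall time in milliseconds into GFLOP/s.
// flops / (ms * 1e-3 s) / 1e9 simplifies to flops / (ms * 1e6).
inline double gflops(std::int64_t M, std::int64_t N, std::int64_t K,
                     double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    return static_cast<double>(sgemm_flops(M, N, K)) / (elapsed_ms * 1e6);
}
```

For the dimensions 256, 384, and 640 this gives 2 × 256 × 384 × 640 = 125,829,120 FLOPs, so a hypothetical 1 ms run would correspond to roughly 125.8 GFLOP/s. When comparing end-to-end and compute-only WMMA numbers, the same FLOP count applies and only the measured time differs.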

Language and entry points

MIT Licensed