Project Highlights

This page explains how this repository differs from typical SGEMM demos.

What sets it apart

Dimension	Typical SGEMM demo	This repository
Learning structure	One or two kernels without clear progression	Five-stage kernel ladder with explicit learning intent
Correctness discipline	Spot-check outputs or no clear oracle	cuBLAS-backed verification with separate FP32/WMMA tolerance policy
Performance claims	Single number without context	End-to-end and compute-only labels, plus scope boundaries
Engineering governance	Docs and code can drift	OpenSpec-driven alignment for docs, workflow, and requirements
Interview readiness	Hard to narrate as an engineering story	Dedicated interview playbook and proof-first homepage
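
The correctness and performance rows above can be made concrete with a small host-side sketch. It is illustrative only: the `Tolerance` struct, the `verify` and `sgemm_gflops` helpers, and the numeric thresholds are hypothetical and not taken from this repository. It assumes the cuBLAS reference result has already been copied back to the host, and that the WMMA path uses reduced-precision inputs and therefore needs a looser bound than the FP32 kernels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tolerance policy. The numeric values are illustrative
// assumptions, not the repository's actual thresholds.
struct Tolerance {
    float abs_tol;  // absolute error floor, for reference values near zero
    float rel_tol;  // relative error bound, scaled by |reference|
};

constexpr Tolerance kFp32Tolerance{1e-4f, 1e-4f};  // FP32 kernels: tight bound
constexpr Tolerance kWmmaTolerance{1e-2f, 1e-2f};  // WMMA path: looser bound

// Returns true when every element of `got` is within tolerance of `ref`.
// In a real run, `ref` would hold the cuBLAS output copied back to the host.
bool verify(const std::vector<float>& got,
            const std::vector<float>& ref,
            Tolerance tol) {
    assert(got.size() == ref.size());
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const float err = std::fabs(got[i] - ref[i]);
        const float bound = tol.abs_tol + tol.rel_tol * std::fabs(ref[i]);
        if (!(err <= bound)) return false;  // written this way so NaN also fails
    }
    return true;
}

// An M x N x K SGEMM performs 2*M*N*K floating-point operations.
// Whether the result is an "end-to-end" or a "compute-only" figure depends
// entirely on what `elapsed_ms` covers (for example, with or without
// host-device transfers), so the label should travel with the number.
double sgemm_gflops(std::size_t m, std::size_t n, std::size_t k,
                    double elapsed_ms) {
    const double flops = 2.0 * static_cast<double>(m) *
                         static_cast<double>(n) * static_cast<double>(k);
    return flops / (elapsed_ms * 1e-3) / 1e9;
}
```

Keeping the tolerance as an explicit argument means the same check serves both kernel families, and a result that passes only under the WMMA bound is visibly distinct from one that passes under the FP32 bound.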

The optimization path is not random tuning. It is a sequence of explicit bottlenecks:

Performance and correctness are coupled in public storytelling:

The project documents what CI can and cannot prove:

Hosted CI: formatting, compile validity, repository/spec integrity, Pages buildability
Local GPU machine: runtime correctness verification and performance benchmarking

This boundary is useful in interviews because it demonstrates realistic engineering judgment.