References

This list maps project decisions to authoritative technical sources.

CUDA and GPU fundamentals

Why this group matters:

Defines execution model assumptions used by all kernel stages.
Anchors memory and synchronization discussions in official terminology.

Tensor Core and WMMA

Why this group matters:

Supports WMMA fragment, alignment, and mixed-precision behavior discussion.
Explains why fallback policies are necessary for shapes that are not WMMA-friendly, such as dimensions that are not multiples of the fragment tile size.

GEMM optimization research and methodology

Why this group matters:

Connects this project's staged optimization mindset to broader GEMM methodology.
Provides production-grade references for interview follow-up discussions.

Profiling and performance analysis

Why this group matters:

Supports diagnosis that goes beyond a single GFLOPS figure.
Enables metric-driven explanations of bottlenecks and trade-offs.

Engineering process and validation discipline

Why this group matters:

Grounds the repository's correctness and workflow claims in established tooling.
Reinforces the boundary between validation on a local GPU and validation in hosted CI.