
Reference Map

This page is a structured index of the external sources that back the claims in this whitepaper. Each entry is classified by type and linked to the section it supports most directly.

Primary technical references

CUDA and GPU architecture

| Source | What it establishes | Relevant section |
| --- | --- | --- |
| CUDA C++ Programming Guide | Memory hierarchy, warp execution model, shared memory layout | Architecture, Academy |
| CUDA Best Practices Guide | Memory coalescing, occupancy, bank conflict avoidance | Academy (kernel pages) |
| PTX ISA Reference | WMMA instruction semantics, matrix fragment layout | Tensor Core path |

cuBLAS

| Source | What it establishes | Relevant section |
| --- | --- | --- |
| cuBLAS Developer Guide | GEMM API, precision modes, leading-dimension conventions | Validation (oracle definition) |
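
The leading-dimension convention named in the row above can be illustrated with a small CPU-side oracle. This is only a sketch under stated assumptions: `reference_sgemm` is a hypothetical helper, not a cuBLAS function and not necessarily how this project defines its oracle. It assumes column-major storage with no transposes, and its argument order loosely mirrors the familiar `m, n, k, alpha, A, lda, B, ldb, beta, C, ldc` shape of a GEMM call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU oracle (not part of cuBLAS): C = alpha * A * B + beta * C.
// Assumptions: column-major storage, no transposes.
//   A is m x k with leading dimension lda >= m
//   B is k x n with leading dimension ldb >= k
//   C is m x n with leading dimension ldc >= m
void reference_sgemm(int m, int n, int k, float alpha,
                     const float* A, int lda,
                     const float* B, int ldb,
                     float beta, float* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // Column-major: element (row, col) lives at row + col * ld.
                const std::size_t a_idx = i + static_cast<std::size_t>(p) * lda;
                const std::size_t b_idx = p + static_cast<std::size_t>(j) * ldb;
                acc += A[a_idx] * B[b_idx];
            }
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            // When beta is zero, treat C as write-only so stale values are ignored.
            c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

The leading dimension is the stride between consecutive columns in memory, so it may exceed the row count when a matrix is a view into a larger padded allocation. Passing `lda = 3` for a 2 x 2 matrix, for example, skips one padding element per column.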

Tensor Core / WMMA

| Source | What it establishes | Relevant section |
| --- | --- | --- |
| WMMA API documentation | Fragment types, load/store/compute API | Academy (kernel-tensor-core), Architecture (tensor-core-path) |
| Volta architecture whitepaper | First-generation Tensor Core throughput model | Research (evolution), Performance model |

Foundational papers

| Paper | Contribution | Primary support for |
| --- | --- | --- |
| Goto & van de Geijn (2008), "Anatomy of High-Performance Matrix Multiplication" | Hierarchical blocking theory for GEMM on CPUs | Tiled kernel design, shared-memory staging rationale |
| Lai & Seznec (2013), "Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs" | GPU SGEMM tiling and occupancy analysis | Tiled kernel, double-buffer motivation |
| Whaley & Dongarra (1998), ATLAS | Automated tuning of block sizes | Historical context for tile-size sensitivity |
| Markidis et al. (2018), "NVIDIA Tensor Core Programmability, Performance & Precision" | WMMA programming model and mixed-precision behavior | Tensor Core path design |
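
For readers unfamiliar with the blocking idea these papers support, a deliberately simplified CPU sketch follows. It is an assumption-laden illustration, not the algorithm from any of the papers above: it uses a single level of square tiles on row-major square matrices, and it omits the packing and register-level micro-kernel that Goto & van de Geijn describe. The names `blocked_matmul` and `tile` are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: accumulates C += A * B for row-major n x n matrices,
// visiting the iteration space one tile at a time.
// Assumptions: tile > 0; A, B, and C each hold n * n elements.
void blocked_matmul(const std::vector<float>& A,
                    const std::vector<float>& B,
                    std::vector<float>& C, int n, int tile) {
    for (int i0 = 0; i0 < n; i0 += tile) {
        const int i1 = std::min(i0 + tile, n);  // clamp the last partial tile
        for (int p0 = 0; p0 < n; p0 += tile) {
            const int p1 = std::min(p0 + tile, n);
            for (int j0 = 0; j0 < n; j0 += tile) {
                const int j1 = std::min(j0 + tile, n);
                // Work on one tile of A and one tile of B while they are
                // likely to stay resident in a fast level of memory.
                for (int i = i0; i < i1; ++i) {
                    for (int p = p0; p < p1; ++p) {
                        const float a = A[i * n + p];
                        for (int j = j0; j < j1; ++j) {
                            C[i * n + j] += a * B[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result does not depend on `tile`; only the memory access order changes. On a GPU the analogous step stages each tile in shared memory, which is the rationale the "Primary support for" column refers to.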

| Repository | Relationship | Notes |
| --- | --- | --- |
| CUTLASS | Authoritative production GEMM kernel library | The ceiling above which this project does not claim to compete |
| tinygrad / BEAM SGEMM | Community SGEMM exploration | Different educational framing; useful for contrast |
| siboehm/CUDA-GEMM-Optimization | Step-by-step SGEMM tutorial | Most directly comparable educational structure |
| wangzyon/NVIDIA_SGEMM_PRACTICE | Chinese-language SGEMM practice repository | Bilingual contrast; different kernel progression |

How to use this map

This reference map is not a bibliography to be cited at the end of a paper. It is a live index that connects each claim in the whitepaper to its supporting source.

If you want to challenge a claim:

  1. Find the section in the whitepaper that makes the claim.
  2. Find the supporting source in the tables above.
  3. Open the source and check whether the claim is appropriately scoped.

If the claim has no entry in any of the tables, it is either derived from the implementation itself (verifiable by reading the code) or it is an open question explicitly labeled as such in the text.

MIT Licensed