
References

This list maps project decisions to authoritative technical sources.

CUDA and GPU fundamentals

Why this group matters:

  • Defines execution model assumptions used by all kernel stages.
  • Anchors memory and synchronization discussions in official terminology.

Tensor Core and WMMA

Why this group matters:

  • Supports the discussion of WMMA fragments, alignment requirements, and mixed-precision behavior.
  • Explains why fallback policies are necessary for shapes that do not map cleanly onto WMMA tiles.
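A minimal sketch of such a fallback policy follows. It assumes the m16n16k16 WMMA shape with half-precision A/B and float accumulation. For that configuration the CUDA Programming Guide requires the leading dimension to be a multiple of 8 elements for `__half` and 4 for `float`, and fragment load/store pointers to be 256-bit (32-byte) aligned. The names `GemmShape`, `is_wmma_friendly`, `is_32byte_aligned`, and `select_gemm_path` are hypothetical illustrations, not this project's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile constants, assuming the m16n16k16 WMMA shape
// (half-precision inputs, float accumulator).
constexpr int kWmmaM = 16;
constexpr int kWmmaN = 16;
constexpr int kWmmaK = 16;

enum class GemmPath { Wmma, Fallback };

// Hypothetical problem description; leading dimensions are in elements.
struct GemmShape {
    int m, n, k;
    int lda, ldb, ldc;
};

// True if every dimension divides evenly into WMMA tiles and the
// leading dimensions satisfy the stride rules for the element types:
// multiples of 8 for __half (A, B) and 4 for float (C).
bool is_wmma_friendly(const GemmShape& s) {
    assert(s.m > 0 && s.n > 0 && s.k > 0);
    const bool tiles_divide =
        s.m % kWmmaM == 0 && s.n % kWmmaN == 0 && s.k % kWmmaK == 0;
    const bool strides_ok =
        s.lda % 8 == 0 && s.ldb % 8 == 0 && s.ldc % 4 == 0;
    return tiles_divide && strides_ok;
}

// True if the pointer meets the 32-byte alignment that fragment
// load/store operations expect.
bool is_32byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Dispatch: use the Tensor Core path only when the shape is friendly;
// otherwise route to a general (e.g. tiled SIMT) kernel.
GemmPath select_gemm_path(const GemmShape& s) {
    return is_wmma_friendly(s) ? GemmPath::Wmma : GemmPath::Fallback;
}
```

For example, a 256×256×256 problem with matching leading dimensions selects the WMMA path, while M = 100 or an output stride of 62 falls back. Checking eligibility once on the host keeps the WMMA kernel free of per-tile bounds logic; a real dispatcher would typically also apply `is_32byte_aligned` to the base and per-tile pointers.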

GEMM optimization research and methodology

Why this group matters:

  • Connects this project's staged optimization mindset to broader GEMM methodology.
  • Provides production-grade references for interview follow-up discussions.

Profiling and performance analysis

Why this group matters:

  • Supports diagnosis that goes beyond a single GFLOPS figure.
  • Enables metric-driven explanations of bottlenecks and trade-offs.

Engineering process and validation discipline

Why this group matters:

  • Grounds the repository's correctness and workflow claims in established tooling.
  • Reinforces the local-GPU vs hosted-CI validation boundary model.

MIT Licensed