
Learning Path

Follow the optimization ladder in the order the repository was designed to teach it

What each stage teaches

Naive -> Tiled

  • Thread/block mapping
  • Memory coalescing
  • Shared-memory reuse
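A minimal host-side sketch of the Naive -> Tiled step follows. It is not the repository's kernel. It emulates one CUDA block per output tile with plain loops. The local arrays `As`/`Bs` stand in for `__shared__` memory, and the comments mark where `__syncthreads()` would go. `TILE = 32` and the function names are assumptions made for illustration. The point is the index mapping: thread `(ty, tx)` loads one element of each tile, and each staged element is then reused `TILE` times from fast memory instead of being re-read from global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE = 32;  // hypothetical tile width (a block of TILE x TILE threads)

// Reference: one "thread" per output element, all operands read from global memory.
void matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int M, int N, int K) {
    assert(A.size() == static_cast<std::size_t>(M) * K && B.size() == static_cast<std::size_t>(K) * N);
    for (int row = 0; row < M; ++row)
        for (int col = 0; col < N; ++col) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[row * K + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
}

// Tiled: "block" (by, bx) owns one TILE x TILE output tile and walks K in TILE steps.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int M, int N, int K) {
    assert(A.size() == static_cast<std::size_t>(M) * K && B.size() == static_cast<std::size_t>(K) * N);
    float As[TILE][TILE], Bs[TILE][TILE];  // stand-ins for __shared__ tiles
    for (int by = 0; by < (M + TILE - 1) / TILE; ++by)
        for (int bx = 0; bx < (N + TILE - 1) / TILE; ++bx) {
            float acc[TILE][TILE] = {};    // one accumulator per "thread" (ty, tx)
            for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
                // Load phase: thread (ty, tx) copies one element of each tile.
                // tx indexes the contiguous dimension, so on a GPU a warp's loads coalesce.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx) {
                        int row = by * TILE + ty, col = bx * TILE + tx;
                        int ak = t * TILE + tx, bk = t * TILE + ty;
                        As[ty][tx] = (row < M && ak < K) ? A[row * K + ak] : 0.0f;  // zero-pad edges
                        Bs[ty][tx] = (bk < K && col < N) ? B[bk * N + col] : 0.0f;
                    }
                // __syncthreads() here on a GPU: tiles fully loaded before use.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx)
                        for (int k = 0; k < TILE; ++k) acc[ty][tx] += As[ty][k] * Bs[k][tx];
                // __syncthreads() here on a GPU: tiles fully consumed before reload.
            }
            for (int ty = 0; ty < TILE; ++ty)  // write back in-bounds outputs only
                for (int tx = 0; tx < TILE; ++tx) {
                    int row = by * TILE + ty, col = bx * TILE + tx;
                    if (row < M && col < N) C[row * N + col] = acc[ty][tx];
                }
        }
}
```

Because each thread's partial sums are accumulated in the same K order as the naive loop, the two functions agree exactly on integer-valued inputs. That makes the sketch easy to check for odd shapes such as 45 x 70 x 37, where the edge tiles rely on the zero padding.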

Tiled -> Bank-Free

  • 32-bank shared-memory behavior
  • Why padding the tile to [32][33] removes bank conflicts on column reads
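The effect of the extra column can be checked with plain arithmetic. The sketch below assumes 32 shared-memory banks of 4-byte words and a warp in which thread `t` reads `tile[t][col]` from a row-major `float` tile. Each thread reads a different word, so the broadcast case never applies. The function name is hypothetical. With rows of 32 floats, every thread in the warp hits bank `col`, which is a 32-way conflict. With `__shared__ float tile[32][33]`, thread `t` lands in bank `(t + col) % 32`, so all 32 banks are distinct.

```cpp
#include <array>
#include <cassert>

constexpr int NUM_BANKS = 32;  // assumed: 32 banks, 4 bytes wide
constexpr int WARP_SIZE = 32;

// Worst-case number of warp threads mapped to the same bank when thread t
// reads tile[t][col] from a row-major float tile with `row_len` floats per row.
// 1 means conflict-free; N means the access is serialized into N transactions.
int column_read_conflict_degree(int row_len, int col) {
    assert(row_len > 0 && col >= 0 && col < row_len);
    std::array<int, NUM_BANKS> hits{};
    for (int t = 0; t < WARP_SIZE; ++t) {
        int word = t * row_len + col;  // 4-byte word offset of tile[t][col]
        ++hits[word % NUM_BANKS];      // bank = word index mod 32
    }
    int worst = 0;
    for (int h : hits) worst = h > worst ? h : worst;
    return worst;
}
```

`column_read_conflict_degree(32, 0)` returns 32 and `column_read_conflict_degree(33, 0)` returns 1. Padding by two columns (`row_len = 34`) would still leave a 2-way conflict, because threads `t` and `t + 16` then share a bank. That is why the padding is one column.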

Bank-Free -> Double Buffer

  • Pipeline thinking: overlapping the next tile's loads with the current tile's math
  • Tile staging and latency hiding
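The staging pattern behind double buffering fits in a few lines. The sketch below strips a GEMM down to the K-loop of a single output element, a dot product. It keeps two staging buffers and issues the load of tile `t + 1` into one buffer before computing on tile `t` in the other. `TILE_K`, `load_tile` and `dot_double_buffered` are hypothetical names, and the comments mark where a CUDA kernel would synchronize. A CPU executes the load and the math one after the other. On a GPU the prefetch is in flight while the math runs, and that overlap is where the latency hiding comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE_K = 32;  // hypothetical tile depth along K

// Copy K-tile `tile` of a and b into one staging buffer, zero-padding the tail.
// In a kernel this is the global -> shared copy performed by the block's threads.
static void load_tile(const std::vector<float>& a, const std::vector<float>& b,
                      int tile, float (&sa)[TILE_K], float (&sb)[TILE_K]) {
    for (int i = 0; i < TILE_K; ++i) {
        std::size_t k = static_cast<std::size_t>(tile) * TILE_K + i;
        sa[i] = k < a.size() ? a[k] : 0.0f;
        sb[i] = k < b.size() ? b[k] : 0.0f;
    }
}

// Double-buffered K-loop: tile t is consumed from buffer t % 2 while tile t + 1
// is staged into buffer (t + 1) % 2.
float dot_double_buffered(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int tiles = static_cast<int>((a.size() + TILE_K - 1) / TILE_K);
    if (tiles == 0) return 0.0f;
    float sa[2][TILE_K], sb[2][TILE_K];
    load_tile(a, b, 0, sa[0], sb[0]);  // prologue: stage tile 0
    // (__syncthreads() here on a GPU so tile 0 is visible.)
    float acc = 0.0f;
    for (int t = 0; t < tiles; ++t) {
        const int cur = t % 2, nxt = (t + 1) % 2;
        if (t + 1 < tiles) load_tile(a, b, t + 1, sa[nxt], sb[nxt]);  // prefetch next tile
        for (int i = 0; i < TILE_K; ++i) acc += sa[cur][i] * sb[cur][i];  // compute current tile
        // (__syncthreads() here on a GPU: the prefetch is visible and nobody is
        //  still reading buffer `cur` before it is refilled next iteration.)
    }
    return acc;
}
```

The cost of the second buffer is twice the staging memory per block. In exchange, only one synchronization per tile is needed instead of two, and the load of the next tile no longer stalls the math on the current one.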

Double Buffer -> Tensor Core

  • WMMA fragments
  • Mixed precision
  • Safe fallback behavior for unsupported shapes
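One way to express a safe fallback is a host-side dispatcher that checks the shape before launching a WMMA kernel. This is a hypothetical sketch, not the repository's logic. It assumes the m16n16k16 fragment shape, one configuration the CUDA WMMA API supports with `half` inputs and `float` accumulators (mixed precision) on compute capability 7.0 and newer. It also assumes that unsupported cases fall back to the FP32 double-buffered kernel. The names `Kernel` and `choose_kernel`, and the extra device-capability check, are illustrative assumptions.

```cpp
#include <cassert>

// Assumed WMMA fragment shape: m16n16k16 (half inputs, float accumulators).
constexpr int WMMA_M = 16, WMMA_N = 16, WMMA_K = 16;

enum class Kernel { TensorCore, DoubleBuffer };

// Hypothetical dispatcher: take the Tensor Core path only when every dimension
// is a whole number of WMMA tiles and the device has Tensor Cores (compute
// capability 7.0+); otherwise fall back to the FP32 double-buffered kernel.
Kernel choose_kernel(int M, int N, int K, int cc_major) {
    assert(M > 0 && N > 0 && K > 0);
    const bool shape_ok = M % WMMA_M == 0 && N % WMMA_N == 0 && K % WMMA_K == 0;
    const bool has_tensor_cores = cc_major >= 7;
    return (shape_ok && has_tensor_cores) ? Kernel::TensorCore : Kernel::DoubleBuffer;
}
```

For example, `choose_kernel(1024, 1024, 1024, 8)` selects the Tensor Core path, while `choose_kernel(1000, 1024, 1024, 8)` falls back, because 1000 is not a multiple of 16. Padding the matrices up to multiples of 16 is an alternative to falling back, at the cost of extra memory and copies.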

Before you start

MIT Licensed