CUDA SGEMM WHITEPAPER · ARCHITECTURE SITE · KERNEL ACADEMY

A CUDA SGEMM project that reads like a defended technical case

This site is built for interviewers and advanced GitHub readers who care about more than “one fast kernel”. It frames the repository as a chain of architectural claims, optimization decisions, validation boundaries, and research lineage. Read it as a whitepaper first, then as an academy.

5-stage kernel ladder · cuBLAS-grounded validation · EN / ZH mirrored routes
Whitepaper map connecting overview, architecture, academy, validation, and research around the SGEMM kernel ladder.

The public narrative is organized like a technical argument: thesis, architecture, academy, proof, then lineage.

Project thesis: Optimization must stay explainable
Each kernel exists because it changes one bottleneck class, not because another benchmark screenshot was possible.

Audience contract: Readable under interview pressure
The site is written so an interviewer can audit the design, a candidate can defend it, and a CUDA reader can keep digging.

Trust model: CI is structural, GPU is empirical
Repository health, docs checks, and Pages fitness live in automation. Runtime correctness and performance still belong to real hardware.

Read this site by intent

I need the 90-second project brief

Open the guide first, then jump to architecture if you need the system story behind the summary.

I need to understand why each kernel exists

Start with the ladder and memory model before opening the academy pages that inspect each stage in detail.

I care about proof, not posture

Use validation when you want the correctness policy, benchmark scope, and reproducibility boundary before trusting any number.

I want lineage and comparative context

Use the research desk for papers, related repositories, and notes on how this project’s current shape emerged.

The whitepaper spine

| Surface | What it answers | Why it exists |
| --- | --- | --- |
| Overview | What is this project, why does it matter, how should I read it? | Gives reviewers and new readers one decisive orientation surface. |
| Architecture | How is the SGEMM system structured, and what are its core invariants? | Turns implementation detail into a defendable system map. |
| Academy | How do I study the optimization ladder in a rigorous order? | Packages the repository as a curriculum, not a pile of notes. |
| Validation | What can the evidence prove, and what can it not prove? | Keeps the project technically honest. |
| Research | Where do these ideas come from, and what should I compare against? | Adds academic and comparative depth. |
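
To make the Validation surface concrete, here is a minimal host-side sketch of an element-wise correctness check against a reference result. It is an illustration under stated assumptions, not the repository's harness: the names `CompareResult`, `reference_sgemm`, and `compare` are hypothetical, the CPU loop stands in for the cuBLAS reference the site mentions, row-major storage is assumed, and any tolerance values passed in are the caller's choice.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result record (not taken from the repository).
struct CompareResult {
    float max_abs_err;       // largest |candidate - reference| over finite errors
    std::size_t mismatches;  // elements outside tolerance (NaN counts as a mismatch)
};

// Naive row-major CPU SGEMM: C = A * B with A (M x K), B (K x N), C (M x N).
// Stand-in for the reference; on real hardware the site grounds this in cuBLAS.
std::vector<float> reference_sgemm(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * B[k * N + j];
            }
        }
    }
    return C;
}

// An element passes when |c - r| <= atol + rtol * |r|.
// A size mismatch marks every element of the larger array as failing.
CompareResult compare(const std::vector<float>& candidate,
                      const std::vector<float>& reference,
                      float atol, float rtol) {
    CompareResult result{0.0f, 0};
    if (candidate.size() != reference.size()) {
        result.mismatches = std::max(candidate.size(), reference.size());
        return result;
    }
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const float err = std::fabs(candidate[i] - reference[i]);
        const float tol = atol + rtol * std::fabs(reference[i]);
        if (!(err <= tol)) {  // written this way so NaN fails the check
            ++result.mismatches;
        }
        if (err > result.max_abs_err) {
            result.max_abs_err = err;
        }
    }
    return result;
}
```

A caller would copy a kernel's output back to the host and run something like `compare(kernel_out, reference, 1e-5f, 1e-4f)`, treating a nonzero `mismatches` as a failure. Those two tolerances are illustrative only; a reduced-precision stage such as the Tensor Core one would likely need looser bounds than the FP32 stages.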

Architecture, rendered as a controlled figure

Kernel ladder moving from naive FP32 to tiled, bank-conflict-free, double-buffered, and Tensor Core WMMA kernels, with architecture, validation, and research rails.

The ladder is not a trophy rack. It is a map of bottleneck shifts, interface constraints, and evidence requirements.
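
As one illustration of a bottleneck shift, consider a back-of-envelope arithmetic-intensity estimate for the first two rungs. This is a sketch under assumptions that are not taken from the repository: each naive thread computes one output element and reads K floats from A and K floats from B directly from global memory, the tiled kernel stages T × T tiles in shared memory, caches and the write-back of C are ignored, and T = 32 is only an example value.

```latex
\begin{align*}
I_{\text{naive}} &= \frac{2K\ \text{FLOP}}{2K \cdot 4\ \text{B}}
                  = 0.25\ \text{FLOP/B} \\[4pt]
I_{\text{tiled}} &= \frac{2T^{3}\ \text{FLOP}}{2T^{2} \cdot 4\ \text{B}}
                  = \frac{T}{4}\ \text{FLOP/B}
                  \qquad (T = 32 \;\Rightarrow\; 8\ \text{FLOP/B})
\end{align*}
```

Under these assumptions, tiling raises data reuse by a factor of T, which is the sense in which that rung targets global-memory traffic rather than, say, shared-memory bank conflicts or latency hiding. The actual tile sizes and measured figures are the repository's to state.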

What makes this presentation different

  1. It treats SGEMM as a technical argument, not a showcase.
  2. It separates architecture, academy, validation, and research so each page has a single job.
  3. It uses mirrored English and Chinese routes because public depth is part of the project, not an afterthought.

Start from the repository if needed

MIT Licensed