
Validation

This section explains why the repository's performance claims are trustworthy, and where that trust stops.

Methodology covers how we optimize. Validation covers the evidence boundary around those optimizations: correctness thresholds, benchmark scope labels, hosted-CI limits, and reproducibility expectations.

The validation model

| Evidence surface | What it proves | What it does not prove |
| --- | --- | --- |
| Hosted CI + local structural checks | Docs/spec structure, Pages fitness, formatting/governance workflows, and repository health checks | GPU runtime correctness, CUDA benchmark numbers, or hardware-specific speedups |
| Local ctest --test-dir build on a GPU machine | Runtime correctness against the project's cuBLAS oracle | Universal performance claims |
| Local benchmark execution | Performance behavior on a named GPU, under a named command and scope label | Results on other GPUs, other CUDA stacks, or unlabeled workloads |

Canonical validation pages

| Need | Page |
| --- | --- |
| Understand correctness thresholds and oracle policy | Correctness Policy |
| Interpret benchmark labels and reported numbers | Benchmark Scope |
| Reproduce a result responsibly | Reproducibility |
| See a representative snapshot of results | Benchmark Results |

What hosted CI proves

Hosted CI is trusted to prove repository health: documentation structure, Pages buildability, formatting checks, and OpenSpec/governance alignment. It keeps the public surface coherent.

Hosted CI is not trusted to prove CUDA runtime behavior or benchmark performance. Those claims require a real GPU machine.

What only local GPU runs can prove

Local GPU runs are required for:

  • cuBLAS-backed runtime correctness checks
  • Tensor Core fast-path versus fallback behavior
  • benchmark numbers, including end-to-end versus compute-only differences
  • architecture-specific conclusions about occupancy, staging, and memory behavior
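One way to keep this boundary honest in a local harness is to gate the GPU-only suite on device visibility before anything runs. A minimal Python sketch, assuming nvidia-smi is on PATH when a GPU is present; the probe and the skip message are illustrative, and only the ctest invocation comes from this section:

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if an NVIDIA GPU responds via nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # nvidia-smi exits 0 when the driver and at least one GPU respond
        return subprocess.run(
            ["nvidia-smi"], capture_output=True, timeout=10
        ).returncode == 0
    except (OSError, subprocess.SubprocessError):
        return False

def run_local_validation() -> None:
    """Run the GPU-only correctness suite, or say why it was skipped."""
    if not gpu_available():
        # Hosted CI cannot stand in for this run; skip loudly, not silently
        print("SKIP: no GPU detected; runtime correctness not validated")
        return
    subprocess.run(["ctest", "--test-dir", "build"], check=True)
```

The loud skip matters: a silently green run on a GPU-less machine is exactly the hosted-CI failure mode this section warns against.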

How to read published numbers

Treat every number in this repository as scoped evidence, not a universal promise.

  • Read the GPU model and CUDA context first.
  • Read the benchmark label second.
  • Read the shape set third.
  • Only then compare the number to another result.

If any of those fields are missing, the number is a hint, not a claim.
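The reading order above amounts to a completeness check: a number qualifies as a claim only when every scope field is present. A sketch with hypothetical field names (gpu, cuda, label, and shapes are illustrative, not the repository's actual result schema):

```python
# Hypothetical scope fields; mirrors the reading order above:
# GPU model and CUDA context, then benchmark label, then shape set.
REQUIRED_FIELDS = ("gpu", "cuda", "label", "shapes")

def classify_result(record: dict) -> str:
    """Classify a reported number as a scoped claim or a mere hint."""
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if not missing:
        return "claim"
    return "hint (missing: " + ", ".join(missing) + ")"
```

A fully labeled record such as {"gpu": "RTX 4090", "cuda": "12.4", "label": "compute-only", "shapes": "square-4096"} classifies as a claim; drop any field and it degrades to a hint.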

Common presentation mistakes

  • Claiming “Tensor Core is always faster” without shape, conversion, and fallback caveats.
  • Quoting one GFLOPS number without its benchmark label or workload scope.
  • Ignoring the numerical-tolerance difference between FP32 and mixed precision.
  • Treating hosted CI success as proof of CUDA runtime correctness or performance.
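The tolerance mistake in particular is easy to make concrete. A sketch of a precision-aware relative-error check against an oracle value; the thresholds are illustrative assumptions, not the repository's actual correctness policy (see Correctness Policy for the real thresholds):

```python
# Illustrative tolerances: mixed precision accumulates more rounding error
# than pure FP32, so it gets a looser bound. Values are assumptions, not
# this project's policy.
TOLERANCES = {"fp32": 1e-5, "mixed": 5e-2}

def within_tolerance(result: float, oracle: float, precision: str) -> bool:
    """Relative-error check of a kernel result against the oracle."""
    tol = TOLERANCES[precision]
    denom = max(abs(oracle), 1e-30)  # guard against a zero oracle value
    return abs(result - oracle) / denom <= tol
```

Applying one precision's bound to the other's results is precisely the mistake listed above: a mixed-precision run judged at FP32 tolerance fails spuriously, and an FP32 run judged at mixed tolerance passes too easily.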

MIT Licensed