English Knowledge Base
This section is a code-accurate reference for the current repository state: public APIs, runtime contracts, performance tooling, and kernel internals.
- Getting Started: Install, run, and copy working snippets. Start from environment setup, the first fused calls, and module-wrapper examples.
- API Reference: Public surface and contracts. Kernel signatures, quantization helpers, autotuning, benchmark classes, models, and errors.
- Guides: Integration and performance knowledge. Where to place fused ops, how to measure them, and how to use FP8 responsibly.
- Internals: Source-level implementation context. Architecture, kernel design trade-offs, and memory-traffic reduction patterns.
Reading paths
- First visit: Read Installation and Quick Start.
- API integration: Read Core Kernels and Integration.
- Performance work: Read Benchmark, Auto-Tuning, and Performance.
- Source dive: Read Architecture and Kernel Design.
Boundary reminder
- Triton kernel execution requires CUDA.
- CPU-only environments remain useful for import checks, linting, typing, build validation, and CPU-safe tests.
- The site intentionally keeps only technical knowledge pages; repository process history and changelog content are not part of the published docs.
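
The CUDA boundary above can be checked at runtime before any kernel is launched. The following is a minimal sketch, not part of this repository's API: the helper names `has_module`, `can_run_triton_kernels`, and `execution_mode` are hypothetical, and the sketch assumes the kernels depend on the `torch` and `triton` packages. It uses only the standard library unless both packages are installed, so it also runs in a CPU-only environment.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def can_run_triton_kernels() -> bool:
    """Hypothetical guard: True only when torch, triton, and a CUDA device exist."""
    if not (has_module("torch") and has_module("triton")):
        return False
    # Imported lazily, only after the module check above has passed.
    import torch
    return torch.cuda.is_available()


def execution_mode() -> str:
    """Pick a label describing what this environment can do."""
    return "cuda-kernels" if can_run_triton_kernels() else "cpu-checks-only"


if __name__ == "__main__":
    print(execution_mode())
```

A guard like this lets CPU-only jobs (import checks, linting, typing, CPU-safe tests) skip kernel execution cleanly instead of failing at launch time.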