API Reference
This section documents the maintained API surface and adjacent support modules used by the repository.
Root-package exports
The root package, `triton_ops.__init__`, exports the public user-facing surface:
from triton_ops import (
    fused_rmsnorm_rope,
    fused_gated_mlp,
    fp8_gemm,
    quantize_fp8,
    dequantize_fp8,
    FusedRMSNormRoPE,
    FusedGatedMLP,
    FP8Linear,
    TritonAutoTuner,
    ConfigCache,
    BenchmarkSuite,
)
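A quick way to confirm that an installed build still exposes every name in the import list is to probe the root package with `hasattr`. The sketch below is illustrative only: `ROOT_EXPORTS` and `missing_exports` are hypothetical helpers written for this page, not part of `triton_ops`, and the check says nothing about call signatures.

```python
import importlib

# Names copied from the root-package import list above.
ROOT_EXPORTS = [
    "fused_rmsnorm_rope",
    "fused_gated_mlp",
    "fp8_gemm",
    "quantize_fp8",
    "dequantize_fp8",
    "FusedRMSNormRoPE",
    "FusedGatedMLP",
    "FP8Linear",
    "TritonAutoTuner",
    "ConfigCache",
    "BenchmarkSuite",
]


def missing_exports(module_name, names):
    """Return the entries of `names` that `module_name` does not expose.

    Hypothetical helper: it imports the module and checks attribute presence only.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


try:
    # An empty list means every documented name is available at the root.
    print(missing_exports("triton_ops", ROOT_EXPORTS))
except ImportError:
    # The package (or one of its dependencies) is not importable here.
    print("triton_ops is not importable in this environment")
```

Running this after an upgrade gives an early signal if a documented root-level name has moved or been removed.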
Knowledge areas
Kernels (core compute paths): Fused RMSNorm + RoPE, fused Gated MLP, FP8 GEMM, and module wrappers.
Quantization (FP8 storage and scaling): Round-trip helpers, scale semantics, and the overflow-handling helper path.
Autotuning (search, cache, and metrics): `TritonAutoTuner`, `ConfigCache`, config spaces, and performance metrics.
Benchmark (verification and reports): `BenchmarkSuite`, `CorrectnessVerifier`, report objects, and benchmark helpers.
Models (dataclasses and result containers): `TensorSpec`, input specs, `KernelMetrics`, `TuningResult`, and `FP8Format`.
Validation (input checks and constraints): Shape, dtype, contiguity, device, and scalar-parameter validation helpers.
Errors (exception hierarchy): Device, dtype, shape, tuning, and overflow failure types with attached metadata.
Important scope note
Some helper functions live in submodules without being exported at the root package. The API pages call out those import paths explicitly when relevant.
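When it is unclear whether a name is exported at the root or only from a submodule, a small lookup can report where it is reachable. This is a minimal sketch under stated assumptions: `locate` is a hypothetical helper, and the submodule name `benchmark` passed in the last call is a guess for illustration, not a documented import path. The `json` calls use the standard library to show the two outcomes.

```python
import importlib


def locate(root, name, submodules=()):
    """Return the dotted module path that exposes `name`, or None.

    Hypothetical helper: tries the root package first, then each candidate
    submodule, and skips any module that cannot be imported.
    """
    candidates = (root, *(f"{root}.{sub}" for sub in submodules))
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, name):
            return module_name
    return None


# Standard-library illustration: JSONDecoder is exported at the root of
# `json`, while py_scanstring is only reachable through `json.decoder`.
print(locate("json", "JSONDecoder", ["decoder"]))    # json
print(locate("json", "py_scanstring", ["decoder"]))  # json.decoder

# Same idea for this repository; "benchmark" is an assumed submodule name.
print(locate("triton_ops", "CorrectnessVerifier", ["benchmark"]))
```

The API pages remain the authoritative source for import paths; a lookup like this is only a convenience when exploring an installed build.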