Benchmarking

The benchmark layer in this repository consists mainly of classes and helpers, not a set of standalone benchmark functions at the root package level.

`BenchmarkSuite`

BenchmarkSuite(
    warmup_runs: int = 10,
    benchmark_runs: int = 100,
    rtol: float = 1e-3,
    atol: float = 1e-5,
)
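To make the roles of `warmup_runs` and `benchmark_runs` concrete, here is a minimal sketch of a warmup-then-measure timing loop. It is not the repository's implementation: `time_callable` is a hypothetical helper, and it uses CPU wall-clock timing from the standard library only.

```python
import statistics
import time
from typing import Callable, Dict


def time_callable(fn: Callable[[], object],
                  warmup_runs: int = 10,
                  benchmark_runs: int = 100) -> Dict[str, float]:
    """Hypothetical helper: time fn after discarding warmup iterations."""
    # Warmup iterations run but are not recorded, so one-time costs
    # (caches, lazy initialization, JIT compilation) do not skew the results.
    for _ in range(warmup_runs):
        fn()

    samples_ms = []
    for _ in range(benchmark_runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
    }
```

Because GPU kernels launch asynchronously, wall-clock timing like this would need a synchronization point (for example `torch.cuda.synchronize()`) or CUDA events before it could be trusted for GPU work.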

Main methods:

CorrectnessVerifier(rtol: float = 1e-3, atol: float = 1e-5)

Common methods:

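Of these, verify returns more detailed statistics, such as the maximum absolute error, the mean relative error, and the number of elements that violate the tolerance.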
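The statistics named above can be sketched in plain Python as follows. This is an illustration, not the repository's `verify`: the function name `tolerance_stats`, the dictionary keys, and the pass criterion `|actual - expected| <= atol + rtol * |expected|` (the convention used by `numpy.allclose` and `torch.allclose`) are all assumptions.

```python
from typing import Dict, Sequence


def tolerance_stats(actual: Sequence[float],
                    expected: Sequence[float],
                    rtol: float = 1e-3,
                    atol: float = 1e-5) -> Dict[str, float]:
    """Hypothetical helper: summarize elementwise error against a reference."""
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")

    abs_errs = [abs(a - e) for a, e in zip(actual, expected)]
    # Guard the relative error against division by zero with a small floor.
    rel_errs = [err / max(abs(e), 1e-12) for err, e in zip(abs_errs, expected)]
    # Assumed criterion: an element violates if it exceeds atol + rtol * |expected|.
    violations = sum(
        1 for err, e in zip(abs_errs, expected) if err > atol + rtol * abs(e)
    )
    return {
        "max_abs_error": max(abs_errs),
        "mean_rel_error": sum(rel_errs) / len(rel_errs),
        "num_violations": violations,
    }
```

A result with `num_violations == 0` corresponds to a passing check under this assumed criterion. For real tensors, the same quantities would normally be computed with vectorized operations instead of Python lists.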

Located in triton_ops.benchmark.correctness:

If you do not want to go through BenchmarkSuite, these functions are also suitable for standalone numerical verification.

triton_ops.benchmark.report defines:

PerformanceReport supports:

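The dedicated benchmark methods in the benchmark module are GPU-oriented, because they allocate test tensors directly on CUDA. If you only need a repository health check, prefer the tests, lint, type checks, and build commands over the GPU benchmarks.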
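One way to keep GPU benchmarks out of environments that lack CUDA is a small guard like the sketch below. It assumes the tensors are PyTorch tensors. `cuda_available` and `run_gpu_benchmarks_if_possible` are hypothetical names, not part of this repository.

```python
from typing import Callable


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # assumed dependency; treated as optional here
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def run_gpu_benchmarks_if_possible(run_benchmarks: Callable[[], None]) -> bool:
    """Call run_benchmarks() only when CUDA is available; report whether it ran."""
    if not cuda_available():
        print("CUDA not available; skipping GPU benchmarks.")
        return False
    run_benchmarks()
    return True
```

The return value lets a calling script tell "skipped" apart from "ran", so a CPU-only CI job can carry on with its other checks.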