性能测试

测量和基准测试 Tiny-DL-Inference 推理性能。

概述

性能测试对于优化至关重要。Tiny-DL-Inference 提供 Benchmark 工具类来测量：

算子执行时间
端到端推理延迟
内存使用
统计指标（平均值、中位数、标准差）

使用 Benchmark 类

基本用法

typescript

import { Benchmark } from 'tiny-dl-inference';

const benchmark = new Benchmark();

// 测量单次运行
const result = await benchmark.measure(async () => {
  await operator.forward([input]);
  await context.sync();
});

console.log(`耗时: ${result.timeMs.toFixed(2)} ms`);

多次运行统计

typescript

const stats = await benchmark.runMultiple(async () => {
  await operator.forward([input]);
  await context.sync();
}, { runs: 100 });

console.log('统计结果:');
console.log(`平均值: ${stats.mean.toFixed(2)} ms`);
console.log(`中位数: ${stats.median.toFixed(2)} ms`);
console.log(`标准差: ${stats.stdDev.toFixed(2)} ms`);
console.log(`P99: ${stats.p99.toFixed(2)} ms`);

算子基准测试

对比不同算子

typescript

import { Conv2dOperator, Conv2dBiasReLUOperator } from 'tiny-dl-inference';

const benchmark = new Benchmark();

// 测试分离算子
const separateStats = await benchmark.runMultiple(async () => {
  const conv = new Conv2dOperator(context);
  const relu = new ReLUOperator(context);
  const convOut = await conv.forward([input, weights]);
  const output = await relu.forward([convOut]);
  await context.sync();
  convOut.destroy();
  output.destroy();
}, { runs: 50 });

// 测试融合算子
const fusedStats = await benchmark.runMultiple(async () => {
  const convBiasRelu = new Conv2dBiasReLUOperator(context);
  const output = await convBiasRelu.forward([input, weights, bias]);
  await context.sync();
  output.destroy();
}, { runs: 50 });

console.log('分离算子:', separateStats.mean.toFixed(2), 'ms');
console.log('融合算子:', fusedStats.mean.toFixed(2), 'ms');
console.log('加速比:', (separateStats.mean / fusedStats.mean).toFixed(2) + 'x');

端到端推理测试

完整模型延迟测量

typescript

import { InferenceEngine } from 'tiny-dl-inference';

const benchmark = new Benchmark();
const engine = new InferenceEngine(context);

// 加载模型
await engine.loadModel(modelUrl);

// 测量端到端延迟
const stats = await benchmark.runMultiple(async () => {
  const output = await engine.infer(inputTensor);
  await context.sync();
  output.destroy();
}, { runs: 100 });

console.log('端到端推理:');
console.log(`平均延迟: ${stats.mean.toFixed(2)} ms`);
console.log(`P99 延迟: ${stats.p99.toFixed(2)} ms`);

Chrome DevTools 性能分析

使用 Performance API

typescript

// 在浏览器中
performance.mark('infer-start');
const output = await engine.infer(inputTensor);
performance.mark('infer-end');
performance.measure('inference', 'infer-start', 'infer-end');

const measure = performance.getEntriesByName('inference')[0];
console.log(`耗时: ${measure.duration.toFixed(2)} ms`);

WebGPU 时间查询

typescript

// 使用时间查询进行精确的 GPU 测量
const querySet = device.createQuerySet({
  type: 'timestamp',
  count: 2
});

const encoder = device.createCommandEncoder();
encoder.writeTimestamp(querySet, 0); // 开始时间
// ... 计算操作 ...
encoder.writeTimestamp(querySet, 1); // 结束时间

基准测试最佳实践

准确测量

预热 GPU：先运行几次再开始测量
同步等待：每次测量后调用 context.sync()
多次运行：至少运行 50 次以获得稳定统计
关闭其他应用：减少系统干扰
监控热节流：长时间运行可能导致 GPU 降频

报告指标

始终报告：

平均值：总体性能
中位数：典型情况
P99：最坏情况
标准差：一致性

typescript

console.log(`性能报告:
  平均值: ${stats.mean.toFixed(2)} ± ${stats.stdDev.toFixed(2)} ms
  中位数: ${stats.median.toFixed(2)} ms
  P95: ${stats.p95.toFixed(2)} ms
  P99: ${stats.p99.toFixed(2)} ms
  运行次数: ${stats.runs}
`);

性能测试 ​

概述 ​

使用 Benchmark 类 ​

基本用法 ​

多次运行统计 ​

算子基准测试 ​

对比不同算子 ​

端到端推理测试 ​

完整模型延迟测量 ​

Chrome DevTools 性能分析 ​

使用 Performance API ​

WebGPU 时间查询 ​

基准测试最佳实践 ​

准确测量 ​

报告指标 ​

相关资源 ​

性能测试

概述

使用 Benchmark 类

基本用法

多次运行统计

算子基准测试

对比不同算子

端到端推理测试

完整模型延迟测量

Chrome DevTools 性能分析

使用 Performance API

WebGPU 时间查询

基准测试最佳实践

准确测量

报告指标

相关资源