# Benchmarking

Measuring and optimizing performance in Tiny-DL-Inference.
## Overview

Tiny-DL-Inference includes a built-in benchmarking system for measuring both operator-level and end-to-end inference performance.
## Using the Benchmark Class

### Basic Usage
```typescript
import { Benchmark, InferenceEngine } from 'tiny-dl-inference';

const engine = new InferenceEngine();
await engine.initialize();

// Bracket access reaches the engine's internal GPU context,
// which is not part of its public typed surface
const bench = new Benchmark(engine['context']);

// Measure a single operation
const time = await bench.measureOperation(async () => {
  return await someOperator.forward([input], params);
});

console.log(`Operation took ${time}ms`);
```
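Note that the first invocation of an operator often includes one-time costs such as shader and pipeline compilation, so warm up before timing. The operator examples below do this explicitly; a minimal sketch, reusing the names from the snippet above:

```typescript
// Warm-up: absorbs one-time costs like pipeline compilation
await someOperator.forward([input], params);

// Subsequent measurements reflect steady-state performance
const steady = await bench.measureOperation(async () => {
  return await someOperator.forward([input], params);
});
```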
### Multiple Runs
```typescript
// Run multiple iterations for a more stable estimate
const times = await bench.measureRepeated(
  async () => await operator.forward([input], params),
  100 // iterations
);

console.log(`Average: ${times.avg}ms`);
console.log(`Min: ${times.min}ms`);
console.log(`Max: ${times.max}ms`);
```
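If you need percentiles beyond the average, minimum, and maximum, you can collect raw samples yourself. A sketch using only the `measureOperation` API shown above:

```typescript
// Collect raw samples, then derive median and p95 latency
const samples: number[] = [];
for (let i = 0; i < 100; i++) {
  samples.push(await bench.measureOperation(async () => {
    return await operator.forward([input], params);
  }));
}
samples.sort((a, b) => a - b);
const median = samples[Math.floor(samples.length / 2)];
const p95 = samples[Math.floor(samples.length * 0.95)];
console.log(`Median: ${median.toFixed(2)}ms, p95: ${p95.toFixed(2)}ms`);
```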
## Benchmarking Operators

### Conv2d Performance
```typescript
import { Conv2d } from 'tiny-dl-inference';

const conv2d = new Conv2d(context);

// Warm-up run: the first invocation may include pipeline compilation
await conv2d.forward([input, weights], params);

const time = await bench.measureOperation(async () => {
  return await conv2d.forward([input, weights], params);
});
```
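Kernel timings typically scale with input size, so it can be worth sweeping a few shapes. A minimal sketch, assuming a hypothetical `makeInput` helper for constructing test tensors:

```typescript
// `makeInput(h, w)` is a hypothetical helper that builds an input tensor of
// the given spatial size; substitute however your code constructs tensors.
for (const size of [28, 56, 112]) {
  const input = makeInput(size, size);
  await conv2d.forward([input, weights], params); // warm-up per shape
  const t = await bench.measureOperation(async () => {
    return await conv2d.forward([input, weights], params);
  });
  console.log(`${size}x${size}: ${t.toFixed(2)}ms`);
}
```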
### Fused vs Non-Fused
```typescript
// Measure the fused kernel
const fusedTime = await bench.measureOperation(async () => {
  return await fusedOp.forward([input, weights, bias], params);
});

// Measure the equivalent sequential operations
const seqTime = await bench.measureOperation(async () => {
  const c = await conv2d.forward([input, weights], params);
  const b = await bias.add(c, biasTensor);
  return await relu.forward([b], params);
});

console.log(`Speedup: ${(seqTime / fusedTime).toFixed(2)}×`);
```
## End-to-End Benchmarks

### MNIST Inference
```typescript
const engine = new InferenceEngine();
await engine.initialize();
await engine.loadModel(mnistModel);

// Warm-up
await engine.infer(testInput);

// Measure
const inferenceTime = await bench.measureOperation(async () => {
  return await engine.infer(input);
});

console.log(`Inference: ${inferenceTime}ms`);
```
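A single timed run is noisy; `measureRepeated` (shown earlier) gives a steadier picture of end-to-end latency:

```typescript
// Average end-to-end latency over many runs
const stats = await bench.measureRepeated(
  async () => await engine.infer(input),
  50 // iterations
);
console.log(`Latency avg=${stats.avg}ms min=${stats.min}ms max=${stats.max}ms`);
```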
### Throughput
```typescript
const batchSize = 32;
const inputs = prepareBatch(batchSize);

const totalTime = await bench.measureOperation(async () => {
  for (const input of inputs) {
    await engine.infer(input);
  }
});

const throughput = batchSize / (totalTime / 1000);
console.log(`Throughput: ${throughput.toFixed(1)} inferences/sec`);
```
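Note that this loop awaits each inference in turn, so the figure reflects sequential, latency-bound throughput rather than what a truly batched input can achieve (see the Batch Processing tip below). The per-inference average falls out of the same numbers:

```typescript
// Average wall-clock time per inference in the sequential loop
const msPerInference = totalTime / batchSize;
console.log(`Per-inference: ${msPerInference.toFixed(2)}ms`);
```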
## Performance Metrics

### Key Metrics
| Metric | Description |
|---|---|
| Latency | Wall-clock time for a single inference (ms) |
| Throughput | Completed inferences per second |
| Memory | GPU memory consumed by tensors and buffers |
| Utilization | Fraction of GPU compute kept busy |
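A minimal sketch of gathering these into one report, assuming the `measureRepeated` API above and the per-tensor memory estimate from the Memory Tracking section below (GPU utilization generally isn't observable from WebGPU, so it is omitted):

```typescript
interface BenchReport {
  avgLatencyMs: number;
  throughputPerSec: number;
  estMemoryBytes: number;
}

async function report(
  run: () => Promise<unknown>,
  tensors: { size: number }[]
): Promise<BenchReport> {
  const stats = await bench.measureRepeated(run, 50);
  return {
    avgLatencyMs: stats.avg,
    throughputPerSec: 1000 / stats.avg,
    estMemoryBytes: tensors.reduce((sum, t) => sum + t.size * 4, 0), // float32
  };
}
```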
### Expected Performance
| Model | Typical Latency |
|---|---|
| MNIST (1x28x28) | < 5ms |
| Small CNN | < 20ms |
| Medium CNN | < 100ms |
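These targets make a convenient regression check. A sketch, reusing the engine and benchmark objects from earlier:

```typescript
// Fail loudly if measured latency exceeds the documented budget
const budgetMs = 5; // MNIST target from the table above
const measured = await bench.measureOperation(async () => engine.infer(input));
if (measured > budgetMs) {
  throw new Error(`Latency regression: ${measured.toFixed(2)}ms > ${budgetMs}ms budget`);
}
```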
## Optimization Tips

### 1. Use Kernel Fusion
```typescript
// Before: 3 separate operations
// conv2d → bias → relu

// After: 1 fused operation
// conv2dBiasReLU
```
### 2. Batch Processing
```typescript
// Instead of issuing single inferences one by one...
for (const input of inputs) {
  await engine.infer(input);
}

// ...batch multiple inputs into one call
const batchInput = stackTensors(inputs);
await engine.infer(batchInput);
```
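`stackTensors` above is illustrative. If your inputs are raw `Float32Array`s, stacking can be as simple as concatenating them and prepending a batch dimension; a sketch, assuming the `tensorFromArray` API shown in the next tip:

```typescript
// Concatenate N same-shaped Float32Array inputs into one batched tensor
function stackArrays(arrays: Float32Array[], shape: number[]) {
  const per = arrays[0].length;
  const out = new Float32Array(arrays.length * per);
  arrays.forEach((a, i) => out.set(a, i * per));
  return engine.tensorFromArray(out, [arrays.length, ...shape]);
}
```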
### 3. Minimize Data Transfer
```typescript
// Bad: bounces between CPU and GPU on every iteration
for (const input of inputs) {
  const tensor = engine.tensorFromArray(input, shape);
  const output = await engine.infer(tensor);
  const result = await output.download();
}

// Good: upload once, keep data on the GPU, download once
const tensor = engine.tensorFromArray(allInputs, batchShape);
const output = await engine.infer(tensor);
const results = await output.download();
```
## Chrome DevTools

### GPU Profiling
1. Open Chrome DevTools (F12).
2. Go to the Performance tab.
3. Enable "GPU" in the settings panel.
4. Record while running inference; marking your benchmark spans (see the sketch below) makes them easy to find in the timeline.
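The standard `performance.mark`/`performance.measure` APIs surface custom spans in the DevTools Performance timeline, which helps correlate benchmark runs with GPU activity:

```typescript
// Named spans appear in the DevTools Performance timeline under "Timings"
performance.mark('inference-start');
await engine.infer(input);
performance.mark('inference-end');
performance.measure('inference', 'inference-start', 'inference-end');
```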
### WebGPU Inspector

Use a WebGPU inspector browser extension, or visit chrome://gpu for detailed information about the browser's GPU configuration.
## Benchmark Utilities

### Timing Helpers
```typescript
// Simple wall-clock timing
const start = performance.now();
await operation();
const elapsed = performance.now() - start;
```
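One caveat: `performance.now()` only measures until the awaited promise resolves. If the operation submits GPU work without reading results back, also wait for the GPU queue to drain before stopping the clock. A sketch, assuming access to the underlying `GPUDevice` (the `device` binding here is an assumption, not a documented engine API):

```typescript
// Wait for all submitted GPU work to finish before stopping the clock
const start = performance.now();
await operation();
await device.queue.onSubmittedWorkDone(); // standard WebGPU API
const elapsed = performance.now() - start;
```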
### Memory Tracking
```typescript
// Estimate memory usage (4 bytes per float32 element)
const tensorMemory = tensor.size * 4;
const totalMemory = tensors.reduce((sum, t) => sum + t.size * 4, 0);
```
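The 4-byte factor only holds for float32 tensors; if other dtypes are in play, a small lookup keeps the estimate honest. A sketch (the `dtype` field is an assumption about the tensor type):

```typescript
// Bytes per element by dtype; extend as needed
const BYTES_PER_ELEMENT: Record<string, number> = {
  float32: 4,
  float16: 2,
  int8: 1,
};

function tensorBytes(t: { size: number; dtype: string }): number {
  return t.size * (BYTES_PER_ELEMENT[t.dtype] ?? 4);
}
```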
## API Reference

See the Utilities API Reference for the complete Benchmark API.