# Benchmarking

Measuring and optimizing performance in Tiny-DL-Inference.
## Overview

Tiny-DL-Inference includes a built-in benchmarking system for measuring both operator-level and end-to-end inference performance.
## Using the Benchmark Class

### Basic Usage
```typescript
import { Benchmark, InferenceEngine } from 'tiny-dl-inference';

const engine = new InferenceEngine();
await engine.initialize();

// Bracket access reaches the engine's internal GPU context,
// which is not part of its public typed surface
const bench = new Benchmark(engine['context']);

// Measure a single operation
const time = await bench.measureOperation(async () => {
  return await someOperator.forward([input], params);
});

console.log(`Operation took ${time}ms`);
```
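Note that the first invocation of an operator often includes one-time costs such as shader and pipeline compilation, so warm up before timing. The operator examples below do this explicitly; a minimal sketch, reusing the names from the snippet above:

```typescript
// Warm-up: absorbs one-time costs like pipeline compilation
await someOperator.forward([input], params);

// Subsequent measurements reflect steady-state performance
const steady = await bench.measureOperation(async () => {
  return await someOperator.forward([input], params);
});
```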
### Multiple Runs
```typescript
// Run multiple iterations for a more stable estimate
const times = await bench.measureRepeated(
  async () => await operator.forward([input], params),
  100 // iterations
);

console.log(`Average: ${times.avg}ms`);
console.log(`Min: ${times.min}ms`);
console.log(`Max: ${times.max}ms`);
```
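If you need percentiles beyond the average, minimum, and maximum, you can collect raw samples yourself. A sketch using only the `measureOperation` API shown above:

```typescript
// Collect raw samples, then derive median and p95 latency
const samples: number[] = [];
for (let i = 0; i < 100; i++) {
  samples.push(await bench.measureOperation(async () => {
    return await operator.forward([input], params);
  }));
}
samples.sort((a, b) => a - b);
const median = samples[Math.floor(samples.length / 2)];
const p95 = samples[Math.floor(samples.length * 0.95)];
console.log(`Median: ${median.toFixed(2)}ms, p95: ${p95.toFixed(2)}ms`);
```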
## Benchmarking Operators

### Conv2d Performance
```typescript
import { Conv2d } from 'tiny-dl-inference';

const conv2d = new Conv2d(context);

// Warm-up run: the first invocation may include pipeline compilation
await conv2d.forward([input, weights], params);

const time = await bench.measureOperation(async () => {
  return await conv2d.forward([input, weights], params);
});
```
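Kernel timings typically scale with input size, so it can be worth sweeping a few shapes. A minimal sketch, assuming a hypothetical `makeInput` helper for constructing test tensors:

```typescript
// `makeInput(h, w)` is a hypothetical helper that builds an input tensor of
// the given spatial size; substitute however your code constructs tensors.
for (const size of [28, 56, 112]) {
  const input = makeInput(size, size);
  await conv2d.forward([input, weights], params); // warm-up per shape
  const t = await bench.measureOperation(async () => {
    return await conv2d.forward([input, weights], params);
  });
  console.log(`${size}x${size}: ${t.toFixed(2)}ms`);
}
```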
### Fused vs Non-Fused
```typescript
// Measure the fused kernel
const fusedTime = await bench.measureOperation(async () => {
  return await fusedOp.forward([input, weights, bias], params);
});

// Measure the equivalent sequential operations
const seqTime = await bench.measureOperation(async () => {
  const c = await conv2d.forward([input, weights], params);
  const b = await bias.add(c, biasTensor);
  return await relu.forward([b], params);
});

console.log(`Speedup: ${(seqTime / fusedTime).toFixed(2)}×`);
```
## End-to-End Benchmarks

### MNIST Inference
```typescript
const engine = new InferenceEngine();
await engine.initialize();
await engine.loadModel(mnistModel);

// Warm-up
await engine.infer(testInput);

// Measure
const inferenceTime = await bench.measureOperation(async () => {
  return await engine.infer(input);
});

console.log(`Inference: ${inferenceTime}ms`);
```
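A single timed run is noisy; `measureRepeated` (shown earlier) gives a steadier picture of end-to-end latency:

```typescript
// Average end-to-end latency over many runs
const stats = await bench.measureRepeated(
  async () => await engine.infer(input),
  50 // iterations
);
console.log(`Latency avg=${stats.avg}ms min=${stats.min}ms max=${stats.max}ms`);
```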
### Throughput
```typescript
const batchSize = 32;
const inputs = prepareBatch(batchSize);

const totalTime = await bench.measureOperation(async () => {
  for (const input of inputs) {
    await engine.infer(input);
  }
});

const throughput = batchSize / (totalTime / 1000);
console.log(`Throughput: ${throughput.toFixed(1)} inferences/sec`);
```
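Note that this loop awaits each inference in turn, so the figure reflects sequential, latency-bound throughput rather than what a truly batched input can achieve (see the Batch Processing tip below). The per-inference average falls out of the same numbers:

```typescript
// Average wall-clock time per inference in the sequential loop
const msPerInference = totalTime / batchSize;
console.log(`Per-inference: ${msPerInference.toFixed(2)}ms`);
```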
## Performance Metrics

### Key Metrics
| Metric | Description |
|---|---|
| Latency | Wall-clock time for a single inference (ms) |
| Throughput | Completed inferences per second |
| Memory | GPU memory consumed by tensors and buffers |
| Utilization | Fraction of GPU compute kept busy |
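A minimal sketch of gathering these into one report, assuming the `measureRepeated` API above and the per-tensor memory estimate from the Memory Tracking section below (GPU utilization generally isn't observable from WebGPU, so it is omitted):

```typescript
interface BenchReport {
  avgLatencyMs: number;
  throughputPerSec: number;
  estMemoryBytes: number;
}

async function report(
  run: () => Promise<unknown>,
  tensors: { size: number }[]
): Promise<BenchReport> {
  const stats = await bench.measureRepeated(run, 50);
  return {
    avgLatencyMs: stats.avg,
    throughputPerSec: 1000 / stats.avg,
    estMemoryBytes: tensors.reduce((sum, t) => sum + t.size * 4, 0), // float32
  };
}
```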
### Expected Performance
| Model | Typical Latency |
|---|---|
| MNIST (1x28x28) | < 5ms |
| Small CNN | < 20ms |
| Medium CNN | < 100ms |
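These targets make a convenient regression check. A sketch, reusing the engine and benchmark objects from earlier:

```typescript
// Fail loudly if measured latency exceeds the documented budget
const budgetMs = 5; // MNIST target from the table above
const measured = await bench.measureOperation(async () => engine.infer(input));
if (measured > budgetMs) {
  throw new Error(`Latency regression: ${measured.toFixed(2)}ms > ${budgetMs}ms budget`);
}
```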
## Optimization Tips

### 1. Use Kernel Fusion
```typescript
// Before: 3 separate operations
// conv2d → bias → relu

// After: 1 fused operation
// conv2dBiasReLU
```
### 2. Batch Processing
```typescript
// Instead of issuing single inferences one by one...
for (const input of inputs) {
  await engine.infer(input);
}

// ...batch multiple inputs into one call
const batchInput = stackTensors(inputs);
await engine.infer(batchInput);
```
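`stackTensors` above is illustrative. If your inputs are raw `Float32Array`s, stacking can be as simple as concatenating them and prepending a batch dimension; a sketch, assuming the `tensorFromArray` API shown in the next tip:

```typescript
// Concatenate N same-shaped Float32Array inputs into one batched tensor
function stackArrays(arrays: Float32Array[], shape: number[]) {
  const per = arrays[0].length;
  const out = new Float32Array(arrays.length * per);
  arrays.forEach((a, i) => out.set(a, i * per));
  return engine.tensorFromArray(out, [arrays.length, ...shape]);
}
```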
### 3. Minimize Data Transfer
```typescript
// Bad: bounces between CPU and GPU on every iteration
for (const input of inputs) {
  const tensor = engine.tensorFromArray(input, shape);
  const output = await engine.infer(tensor);
  const result = await output.download();
}

// Good: upload once, keep data on the GPU, download once
const tensor = engine.tensorFromArray(allInputs, batchShape);
const output = await engine.infer(tensor);
const results = await output.download();
```
## Chrome DevTools

### GPU Profiling
1. Open Chrome DevTools (F12).
2. Go to the Performance tab.
3. Enable "GPU" in the settings panel.
4. Record while running inference; marking your benchmark spans (see the sketch below) makes them easy to find in the timeline.
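The standard `performance.mark`/`performance.measure` APIs surface custom spans in the DevTools Performance timeline, which helps correlate benchmark runs with GPU activity:

```typescript
// Named spans appear in the DevTools Performance timeline under "Timings"
performance.mark('inference-start');
await engine.infer(input);
performance.mark('inference-end');
performance.measure('inference', 'inference-start', 'inference-end');
```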
### WebGPU Inspector

Use a WebGPU inspector browser extension, or visit chrome://gpu for detailed information about the browser's GPU configuration.
## Benchmark Utilities

### Timing Helpers
```typescript
// Simple wall-clock timing
const start = performance.now();
await operation();
const elapsed = performance.now() - start;
```
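One caveat: `performance.now()` only measures until the awaited promise resolves. If the operation submits GPU work without reading results back, also wait for the GPU queue to drain before stopping the clock. A sketch, assuming access to the underlying `GPUDevice` (the `device` binding here is an assumption, not a documented engine API):

```typescript
// Wait for all submitted GPU work to finish before stopping the clock
const start = performance.now();
await operation();
await device.queue.onSubmittedWorkDone(); // standard WebGPU API
const elapsed = performance.now() - start;
```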
### Memory Tracking
```typescript
// Estimate memory usage (4 bytes per float32 element)
const tensorMemory = tensor.size * 4;
const totalMemory = tensors.reduce((sum, t) => sum + t.size * 4, 0);
```
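The 4-byte factor only holds for float32 tensors; if other dtypes are in play, a small lookup keeps the estimate honest. A sketch (the `dtype` field is an assumption about the tensor type):

```typescript
// Bytes per element by dtype; extend as needed
const BYTES_PER_ELEMENT: Record<string, number> = {
  float32: 4,
  float16: 2,
  int8: 1,
};

function tensorBytes(t: { size: number; dtype: string }): number {
  return t.size * (BYTES_PER_ELEMENT[t.dtype] ?? 4);
}
```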
## API Reference

See the Utilities API Reference for the complete Benchmark API.