# Web Integration
Learn how to integrate Tiny-DL-Inference into a web application for real-time inference in the browser.
## Overview
This example demonstrates building a complete web application that:
- Loads a neural network model via the Fetch API
- Processes images from file input using Canvas
- Runs inference on WebGPU
- Displays predictions with timing information
The source code consists of an HTML file and a TypeScript module (app.ts).
## HTML Setup

Create a basic HTML page with file input and result display areas:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Web Inference Demo</title>
  <style>
    body {
      font-family: system-ui, -apple-system, sans-serif;
      max-width: 800px;
      margin: 0 auto;
      padding: 20px;
    }
    #status {
      margin: 20px 0;
      padding: 10px;
      font-family: monospace;
      background: #f5f5f5;
      border-radius: 4px;
    }
    #result {
      margin: 20px 0;
      padding: 15px;
      background: #f0f0f0;
      border-radius: 4px;
    }
    #result h3 {
      margin-top: 0;
    }
    .prediction {
      margin: 5px 0;
    }
  </style>
</head>
<body>
  <h1>Tiny-DL-Inference Web Demo</h1>
  <input type="file" id="imageInput" accept="image/*">
  <div id="status">Ready</div>
  <div id="result"></div>
  <script type="module" src="app.js"></script>
</body>
</html>
```
## TypeScript Module

Create app.ts with the inference logic:

### Step 1: Initialize the Application

```typescript
import { InferenceEngine } from 'tiny-dl-inference';

class WebInferenceDemo {
  private engine: InferenceEngine;
  private statusEl: HTMLElement;
  private resultEl: HTMLElement;

  constructor() {
    this.engine = new InferenceEngine();
    this.statusEl = document.getElementById('status')!;
    this.resultEl = document.getElementById('result')!;
  }

  async initialize() {
    this.setStatus('Initializing WebGPU...');
    try {
      await this.engine.initialize();
      this.setStatus('Loading model...');
      const model = await this.loadModel();
      await this.engine.loadModel(model);
      this.setStatus('Ready for inference');
      this.setupEventListeners();
    } catch (error: unknown) {
      const message = error instanceof Error ? error.message : 'Unknown error';
      this.setStatus(`Error: ${message}`);
    }
  }
}
```
### Step 2: Load the Model

```typescript
private async loadModel() {
  const response = await fetch('model.json');
  // fetch() does not reject on HTTP errors, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Failed to fetch model.json: HTTP ${response.status}`);
  }
  return await response.json();
}
```
The model file should follow the format described in Custom Model Loading.
### Step 3: Handle Image Input

```typescript
private setupEventListeners() {
  const input = document.getElementById('imageInput') as HTMLInputElement;
  input.addEventListener('change', (e) => this.handleImageSelect(e));
}

private async handleImageSelect(event: Event) {
  const file = (event.target as HTMLInputElement).files?.[0];
  if (!file) return;

  this.setStatus('Processing image...');
  try {
    const imageData = await this.loadImage(file);
    const input = this.engine.tensorFromArray(imageData, [1, 3, 224, 224]);

    const start = performance.now();
    const output = await this.engine.infer(input);
    const end = performance.now();

    const predictions = await output.download();
    this.displayResults(predictions, end - start);
    this.setStatus('Inference complete');
  } catch (error: unknown) {
    const message = error instanceof Error ? error.message : 'Unknown error';
    this.setStatus(`Error: ${message}`);
  }
}
```
### Step 4: Load and Process Images

```typescript
private async loadImage(file: File): Promise<Float32Array> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => {
      const canvas = document.createElement('canvas');
      canvas.width = 224;
      canvas.height = 224;
      const ctx = canvas.getContext('2d')!;
      ctx.drawImage(img, 0, 0, 224, 224);
      const imageData = ctx.getImageData(0, 0, 224, 224);

      // Convert RGBA to RGB and normalize to [0, 1]
      const floatData = new Float32Array(3 * 224 * 224);
      for (let i = 0; i < 224 * 224; i++) {
        floatData[i] = imageData.data[i * 4] / 255.0;                     // R
        floatData[i + 224 * 224] = imageData.data[i * 4 + 1] / 255.0;     // G
        floatData[i + 2 * 224 * 224] = imageData.data[i * 4 + 2] / 255.0; // B
      }

      URL.revokeObjectURL(img.src);
      resolve(floatData);
    };
    img.onerror = () => {
      URL.revokeObjectURL(img.src);
      reject(new Error('Failed to load image'));
    };
    img.src = URL.createObjectURL(file);
  });
}
```
### Step 5: Display Results

```typescript
private displayResults(predictions: Float32Array, timeMs: number) {
  // Get top-3 predictions
  const top3 = Array.from(predictions)
    .map((prob, idx) => ({ class: idx, prob }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, 3);

  this.resultEl.innerHTML = `
    <h3>Results (${timeMs.toFixed(2)}ms)</h3>
    ${top3.map(item => `
      <div class="prediction">
        Class ${item.class}: ${(item.prob * 100).toFixed(2)}%
      </div>
    `).join('')}
  `;
}
```
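The ranking logic above can be factored into a reusable helper. Note that the demo treats the downloaded output as probabilities; if your model's final layer emits raw logits instead, apply a softmax before ranking. Both functions below are illustrative sketches, not part of the tiny-dl-inference API:

```typescript
// Numerically stable softmax: subtract the max logit before exponentiating
// so that Math.exp never overflows for large logit values.
function softmax(logits: Float32Array | number[]): Float32Array {
  const values = Array.from(logits);
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return new Float32Array(exps.map((v) => v / sum));
}

// Return the indices and probabilities of the k largest entries,
// sorted from most to least probable.
function topK(
  probs: Float32Array | number[],
  k: number
): { class: number; prob: number }[] {
  return Array.from(probs)
    .map((prob, idx) => ({ class: idx, prob }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```

Factoring the ranking out of the DOM code also makes it straightforward to unit-test without a browser.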
### Step 6: Utility Methods

```typescript
private setStatus(message: string) {
  this.statusEl.textContent = message;
}

destroy() {
  this.engine.destroy();
}
} // end of the WebInferenceDemo class

// Initialize the application
const demo = new WebInferenceDemo();
demo.initialize();
```
## Key Concepts

### WebGPU Initialization

WebGPU must be initialized before any tensor operations:

```typescript
await this.engine.initialize();
```

This checks for WebGPU support and creates the GPU device.
### Image Preprocessing
Images must be:
- Resized to the model's expected input size (e.g., 224x224)
- Converted from RGBA to the model's channel format (e.g., RGB in NCHW layout)
- Normalized to the range expected by the model (typically [0, 1] or [-1, 1])
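The normalization step can be written as a small standalone helper. This is an illustrative sketch (`normalizePixels` is not part of the library), covering the two common target ranges:

```typescript
// Normalize 8-bit pixel values into a target range.
// mode '0..1'  maps v -> v / 255         (range [0, 1])
// mode '-1..1' maps v -> v / 127.5 - 1   (range [-1, 1])
function normalizePixels(
  bytes: Uint8ClampedArray | Uint8Array,
  mode: '0..1' | '-1..1' = '0..1'
): Float32Array {
  const out = new Float32Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = mode === '0..1' ? bytes[i] / 255 : bytes[i] / 127.5 - 1;
  }
  return out;
}
```

Check your model's documentation (or its training preprocessing code) to know which range it expects; mixing them up silently degrades accuracy rather than raising an error.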
### NCHW Data Layout

The example converts images to NCHW layout (channels first):

```typescript
// NCHW: all R values, then all G values, then all B values
floatData[i] = imageData.data[i * 4] / 255.0;                     // R channel
floatData[i + 224 * 224] = imageData.data[i * 4 + 1] / 255.0;     // G channel
floatData[i + 2 * 224 * 224] = imageData.data[i * 4 + 2] / 255.0; // B channel
```

For NHWC layout (channels last), the data would be interleaved: R0, G0, B0, R1, G1, B1, ...
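The interleaved-to-planar conversion generalizes to any image size. Here is a self-contained sketch of the same loop as a pure function (`rgbaToNchw` is a hypothetical helper, not a library function), which is easy to verify in isolation:

```typescript
// Convert interleaved RGBA bytes (as returned by getImageData) into a
// planar NCHW Float32Array: all R values, then all G, then all B.
// The alpha channel is dropped; values are normalized to [0, 1].
function rgbaToNchw(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number
): Float32Array {
  const plane = width * height; // number of pixels per channel plane
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[i + plane] = rgba[i * 4 + 1] / 255;     // G plane
    out[i + 2 * plane] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}
```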
### Error Handling

Always wrap inference calls in try/catch blocks:

```typescript
try {
  const output = await this.engine.infer(input);
  // Process results
} catch (error) {
  // Handle errors gracefully
}
```
## Performance Tips
- Reuse the engine - Initialize once and reuse for multiple inferences
- Batch when possible - Process multiple images in a single tensor
- Clean up promptly - Destroy tensors and the engine when done
- Warm up the GPU - Run a dummy inference before timing to avoid cold-start overhead
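The warm-up and timing advice can be combined into a small measurement helper. This is a sketch (`timeMedianMs` is a hypothetical utility, not part of the library); pass it any async operation, such as a closure around `engine.infer`:

```typescript
// Time an async operation: run `warmup` untimed iterations first to absorb
// cold-start costs (shader compilation, pipeline setup), then report the
// median of `runs` timed iterations. The median is more robust to outliers
// (GC pauses, background tabs) than the mean.
async function timeMedianMs(
  fn: () => Promise<unknown>,
  warmup = 2,
  runs = 5
): Promise<number> {
  for (let i = 0; i < warmup; i++) await fn();
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}
```

For GPU workloads, remember that `performance.now()` measures wall-clock time on the CPU side; it only reflects GPU time when the awaited call actually blocks until the GPU work completes (as a result download does).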
## Browser Compatibility
| Browser | Minimum Version | Notes |
|---|---|---|
| Chrome | 113+ | Recommended |
| Edge | 113+ | Recommended |
| Safari | 18+ | macOS Sonoma+ |
Check WebGPU support:

```typescript
if (!navigator.gpu) {
  console.error('WebGPU is not supported in this browser');
  // Show fallback message or redirect
}
```
## Next Steps
- See Performance Benchmarking for measuring inference speed
- Read about Memory Layout for optimization
- Check the API Reference for detailed documentation