# Web Integration
Learn how to integrate Tiny-DL-Inference into a web application for real-time inference in the browser.
## Overview
This example demonstrates building a complete web application that:
- Loads a neural network model via the Fetch API
- Processes images from file input using Canvas
- Runs inference on WebGPU
- Displays predictions with timing information
The source code consists of an HTML file and a TypeScript module (app.ts).
## HTML Setup

Create a basic HTML page with file input and result display areas:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Web Inference Demo</title>
  <style>
    body {
      font-family: system-ui, -apple-system, sans-serif;
      max-width: 800px;
      margin: 0 auto;
      padding: 20px;
    }
    #status {
      margin: 20px 0;
      padding: 10px;
      font-family: monospace;
      background: #f5f5f5;
      border-radius: 4px;
    }
    #result {
      margin: 20px 0;
      padding: 15px;
      background: #f0f0f0;
      border-radius: 4px;
    }
    #result h3 {
      margin-top: 0;
    }
    .prediction {
      margin: 5px 0;
    }
  </style>
</head>
<body>
  <h1>Tiny-DL-Inference Web Demo</h1>
  <input type="file" id="imageInput" accept="image/*">
  <div id="status">Ready</div>
  <div id="result"></div>
  <script type="module" src="app.js"></script>
</body>
</html>
```
## TypeScript Module

Create app.ts with the inference logic:

### Step 1: Initialize the Application

```typescript
import { InferenceEngine } from 'tiny-dl-inference';

class WebInferenceDemo {
  private engine: InferenceEngine;
  private statusEl: HTMLElement;
  private resultEl: HTMLElement;

  constructor() {
    this.engine = new InferenceEngine();
    this.statusEl = document.getElementById('status')!;
    this.resultEl = document.getElementById('result')!;
  }

  async initialize() {
    this.setStatus('Initializing WebGPU...');
    try {
      await this.engine.initialize();
      this.setStatus('Loading model...');
      const model = await this.loadModel();
      await this.engine.loadModel(model);
      this.setStatus('Ready for inference');
      this.setupEventListeners();
    } catch (error: unknown) {
      const message = error instanceof Error ? error.message : 'Unknown error';
      this.setStatus(`Error: ${message}`);
    }
  }
}
```
### Step 2: Load the Model

```typescript
private async loadModel() {
  const response = await fetch('model.json');
  // fetch() does not reject on HTTP errors, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Failed to fetch model.json: HTTP ${response.status}`);
  }
  return await response.json();
}
```
The model file should follow the format described in Custom Model Loading.
### Step 3: Handle Image Input

```typescript
private setupEventListeners() {
  const input = document.getElementById('imageInput') as HTMLInputElement;
  input.addEventListener('change', (e) => this.handleImageSelect(e));
}

private async handleImageSelect(event: Event) {
  const file = (event.target as HTMLInputElement).files?.[0];
  if (!file) return;

  this.setStatus('Processing image...');
  try {
    const imageData = await this.loadImage(file);
    const input = this.engine.tensorFromArray(imageData, [1, 3, 224, 224]);

    const start = performance.now();
    const output = await this.engine.infer(input);
    const end = performance.now();

    const predictions = await output.download();
    this.displayResults(predictions, end - start);
    this.setStatus('Inference complete');
  } catch (error: unknown) {
    const message = error instanceof Error ? error.message : 'Unknown error';
    this.setStatus(`Error: ${message}`);
  }
}
```
### Step 4: Load and Process Images

```typescript
private async loadImage(file: File): Promise<Float32Array> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => {
      const canvas = document.createElement('canvas');
      canvas.width = 224;
      canvas.height = 224;
      const ctx = canvas.getContext('2d')!;
      ctx.drawImage(img, 0, 0, 224, 224);
      const imageData = ctx.getImageData(0, 0, 224, 224);

      // Convert RGBA to RGB and normalize to [0, 1]
      const floatData = new Float32Array(3 * 224 * 224);
      for (let i = 0; i < 224 * 224; i++) {
        floatData[i] = imageData.data[i * 4] / 255.0;                     // R
        floatData[i + 224 * 224] = imageData.data[i * 4 + 1] / 255.0;     // G
        floatData[i + 2 * 224 * 224] = imageData.data[i * 4 + 2] / 255.0; // B
      }

      URL.revokeObjectURL(img.src);
      resolve(floatData);
    };
    img.onerror = () => {
      URL.revokeObjectURL(img.src);
      reject(new Error('Failed to load image'));
    };
    img.src = URL.createObjectURL(file);
  });
}
```
### Step 5: Display Results

```typescript
private displayResults(predictions: Float32Array, timeMs: number) {
  // Get top-3 predictions
  const top3 = Array.from(predictions)
    .map((prob, idx) => ({ class: idx, prob }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, 3);

  this.resultEl.innerHTML = `
    <h3>Results (${timeMs.toFixed(2)}ms)</h3>
    ${top3.map(item => `
      <div class="prediction">
        Class ${item.class}: ${(item.prob * 100).toFixed(2)}%
      </div>
    `).join('')}
  `;
}
```
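The ranking logic above can be factored into a reusable helper. Note that the demo treats the downloaded output as probabilities; if your model's final layer emits raw logits instead, apply a softmax before ranking. Both functions below are illustrative sketches, not part of the tiny-dl-inference API:

```typescript
// Numerically stable softmax: subtract the max logit before exponentiating
// so that Math.exp never overflows for large logit values.
function softmax(logits: Float32Array | number[]): Float32Array {
  const values = Array.from(logits);
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return new Float32Array(exps.map((v) => v / sum));
}

// Return the indices and probabilities of the k largest entries,
// sorted from most to least probable.
function topK(
  probs: Float32Array | number[],
  k: number
): { class: number; prob: number }[] {
  return Array.from(probs)
    .map((prob, idx) => ({ class: idx, prob }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```

Factoring the ranking out of the DOM code also makes it straightforward to unit-test without a browser.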
### Step 6: Utility Methods

```typescript
private setStatus(message: string) {
  this.statusEl.textContent = message;
}

destroy() {
  this.engine.destroy();
}
} // end of the WebInferenceDemo class

// Initialize the application
const demo = new WebInferenceDemo();
demo.initialize();
```
## Key Concepts

### WebGPU Initialization

WebGPU must be initialized before any tensor operations:

```typescript
await this.engine.initialize();
```

This checks for WebGPU support and creates the GPU device.
### Image Preprocessing
Images must be:
- Resized to the model's expected input size (e.g., 224x224)
- Converted from RGBA to the model's channel format (e.g., RGB in NCHW layout)
- Normalized to the range expected by the model (typically [0, 1] or [-1, 1])
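The normalization step can be written as a small standalone helper. This is an illustrative sketch (`normalizePixels` is not part of the library), covering the two common target ranges:

```typescript
// Normalize 8-bit pixel values into a target range.
// mode '0..1'  maps v -> v / 255         (range [0, 1])
// mode '-1..1' maps v -> v / 127.5 - 1   (range [-1, 1])
function normalizePixels(
  bytes: Uint8ClampedArray | Uint8Array,
  mode: '0..1' | '-1..1' = '0..1'
): Float32Array {
  const out = new Float32Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = mode === '0..1' ? bytes[i] / 255 : bytes[i] / 127.5 - 1;
  }
  return out;
}
```

Check your model's documentation (or its training preprocessing code) to know which range it expects; mixing them up silently degrades accuracy rather than raising an error.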
### NCHW Data Layout

The example converts images to NCHW layout (channels first):

```typescript
// NCHW: all R values, then all G values, then all B values
floatData[i] = imageData.data[i * 4] / 255.0;                     // R channel
floatData[i + 224 * 224] = imageData.data[i * 4 + 1] / 255.0;     // G channel
floatData[i + 2 * 224 * 224] = imageData.data[i * 4 + 2] / 255.0; // B channel
```

For NHWC layout (channels last), the data would be interleaved: R0, G0, B0, R1, G1, B1, ...
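The interleaved-to-planar conversion generalizes to any image size. Here is a self-contained sketch of the same loop as a pure function (`rgbaToNchw` is a hypothetical helper, not a library function), which is easy to verify in isolation:

```typescript
// Convert interleaved RGBA bytes (as returned by getImageData) into a
// planar NCHW Float32Array: all R values, then all G, then all B.
// The alpha channel is dropped; values are normalized to [0, 1].
function rgbaToNchw(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number
): Float32Array {
  const plane = width * height; // number of pixels per channel plane
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[i + plane] = rgba[i * 4 + 1] / 255;     // G plane
    out[i + 2 * plane] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}
```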
### Error Handling

Always wrap inference calls in try/catch blocks:

```typescript
try {
  const output = await this.engine.infer(input);
  // Process results
} catch (error) {
  // Handle errors gracefully
}
```
## Performance Tips
- Reuse the engine - Initialize once and reuse for multiple inferences
- Batch when possible - Process multiple images in a single tensor
- Clean up promptly - Destroy tensors and the engine when done
- Warm up the GPU - Run a dummy inference before timing to avoid cold-start overhead
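The warm-up and timing advice can be combined into a small measurement helper. This is a sketch (`timeMedianMs` is a hypothetical utility, not part of the library); pass it any async operation, such as a closure around `engine.infer`:

```typescript
// Time an async operation: run `warmup` untimed iterations first to absorb
// cold-start costs (shader compilation, pipeline setup), then report the
// median of `runs` timed iterations. The median is more robust to outliers
// (GC pauses, background tabs) than the mean.
async function timeMedianMs(
  fn: () => Promise<unknown>,
  warmup = 2,
  runs = 5
): Promise<number> {
  for (let i = 0; i < warmup; i++) await fn();
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}
```

For GPU workloads, remember that `performance.now()` measures wall-clock time on the CPU side; it only reflects GPU time when the awaited call actually blocks until the GPU work completes (as a result download does).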
## Browser Compatibility
| Browser | Minimum Version | Notes |
|---|---|---|
| Chrome | 113+ | Recommended |
| Edge | 113+ | Recommended |
| Safari | 18+ | macOS Sonoma+ |
Check WebGPU support:

```typescript
if (!navigator.gpu) {
  console.error('WebGPU is not supported in this browser');
  // Show fallback message or redirect
}
```
## Next Steps
- See Performance Benchmarking for measuring inference speed
- Read about Memory Layout for optimization
- Check the API Reference for detailed documentation