# Memory Layout
Understanding NCHW and NHWC memory layouts.
## Overview
Memory layout describes how a tensor's elements are arranged in linear memory. Tiny-DL-Inference supports the two layouts most commonly used by deep learning frameworks: NCHW and NHWC.
## Layout Types
### NCHW (Channel-First)
- **Shape:** `[Batch, Channel, Height, Width]`
- **Memory order:** batch → channel → height → width
- **Example:** `[1, 3, 224, 224]`
- **Used by:** PyTorch, ONNX
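For intuition, the flat offset of element `(n, c, h, w)` in a contiguous NCHW tensor can be computed directly from the shape. This is a standalone sketch for illustration, not part of the Tiny-DL-Inference API:

```typescript
// Flat offset of element (n, c, h, w) in a contiguous NCHW buffer.
// Standalone illustration; not a Tiny-DL-Inference API.
function nchwOffset(
  n: number, c: number, h: number, w: number,
  C: number, H: number, W: number,
): number {
  return ((n * C + c) * H + h) * W + w;
}

// For shape [1, 3, 224, 224]: element (0, 1, 0, 0) sits 224 * 224
// elements after (0, 0, 0, 0) -- each channel is a contiguous plane.
console.log(nchwOffset(0, 1, 0, 0, 3, 224, 224)); // 50176
```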
### NHWC (Channel-Last)
- **Shape:** `[Batch, Height, Width, Channel]`
- **Memory order:** batch → height → width → channel
- **Example:** `[1, 224, 224, 3]`
- **Used by:** TensorFlow, TFLite
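The corresponding offset computation for NHWC shows why all channels of a single pixel sit next to each other in memory (again a standalone sketch, not library code):

```typescript
// Flat offset of element (n, h, w, c) in a contiguous NHWC buffer.
// Standalone illustration; not a Tiny-DL-Inference API.
function nhwcOffset(
  n: number, h: number, w: number, c: number,
  H: number, W: number, C: number,
): number {
  return ((n * H + h) * W + w) * C + c;
}

// For shape [1, 224, 224, 3]: all channels of one pixel are adjacent,
// so (0, 0, 0, 1) is the very next element after (0, 0, 0, 0).
console.log(nhwcOffset(0, 0, 0, 1, 224, 224, 3)); // 1
```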
## Performance Implications
### NCHW Advantages
- Natural for convolution operations
- Better cache locality for channel-wise operations
- Direct indexing: `output[n, c, h, w]`
### NHWC Advantages
- Better memory coalescing on some GPUs
- Better for element-wise operations
- Compatible with TensorFlow models
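The cache-locality trade-off falls out of the two offset formulas above: a per-channel reduction is a unit-stride scan in NCHW but a stride-`C` scan in NHWC. A self-contained sketch over plain `Float32Array` buffers (illustration only, not library code):

```typescript
// Channel-wise scan (e.g. per-channel mean) over a contiguous NCHW
// buffer: the inner loop walks memory with stride 1.
function channelMeanNCHW(
  data: Float32Array, C: number, H: number, W: number, c: number,
): number {
  const plane = H * W;
  let sum = 0;
  for (let i = 0; i < plane; i++) {
    sum += data[c * plane + i]; // contiguous channel plane
  }
  return sum / plane;
}

// The same scan over an NHWC buffer strides by C between reads,
// touching new cache lines far more often.
function channelMeanNHWC(
  data: Float32Array, H: number, W: number, C: number, c: number,
): number {
  const plane = H * W;
  let sum = 0;
  for (let i = 0; i < plane; i++) {
    sum += data[i * C + c]; // stride-C access pattern
  }
  return sum / plane;
}
```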
## Using Different Layouts
### Creating Tensors with Layout
```typescript
// NCHW (default)
const nchw = Tensor.zeros(context, [1, 3, 224, 224], { layout: 'NCHW' });

// NHWC
const nhwc = Tensor.zeros(context, [1, 224, 224, 3], { layout: 'NHWC' });
```
### Layout Conversion
```typescript
// Convert NCHW to NHWC
const nhwc = await nchw.convertLayout('NHWC');

// Convert back
const original = await nhwc.convertLayout('NCHW');
```
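Conceptually, `convertLayout('NHWC')` on an NCHW tensor is an axis permutation (a `[0, 2, 3, 1]` transpose). The following CPU sketch shows the data movement involved; it illustrates the concept and is not the library's actual implementation:

```typescript
// Copy a contiguous NCHW buffer into NHWC order.
// Illustration only; Tiny-DL-Inference performs conversion internally.
function nchwToNhwc(
  src: Float32Array, N: number, C: number, H: number, W: number,
): Float32Array {
  const dst = new Float32Array(src.length);
  for (let n = 0; n < N; n++)
    for (let c = 0; c < C; c++)
      for (let h = 0; h < H; h++)
        for (let w = 0; w < W; w++) {
          const from = ((n * C + c) * H + h) * W + w; // NCHW offset
          const to = ((n * H + h) * W + w) * C + c;   // NHWC offset
          dst[to] = src[from];
        }
  return dst;
}
```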
## Operator Support
| Operator | NCHW | NHWC |
|---|---|---|
| Conv2d | ✅ | ⚠️ Via conversion |
| MaxPool | ✅ | ⚠️ Via conversion |
| ReLU | ✅ | ✅ |
| Softmax | ✅ | ✅ |
| Dense | ✅ | ✅ |
| Flatten | ✅ | ✅ |
> **Note:** Conv2d and MaxPool currently execute in NCHW; layout conversion happens automatically for NHWC inputs.
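In other words, an NHWC input to Conv2d still works, but it pays for a round-trip conversion. Conceptually the library does something like the following; `conv2d` here is a hypothetical stand-in for the convolution entry point, and only `convertLayout` is documented on this page:

```typescript
// Conceptual equivalent of the automatic handling for NHWC inputs.
// `conv2d` is a hypothetical op name used for illustration.
async function conv2dNHWC(input: Tensor, weights: Tensor): Promise<Tensor> {
  const nchwInput = await input.convertLayout('NCHW'); // implicit conversion in
  const result = await conv2d(nchwInput, weights);     // op executes in NCHW
  return result.convertLayout('NHWC');                 // implicit conversion out
}
```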
## Best Practices
- **Use NCHW for inference** - better performance for convolution-heavy models
- **Convert at model load** - convert NHWC weights to NCHW once at initialization (see the sketch after this list)
- **Avoid repeated conversions** - layout conversion has overhead
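For example, when serving a TensorFlow-trained model, pay the transpose cost once at load time. `loadWeights` below is a hypothetical placeholder for however your weights arrive:

```typescript
// One-time conversion at initialization. loadWeights is a hypothetical
// placeholder; convertLayout is the documented conversion method.
const rawWeights = await loadWeights('model.bin');      // arrives as NHWC
const weights = await rawWeights.convertLayout('NCHW'); // convert once

// Anti-pattern: calling convertLayout inside the inference loop would
// repeat the same transpose on every request.
```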
## API Reference
See the Tensor API Reference for layout conversion methods.