Database/Data Model Specifications

This directory contains data model and schema definitions for the Mini-Inference Engine.

Purpose

Data model specifications define:

  • Data structures and their relationships
  • Serialization formats
  • File formats for persistence
  • Memory layouts for GPU data

Current Status

The project uses custom binary formats for weight storage. Future specifications may include:

  • DBML for conceptual data modeling
  • Protocol buffer schemas for serialization
  • Custom binary format documentation

Data Models

Weight File Format

Weight File Layout:
═════════════════════════════════════════════════════════════
Offset  Size        Description
═════════════════════════════════════════════════════════════
0       32 bytes    Header
  ├─ 0   4 bytes    Magic number (0x4D494E49 = "MINI")
  ├─ 4   4 bytes    Version (1)
  ├─ 8   4 bytes    Number of layers
  └─ 12  20 bytes   Reserved
═════════════════════════════════════════════════════════════
32      Variable    Layer Data (repeated for each layer)
  ├─ 0   4 bytes    Layer type (0 = Linear)
  ├─ 4   4 bytes    Input dimension (in_features)
  ├─ 8   4 bytes    Output dimension (out_features)
  ├─ 12  4 bytes    Has bias flag (0 or 1)
  ├─ 16  in×out×4   Weight data (row-major)
  └─ ... out×4      Bias data (if has_bias)
═════════════════════════════════════════════════════════════
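
The layout above can be mirrored with packed structs. This is an illustrative sketch, not code from the project: the struct names (`FileHeader`, `LayerHeader`, `layer_record_size`) are assumptions, chosen only to demonstrate that the offsets and sizes in the table are internally consistent.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical structs mirroring the layout table; names are
// illustrative and not taken from the project source.
#pragma pack(push, 1)
struct FileHeader {
    uint32_t magic;        // 0x4D494E49 ("MINI")
    uint32_t version;      // format version, currently 1
    uint32_t num_layers;   // count of layer records that follow
    uint8_t  reserved[20]; // padding to 32 bytes
};

struct LayerHeader {
    uint32_t layer_type;   // 0 = Linear
    uint32_t in_features;  // input dimension
    uint32_t out_features; // output dimension
    uint32_t has_bias;     // 0 or 1
    // followed by in*out floats (row-major), then out floats if has_bias
};
#pragma pack(pop)

static_assert(sizeof(FileHeader) == 32, "file header must be 32 bytes");
static_assert(sizeof(LayerHeader) == 16, "layer header must be 16 bytes");

// Byte size of one serialized layer record, per the table above.
size_t layer_record_size(const LayerHeader& h) {
    size_t n = sizeof(LayerHeader);
    n += size_t(h.in_features) * h.out_features * sizeof(float); // weights
    if (h.has_bias) n += size_t(h.out_features) * sizeof(float); // bias
    return n;
}
```

For example, a `Linear(784, 256)` layer with bias serializes to 16 + 784·256·4 + 256·4 = 803,856 bytes.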

Core Data Structures

// Matrix descriptor
struct MatrixDesc {
    float* data;        // Device pointer
    int rows;           // Row count M
    int cols;           // Column count N
    int ld;             // Leading dimension
    bool is_transposed; // Whether transposed
};

// GEMM configuration
struct GemmConfig {
    int BLOCK_M;        // Tile row size
    int BLOCK_N;        // Tile column size
    int BLOCK_K;        // K dimension block size
    int WARP_M;         // Warp-level M blocking
    int WARP_N;         // Warp-level N blocking
    bool use_double_buffer;
    bool use_vectorized_load;
};

// Fusion operation configuration
struct FusionConfig {
    bool add_bias;
    bool apply_relu;
    float* bias;
};
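
To show how these descriptors fit together, here is a CPU reference sketch of the fused Linear (+bias, +ReLU) operation that `FusionConfig` describes. The structs are repeated so the example is self-contained; the function name `linear_fused_ref` is an assumption, and the real CUDA kernel would tile this loop nest according to `GemmConfig` rather than run it scalar.

```cpp
#include <algorithm>

struct MatrixDesc {
    float* data;        // pointer to matrix storage
    int rows;           // row count M
    int cols;           // column count N
    int ld;             // leading dimension
    bool is_transposed; // whether transposed (unused in this sketch)
};

struct FusionConfig {
    bool add_bias;
    bool apply_relu;
    float* bias;
};

// CPU reference for the fused op: C = A * B, then optional bias add
// and ReLU. Row-major, no transposes, for clarity only.
void linear_fused_ref(const MatrixDesc& A, const MatrixDesc& B,
                      MatrixDesc& C, const FusionConfig& f) {
    for (int i = 0; i < C.rows; ++i) {
        for (int j = 0; j < C.cols; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < A.cols; ++k)
                acc += A.data[i * A.ld + k] * B.data[k * B.ld + j];
            if (f.add_bias)   acc += f.bias[j];
            if (f.apply_relu) acc = std::max(acc, 0.0f);
            C.data[i * C.ld + j] = acc;
        }
    }
}
```

Fusing the bias add and ReLU into the GEMM epilogue avoids two extra round trips through global memory, which is the motivation for `FusionConfig` existing at all.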

Network Architecture (MNIST)

Input: 784 (28x28)
    ↓
Linear(784, 256) + ReLU
    ↓
Linear(256, 128) + ReLU
    ↓
Linear(128, 10)
    ↓
Output: 10 (logits)
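
Assuming each layer carries a bias (consistent with the `has_bias` flag in the weight format), the parameter count of this network can be checked at compile time:

```cpp
// Parameters of one Linear layer with bias: in*out weights + out biases.
constexpr int linear_params(int in, int out) { return in * out + out; }

constexpr int total_params =
    linear_params(784, 256) +  // 200,960
    linear_params(256, 128) +  //  32,896
    linear_params(128, 10);    //   1,290

static_assert(total_params == 235146, "MNIST MLP parameter count");
```

At 4 bytes per float32 parameter, the payload is about 941 KB, plus the 32-byte file header and three 16-byte layer headers.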

See AGENTS.md for development workflow.


MIT License | A learning project for the CUDA community