v2.0.1 — Bug Fixes

Release Date: April 16, 2026
Full Changelog: v2.0.0 → v2.0.1


🔵 Fixed

Critical: Scale Dimension Calculation Error

Severity: Critical
Impact: Test utility only
File: tests/test_integration.cu

The createRandomWeight function had an incorrect scale tensor dimension calculation:

// ❌ INCORRECT (rows and cols swapped)
int num_groups = (cols + group_size - 1) / group_size;
w.scales = randomDeviceFP16(rows * num_groups, ...);

// ✅ CORRECT  
int num_groups = (rows + group_size - 1) / group_size;
w.scales = randomDeviceFP16(num_groups * cols, ...);

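Why this matters: the W8A16 matmul indexes scales as [row / group_size, col], so the scale tensor has shape [ceil(rows / group_size), cols] and needs ceil(rows / group_size) * cols elements. The old calculation allocated rows * ceil(cols / group_size) elements, which matches the required count when rows equals cols but can differ for other shapes. The mismatch could lead to:

  • Incorrect dequantization in tests
  • Potential out-of-bounds reads when the scale buffer is allocated too small
  • Test failures on certain tensor sizes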
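
The effect of the two size calculations can be checked with a small host-side sketch. The helper names below (ceilDiv, scaleCountOld, scaleCountNew, scaleIndex) are hypothetical and not part of the tiny-llm sources. The row-major scale layout and the example shape (64 x 256 with group_size 128) are assumptions chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; not taken from tiny-llm.

// Ceiling division: number of groups needed to cover n elements.
constexpr std::size_t ceilDiv(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Element count from the pre-fix calculation (groups taken along cols).
constexpr std::size_t scaleCountOld(std::size_t rows, std::size_t cols,
                                    std::size_t g) {
    return rows * ceilDiv(cols, g);
}

// Element count from the fixed calculation (groups taken along rows).
constexpr std::size_t scaleCountNew(std::size_t rows, std::size_t cols,
                                    std::size_t g) {
    return ceilDiv(rows, g) * cols;
}

// Flat offset of the scale for weight element (row, col), assuming a
// row-major [ceil(rows / g), cols] scale tensor.
constexpr std::size_t scaleIndex(std::size_t row, std::size_t col,
                                 std::size_t cols, std::size_t g) {
    return (row / g) * cols + col;
}

// Example shape: rows = 64, cols = 256, group_size = 128.
// The largest offset the indexing touches is 0 * 256 + 255 = 255.
static_assert(scaleCountOld(64, 256, 128) == 128,
              "pre-fix size: 64 * ceil(256 / 128)");
static_assert(scaleCountNew(64, 256, 128) == 256,
              "fixed size: ceil(64 / 128) * 256");
static_assert(scaleIndex(63, 255, 256, 128) < scaleCountNew(64, 256, 128),
              "fixed size covers every access");
static_assert(scaleIndex(63, 255, 256, 128) >= scaleCountOld(64, 256, 128),
              "pre-fix size would be indexed out of bounds");
```

The checks run at compile time, so the snippet only needs to compile. For a square shape such as 256 x 256 both formulas give the same count, which is one way a mismatch like this can go unnoticed.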

Code Cleanup

File: kernels/attention.cu

Removed 12 lines in attention_decode_kernel that loaded an unused q_reg array. This was dead code with no functional impact; removing it only simplifies the kernel.


✅ Verification

All tests pass with the corrected implementation:

$ ctest --output-on-failure
Test project /tiny-llm/build
    Start 1: test_w8a16_matmul
1/5 Test #1: test_w8a16_matmul .........   Passed    0.05 sec
    Start 2: test_kv_cache
2/5 Test #2: test_kv_cache ..............   Passed    0.02 sec
    Start 3: test_attention
3/5 Test #3: test_attention .............   Passed    0.03 sec
    Start 4: test_rmsnorm
4/5 Test #4: test_rmsnorm ...............   Passed    0.01 sec
    Start 5: test_integration
5/5 Test #5: test_integration ...........   Passed    1.23 sec

100% tests passed, 0 tests failed

🔄 Changes

Files changed: 3
Additions: 15
Deletions: 14

Modified files:

  • tests/test_integration.cu — Fix scale dimension
  • kernels/attention.cu — Remove unused code
  • CHANGELOG.md — Documentation update

📦 Assets

  • tiny-llm-v2.0.1.tar.gz — Source tarball
  • tiny-llm-v2.0.1.zip — Source zip
