Release Date: April 16, 2026
Full Changelog: v2.0.0 → v2.0.1
🔵 Fixed
Critical: Scale Dimension Calculation Error
Severity: Critical
Impact: Test utility only
File: tests/test_integration.cu
The createRandomWeight function computed the scale tensor size incorrectly, grouping along cols instead of rows:
```cpp
// ❌ INCORRECT (rows and cols swapped)
int num_groups = (cols + group_size - 1) / group_size;
w.scales = randomDeviceFP16(rows * num_groups, ...);

// ✅ CORRECT
int num_groups = (rows + group_size - 1) / group_size;
w.scales = randomDeviceFP16(num_groups * cols, ...);
```
Why this matters: the W8A16 matmul indexes scales as a [ceil(rows / group_size), cols] tensor, which requires ceil(rows / group_size) * cols elements. The incorrect calculation could lead to:
Incorrect dequantization in tests
Potential memory access issues
Test failures on certain tensor sizes
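The difference between the two formulas can be checked with a little host-side arithmetic. The sketch below is illustrative only: the helper names (ceil_div, correct_scale_count, old_scale_count, scale_index, checked_scale_index) are hypothetical and do not come from the repository, and the flat offset assumes the scale tensor is stored row-major, which these notes do not state.

```cpp
#include <cassert>
#include <cstddef>

// Ceiling division for positive integers.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Elements needed for a [ceil(rows / group_size), cols] scale tensor (fixed formula).
constexpr std::size_t correct_scale_count(std::size_t rows, std::size_t cols,
                                          std::size_t group_size) {
    return ceil_div(rows, group_size) * cols;
}

// Elements allocated by the pre-fix formula, which grouped along cols instead of rows.
constexpr std::size_t old_scale_count(std::size_t rows, std::size_t cols,
                                      std::size_t group_size) {
    return rows * ceil_div(cols, group_size);
}

// Flat offset of the scale for weight element (row, col).
// ASSUMPTION: the scale tensor is laid out row-major.
constexpr std::size_t scale_index(std::size_t row, std::size_t col,
                                  std::size_t cols, std::size_t group_size) {
    return (row / group_size) * cols + col;
}

// Same offset, but asserts that it fits in a buffer holding `allocated` scales.
inline std::size_t checked_scale_index(std::size_t row, std::size_t col,
                                       std::size_t cols, std::size_t group_size,
                                       std::size_t allocated) {
    const std::size_t idx = scale_index(row, col, cols, group_size);
    assert(idx < allocated && "scale buffer too small for this (row, col)");
    return idx;
}

// Both dimensions are multiples of group_size: the two formulas agree,
// so the bug stays hidden for such shapes.
static_assert(correct_scale_count(256, 128, 64) == 512, "4 groups * 128 cols");
static_assert(old_scale_count(256, 128, 64) == 512, "256 rows * 2 groups");

// rows is not a multiple of group_size: the old formula under-allocates.
static_assert(correct_scale_count(100, 64, 32) == 256, "4 groups * 64 cols");
static_assert(old_scale_count(100, 64, 32) == 200, "100 rows * 2 groups");

// The last weight element needs offset 255, past the end of a 200-element buffer.
static_assert(scale_index(99, 63, 64, 32) == 255, "3 * 64 + 63");
```

For rows = 100, cols = 64, group_size = 32, the old formula allocates 200 scales where 256 are needed. Calling checked_scale_index(99, 63, 64, 32, 200) would trip the assertion, which is the out-of-bounds case listed above. When both dimensions are multiples of group_size the two counts coincide, which is consistent with failures appearing only on certain tensor sizes.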
Code Cleanup
File: kernels/attention.cu
Removed 12 lines of unused q_reg array-loading code from attention_decode_kernel. The code was dead: it had no functional impact and only cluttered the implementation.