v2.0.0 — Major Refactoring
Release Date: March 9, 2026
Full Changelog: v1.0.0 → v2.0.0
⚠️ Breaking Changes
KVCache API Redesign
Problem: The previous appendKV() implementation depended on layer call order: layer 0 advanced current_len and the other layers compensated for it, so calling layers in a different order could write to the wrong cache positions.
Solution: New stateless design with explicit length advancement.
Before (v1.x)
// Layer 0 would update current_len, other layers compensated
// Could break if layer order changed
kv_cache.appendKV(seq_id, layer_idx, k, v, num_tokens);
After (v2.0+)
// appendKV is stateless - all layers write at current_len
for (int i = 0; i < num_layers; i++) {
    layers[i]->forward(hidden_states, kv_cache, seq_id, position, stream);
}
// Explicitly advance length once after all layers
kv_cache.advanceSeqLen(seq_id, num_tokens);
Migration: Update any code using KVCacheManager directly. See Migration Guide below.
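To illustrate why the stateless design removes the layer-order dependency, here is a minimal sketch. It assumes a hypothetical ToyKVCache class that tracks a single sequence and stores one float per token per layer. It is not the real KVCacheManager, and its signatures are simplified for illustration (no seq_id, no separate v tensor, no CUDA stream).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy cache for illustration only; not the real KVCacheManager.
// It tracks one sequence and stores one float "key" per token per layer.
class ToyKVCache {
public:
    ToyKVCache(int num_layers, int max_len)
        : current_len_(0),
          keys_(num_layers, std::vector<float>(max_len, 0.0f)) {}

    // Stateless write: every layer writes at current_len and never modifies it.
    void appendKV(int layer_idx, const std::vector<float>& k) {
        assert(current_len_ + static_cast<int>(k.size()) <=
               static_cast<int>(keys_[layer_idx].size()));
        for (std::size_t t = 0; t < k.size(); ++t) {
            keys_[layer_idx][current_len_ + t] = k[t];
        }
    }

    // Called once per step, after all layers have written.
    void advanceSeqLen(int num_tokens) { current_len_ += num_tokens; }

    int currentLen() const { return current_len_; }
    float keyAt(int layer_idx, int pos) const { return keys_[layer_idx][pos]; }

private:
    int current_len_;
    std::vector<std::vector<float>> keys_;
};

// Runs one single-token step, visiting layers in the given order.
inline void runStep(ToyKVCache& cache, const std::vector<int>& layer_order,
                    float value) {
    for (int layer : layer_order) {
        cache.appendKV(layer, {value});
    }
    cache.advanceSeqLen(1);
}
```

In this sketch, calling runStep with layer order {0, 1, 2} or {2, 0, 1} leaves the cache in the same state, because no layer changes current_len while the others are still writing.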
🟢 Added
CI/CD Improvements
- GitHub Actions workflow for continuous integration
- Automated clang-format checking
- Format validation on pull requests
CMake Modernization
| Feature | Before | After |
|---------|--------|-------|
| Version | 1.0.0 | 2.0.0 |
| CUDA Arch | Manual | Auto-detect (native or fallback) |
| Includes | Global | target_include_directories() |
| Target Export | None | tiny_llm::tiny_llm alias |
| Warnings | Basic | -Wall -Wextra (GCC/Clang) |
| IDE Support | Manual | compile_commands.json generation |
New usage: