v2.0.0 — Major Refactoring

Name: Tiny-LLM
Author: LessUp

Release Date: March 9, 2026
Full Changelog: v1.0.0 → v2.0.0

⚠️ Breaking Changes

KVCache API Redesign

Problem: The previous appendKV() implementation had fragile layer-order dependencies that could lead to incorrect cache writes if layers were called in different orders.

Solution: New stateless design with explicit length advancement.

Before (v1.x)

// Layer 0 would update current_len, other layers compensated
// Could break if layer order changed
kv_cache.appendKV(seq_id, layer_idx, k, v, num_tokens);

After (v2.0+)

// appendKV is stateless - all layers write at current_len
for (int i = 0; i < num_layers; i++) {
    layers[i]->forward(hidden_states, kv_cache, seq_id, position, stream);
}
// Explicitly advance length once after all layers
kv_cache.advanceSeqLen(seq_id, num_tokens);

Migration: Update any code that uses KVCacheManager directly. See the Migration Guide below.
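
To make the stateless contract concrete, here is a minimal sketch of a toy cache with the same two-phase shape. It is not the real KVCacheManager: the class name ToyKVCache, its single-sequence layout, the one-float-per-token storage, and the accessors currentLen() and keyAt() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the v2.0 cache semantics; NOT the real KVCacheManager.
// One sequence, one float per token per layer, keys only for brevity.
class ToyKVCache {
public:
    ToyKVCache(std::size_t num_layers, std::size_t max_len)
        : max_len_(max_len), keys_(num_layers, std::vector<float>(max_len, 0.0f)) {}

    // Stateless write: copies tokens into [current_len, current_len + n)
    // for the given layer and leaves current_len untouched.
    bool appendKV(std::size_t layer, const std::vector<float>& k) {
        if (layer >= keys_.size() || current_len_ + k.size() > max_len_) return false;
        for (std::size_t t = 0; t < k.size(); ++t) {
            keys_[layer][current_len_ + t] = k[t];
        }
        return true;
    }

    // Called once per step, after every layer has written.
    bool advanceSeqLen(std::size_t num_tokens) {
        if (current_len_ + num_tokens > max_len_) return false;
        current_len_ += num_tokens;
        return true;
    }

    std::size_t currentLen() const { return current_len_; }

    float keyAt(std::size_t layer, std::size_t pos) const {
        assert(layer < keys_.size() && pos < max_len_);
        return keys_[layer][pos];
    }

private:
    std::size_t max_len_;
    std::size_t current_len_ = 0;
    std::vector<std::vector<float>> keys_;
};
```

Because no layer moves the write position, the layers can run in any order within a step and still land in the same slot. The length only changes when the caller advances it.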

🟢 Added

CI/CD Improvements

- GitHub Actions workflow for continuous integration
- Automated clang-format checking
- Format validation on pull requests

CMake Modernization

| Feature | Before | After |
|---|---|---|
| Version | 1.0.0 | 2.0.0 |
| CUDA Arch | Manual | Auto-detect (native or fallback) |
| Includes | Global | target_include_directories() |
| Target Export | None | tiny_llm::tiny_llm alias |
| Warnings | Basic | -Wall -Wextra (GCC/Clang) |
| IDE Support | Manual | compile_commands.json generation |

New usage:

find_package(tiny_llm)
target_link_libraries(myapp tiny_llm::tiny_llm)

🟡 Changed

Build System

- Minimum CMake version: 3.18
- CUDA architecture auto-detection with fallback to common architectures
- Improved compiler warning flags

Test Coverage

- Added property-based tests with RapidCheck
- Expanded kernel test coverage
- Integration tests for end-to-end workflows
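
As an illustration of the kind of property such tests can check, the sketch below verifies that a stateless cache ends up with the same contents no matter what order the layers run in. It deliberately avoids RapidCheck and uses a hand-rolled loop over std::shuffle so that it compiles with the standard library alone. MiniCache, runSteps, and orderIndependent are hypothetical names, not part of the Tiny-LLM test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical miniature cache: data[layer][pos], one value per token.
struct MiniCache {
    std::vector<std::vector<int>> data;
    std::size_t len = 0;

    MiniCache(std::size_t layers, std::size_t cap)
        : data(layers, std::vector<int>(cap, -1)) {}

    // Stateless write at the current length; does not move the length.
    void append(std::size_t layer, int value) {
        assert(layer < data.size() && len < data[layer].size());
        data[layer][len] = value;
    }

    // Advance once per step, after all layers have written.
    void advance() { ++len; }
};

// Runs `steps` decode steps, visiting layers in the given order each step.
MiniCache runSteps(const std::vector<std::size_t>& order, std::size_t steps) {
    MiniCache cache(order.size(), steps);
    for (std::size_t s = 0; s < steps; ++s) {
        for (std::size_t layer : order) {
            cache.append(layer, static_cast<int>(layer * 100 + s));
        }
        cache.advance();
    }
    return cache;
}

// Property: every tested permutation of the layer order yields the same cache.
bool orderIndependent(std::size_t layers, std::size_t steps, unsigned seed, int trials) {
    std::vector<std::size_t> order(layers);
    std::iota(order.begin(), order.end(), std::size_t{0});
    const MiniCache reference = runSteps(order, steps);

    std::mt19937 rng(seed);
    for (int t = 0; t < trials; ++t) {
        std::shuffle(order.begin(), order.end(), rng);
        const MiniCache shuffled = runSteps(order, steps);
        if (shuffled.data != reference.data || shuffled.len != reference.len) {
            return false;
        }
    }
    return true;
}
```

A property-testing library would add input generation and shrinking of failing cases on top of this. The fixed seed here only keeps the sketch reproducible.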
 
🔄 Migration Guide

Updating KVCache Usage

If you're using KVCacheManager directly in your code:

// v1.x code
void generateStep() {
    for (int i = 0; i < num_layers; i++) {
        // appendKV managed length internally
        kv_cache.appendKV(seq_id, i, k_data[i], v_data[i], 1);
    }
}

// v2.0+ code
void generateStep() {
    for (int i = 0; i < num_layers; i++) {
        // appendKV is stateless
        kv_cache.appendKV(seq_id, i, k_data[i], v_data[i], 1, stream_);
    }
    // Must explicitly advance
    kv_cache.advanceSeqLen(seq_id, 1);
}

The InferenceEngine class handles this automatically for standard use cases.
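
For code that bypasses InferenceEngine, one way to avoid forgetting the explicit advance is to wrap the layer loop in a small helper. The sketch below is an assumption, not part of the Tiny-LLM API: runStep and CountingCache are hypothetical names, and the only requirement on the cache type is an advanceSeqLen(seq_id, num_tokens) member like the one shown in the migration example.

```cpp
#include <cassert>

// Hypothetical helper, not part of Tiny-LLM: runs a per-layer callable for
// every layer, then advances the sequence length exactly once.
// `Cache` only needs an advanceSeqLen(seq_id, num_tokens) member.
template <typename Cache, typename LayerFn>
void runStep(Cache& cache, int seq_id, int num_layers, int num_tokens, LayerFn&& layer_fn) {
    assert(num_layers > 0 && num_tokens > 0);
    for (int i = 0; i < num_layers; ++i) {
        layer_fn(i);  // expected to write K/V for layer i via appendKV
    }
    cache.advanceSeqLen(seq_id, num_tokens);  // single advance per step
}

// Stand-in cache used only to keep the sketch self-contained.
struct CountingCache {
    int writes = 0;
    int len = 0;
    void appendKV(int /*seq_id*/, int /*layer*/) { ++writes; }
    void advanceSeqLen(int /*seq_id*/, int num_tokens) { len += num_tokens; }
};
```

A call site would pass a lambda that performs the per-layer write, for example `runStep(cache, seq_id, num_layers, 1, [&](int i) { /* appendKV for layer i */ });`, which keeps the advance in one place.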

📊 Performance

| Metric | v1.0.0 | v2.0.0 | Change |
|---|---|---|---|
| Build time | 45s | 38s | -15% |
| Test runtime | 2.1s | 1.8s | -14% |
| Memory (KV Cache) | Same | Same | Correctness only |
| Throughput | Same | Same | No impact |

✅ Verification

$ ctest --output-on-failure
100% tests passed, 0 tests failed

$ clang-format --dry-run --Werror src/*.cpp tests/*.cpp kernels/*.cu
$ # No output = no format issues

📚 Documentation

New documentation structure:

- Multi-language support (EN/ZH)
- API reference with examples
- Architecture documentation
- Contribution guidelines