Troubleshooting

Name: Tiny-LLM
Author: LessUp

Common issues and solutions for Tiny-LLM.

Build Issues
Runtime Issues
Performance Issues
Model Loading Issues
Getting Help

Build Issues

CUDA not found

Error: Could not find CUDA or nvcc not found

Solutions:

# Check CUDA installation
nvcc --version

# Set CUDA path explicitly
cmake .. -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2

# Or add to PATH
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

CMake version too old

Error: CMake 3.18 or higher is required

Solutions:

# Using pip
pip install --upgrade cmake

# Using snap (Ubuntu)
sudo snap install cmake --classic

# Build from source
curl -L https://cmake.org/files/v3.28/cmake-3.28.0.tar.gz | tar xz
cd cmake-3.28.0 && ./bootstrap && make && sudo make install

C++17 not supported

Error: error: 'auto' in lambda parameter not supported

Solutions:

# Check compiler version
gcc --version  # Should be 9+
clang --version  # Should be 10+

# Specify compiler
cmake .. -DCMAKE_CXX_COMPILER=g++-11

# Or use environment variable
CC=gcc-11 CXX=g++-11 cmake ..

CUDA architecture mismatch

Error: No kernel image is available for execution on the device

Solutions:

# Check your GPU compute capability
nvidia-smi --query-gpu=compute_cap --format=csv

# Build for your specific architecture
cmake .. -DCUDA_ARCH="80"  # For SM 8.0 (A100)
cmake .. -DCUDA_ARCH="86"  # For SM 8.6 (RTX 3090)
cmake .. -DCUDA_ARCH="89"  # For SM 8.9 (RTX 4090)

# Or use native detection
cmake .. -DCUDA_ARCH="native"
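
nvidia-smi reports compute capability with a dot (for example 8.6), while the CUDA_ARCH values above are digits only (86). If a build script needs to derive one from the other, a minimal sketch might look like the following. The function name cuda_arch_from_compute_cap is hypothetical and is not part of Tiny-LLM.

```cpp
#include <cassert>
#include <string>

// Converts a compute capability such as "8.6" (as printed by nvidia-smi)
// into the digits-only form used for CUDA_ARCH, e.g. "86".
// Hypothetical helper for illustration; not part of Tiny-LLM.
inline std::string cuda_arch_from_compute_cap(const std::string& cap) {
    std::string arch;
    for (char c : cap) {
        if (c >= '0' && c <= '9') arch += c;  // keep digits, drop the dot
    }
    assert(!arch.empty());  // input must contain at least one digit
    return arch;
}
```

For example, cuda_arch_from_compute_cap("8.9") yields "89", which matches the RTX 4090 line above.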

Runtime Issues

CUDA out of memory

Error: CUDA out of memory or cudaErrorMemoryAllocation

Solutions:

Reduce batch size:

cache_config.max_batch_size = 1;  // Reduce from 4

Reduce sequence length:

config.max_seq_len = 1024;  // Reduce from 2048
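
Monitor memory:

size_t free_bytes = 0, total_bytes = 0;  // avoid shadowing free()
cudaMemGetInfo(&free_bytes, &total_bytes);  // requires <cuda_runtime.h>
std::cout << "Free: " << free_bytes / 1024 / 1024 << " MB of "
          << total_bytes / 1024 / 1024 << " MB" << std::endl;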
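
To see why lowering max_batch_size or max_seq_len helps, it can be useful to estimate the KV cache footprint before allocating anything. The sketch below is a back-of-the-envelope calculation, not Tiny-LLM's actual allocator. The struct KVCacheShape and its fields num_layers, num_kv_heads, head_dim and bytes_per_element are hypothetical names, and it assumes one key tensor and one value tensor per layer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a KV cache; field names are illustrative
// and are not taken from the Tiny-LLM API.
struct KVCacheShape {
    std::uint64_t num_layers;
    std::uint64_t num_kv_heads;
    std::uint64_t head_dim;
    std::uint64_t max_batch_size;
    std::uint64_t max_seq_len;
    std::uint64_t bytes_per_element;  // e.g. 2 for FP16
};

// Estimated bytes: two tensors (K and V) per layer, each of shape
// [batch, seq_len, kv_heads, head_dim].
inline std::uint64_t kv_cache_bytes(const KVCacheShape& s) {
    assert(s.bytes_per_element > 0);
    return 2 * s.num_layers * s.max_batch_size * s.max_seq_len *
           s.num_kv_heads * s.head_dim * s.bytes_per_element;
}
```

With example numbers of 32 layers, 32 KV heads, head dimension 128 and FP16 storage, batch size 4 at sequence length 2048 comes to 4 GiB, while batch size 1 at sequence length 1024 comes to 512 MiB. The estimate scales linearly in both settings, so halving either one halves the cache.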

Illegal memory access

Error: an illegal memory access was encountered

Possible causes:

Incorrect model file format
Dimension mismatch between model and config
Uninitialized memory

Solutions:

Enable debug mode:

cmake .. -DCMAKE_BUILD_TYPE=Debug
CUDA_LAUNCH_BLOCKING=1 ./tiny_llm_demo

Run a memory checker (compute-sanitizer; cuda-memcheck was removed in CUDA 12):

compute-sanitizer ./tiny_llm_demo
cuda-memcheck ./tiny_llm_demo  # CUDA 11.x and earlier only

Verify model dimensions:

std::cout << "Config: " << config.hidden_dim 
          << " x " << config.num_layers << std::endl;

Slow generation speed

Possible causes:

Debug build
Not using W8A16 quantization
Incorrect CUDA architecture

Solutions:

Use Release build:

cmake .. -DCMAKE_BUILD_TYPE=Release

Verify GPU utilization:

watch -n 1 nvidia-smi

Profile the application:

nsys profile -o profile ./tiny_llm_demo
nsys-ui profile.nsys-rep  # older Nsight Systems releases write profile.qdrep
   

Performance Issues

Low GPU utilization

Symptom: GPU utilization below 50%

Solutions:

Increase batch size
Check for memory-bandwidth-bound operations
Profile kernels with Nsight Compute

Memory bandwidth bottleneck

Symptom: Decode phase slower than expected

Cause: Attention decode is memory-bandwidth bound

Solutions:

Use a faster GPU with higher memory bandwidth
Reduce KV cache size (smaller batch size or seq_len)
Enable flash attention (if available)
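
To judge whether decode is "slower than expected", it helps to have a rough ceiling to compare against. The sketch below assumes a simplified model in which every generated token streams the full weights and the KV cache from GPU memory exactly once. The function name and parameters are hypothetical, and real throughput will be lower because of kernel launch overhead and imperfect bandwidth utilization.

```cpp
#include <cassert>

// Rough upper bound on decode speed when each generated token must read
// the weights and the KV cache from GPU memory once.
// All parameters are caller-supplied estimates (hypothetical names).
inline double max_decode_tokens_per_sec(double bandwidth_gb_per_s,
                                        double weight_gb,
                                        double kv_cache_gb) {
    const double gb_per_token = weight_gb + kv_cache_gb;
    assert(gb_per_token > 0.0);
    return bandwidth_gb_per_s / gb_per_token;
}
```

For instance, 1000 GB/s of bandwidth with 7 GB of weights and 1 GB of KV cache gives a ceiling of 125 tokens per second. If measured throughput is already close to this figure, only the first two solutions above (more bandwidth, less data per token) are likely to help.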
 
Model Loading Issues

Invalid model file

Error: Failed to load model: invalid format

Checklist:

File exists and is readable
Magic number matches (first 4 bytes)
Version is supported
Dimensions match config
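
The first three checklist items can be automated with a small standalone check. The sketch below is illustrative only: it assumes the header is a 4-byte magic followed by a 32-bit version in host byte order, which may not match the real Tiny-LLM format. The names check_model_header and HeaderStatus, and any magic or version values passed in, are placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

enum class HeaderStatus { Ok, Unreadable, BadMagic, UnsupportedVersion };

// Checks the first 8 bytes of a model file.
// ASSUMPTION: 4-byte magic, then a 32-bit version in host byte order.
// The layout and the expected values are placeholders; take the real
// ones from the Tiny-LLM model format documentation.
inline HeaderStatus check_model_header(const std::string& path,
                                       const char (&expected_magic)[5],
                                       std::uint32_t max_version) {
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    if (!in) return HeaderStatus::Unreadable;  // missing or not readable

    char magic[4] = {};
    std::uint32_t version = 0;
    in.read(magic, sizeof(magic));
    in.read(reinterpret_cast<char*>(&version), sizeof(version));
    if (!in) return HeaderStatus::Unreadable;  // shorter than 8 bytes

    if (std::memcmp(magic, expected_magic, 4) != 0) {
        return HeaderStatus::BadMagic;
    }
    if (version > max_version) return HeaderStatus::UnsupportedVersion;
    return HeaderStatus::Ok;
}
```

A call such as check_model_header("model.bin", "TLLM", 1) returns a distinct status for each failed checklist item, which makes the cause easier to report than a generic "invalid format" message. Both "TLLM" and the version limit 1 are made-up values here.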
 
Dimension mismatch

Error: Weight dimension mismatch

Solutions:

// Verify config
std::cout << "vocab_size: " << config.vocab_size << std::endl;
std::cout << "hidden_dim: " << config.hidden_dim << std::endl;
std::cout << "intermediate_dim: " << config.intermediate_dim << std::endl;
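
Printing the config shows what the loader expects, but not which weight disagrees with it. One way to narrow that down is to compare each tensor's element count against the shape implied by the config. The sketch below is a hedged illustration: DimConfig is a stand-in that mirrors the three fields printed above rather than the real Tiny-LLM config type, and check_weight is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the config fields printed above; not the Tiny-LLM type.
struct DimConfig {
    std::size_t vocab_size;
    std::size_t hidden_dim;
    std::size_t intermediate_dim;
};

// Returns an empty string when a weight with `actual_elems` elements matches
// the expected rows x cols shape, otherwise a readable diagnostic.
inline std::string check_weight(const std::string& name,
                                std::size_t rows, std::size_t cols,
                                std::size_t actual_elems) {
    assert(rows > 0 && cols > 0);
    if (rows * cols == actual_elems) return "";
    return name + ": expected " + std::to_string(rows) + " x " +
           std::to_string(cols) + " = " + std::to_string(rows * cols) +
           " elements, file has " + std::to_string(actual_elems);
}
```

For a config with hidden_dim 4096 and intermediate_dim 11008, a call like check_weight("ffn_up", cfg.hidden_dim, cfg.intermediate_dim, n) reports a mismatch whenever n differs from 45088768. Which tensors exist and how their shapes derive from the config is an assumption here and should be checked against the model format.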

Getting Help

Debug Information to Include

When reporting issues, please provide:

System info:

nvidia-smi
nvcc --version
cmake --version

Build output:

cmake .. 2>&1 | tee cmake.log
make VERBOSE=1 2>&1 | tee build.log

Runtime error:

CUDA_LAUNCH_BLOCKING=1 ./tiny_llm_demo 2>&1 | tee runtime.log

Resources

GitHub Issues
Documentation
API Reference
 