Developer Guide
Development environment setup and contribution guidelines.
Development Environment
Prerequisites
| Tool | Minimum | Recommended |
| CUDA Toolkit | 11.0 | 12.0+ |
| CMake | 3.18 | 3.25+ |
| GCC | 9.4 | 11+ |
| Clang | 10 | 14+ |
| Python | 3.8 | 3.10+ |
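The compiler minimums in the table can also be checked from code. The following is a minimal sketch, not part of the project: `meets_minimum` and `host_compiler_status` are hypothetical helper names, and the check relies only on the predefined macros that GCC and Clang provide.

```cpp
#include <string>

// Hypothetical helper: true if (major, minor) is at least (min_major, min_minor).
constexpr bool meets_minimum(int major, int minor, int min_major, int min_minor) {
    return major > min_major || (major == min_major && minor >= min_minor);
}

// Reports whether the host compiler meets the minimum from the table above.
// Clang is tested first because it also defines __GNUC__.
inline std::string host_compiler_status() {
#if defined(__clang__)
    return meets_minimum(__clang_major__, __clang_minor__, 10, 0)
               ? "clang: ok"
               : "clang: below minimum (10)";
#elif defined(__GNUC__)
    return meets_minimum(__GNUC__, __GNUC_MINOR__, 9, 4)
               ? "gcc: ok"
               : "gcc: below minimum (9.4)";
#else
    return "unknown compiler";
#endif
}
```

Because `meets_minimum` is `constexpr`, the same comparison could be placed in a `static_assert` to fail the build early instead of reporting at run time.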
IDE Setup
VS Code
Recommended extensions:
- ms-vscode.cpptools: C/C++ extension
- llvm-vs-code-extensions.vscode-clangd: Clangd language server
- NVIDIA.nsight-vscode-edition: CUDA support
// .vscode/settings.json
{
    "cmake.configureSettings": {
        "CMAKE_EXPORT_COMPILE_COMMANDS": "ON"
    },
    "C_Cpp.default.configurationProvider": "ms-vscode.cmake-tools"
}
CLion
Settings → Build, Execution, Deployment → Toolchains
→ Add → CUDA (set CUDA path)
Build System
Debug Build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=ON
make -j$(nproc)
Release Build with Debug Info
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j$(nproc)
Sanitizer Build
cmake .. -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_CXX_FLAGS="-fsanitize=address,undefined"
make -j$(nproc)
Cross-Compilation
# For specific architecture
cmake .. -DCUDA_ARCH="75;80;86"
# For all common architectures
cmake .. -DCUDA_ARCH="70;75;80;86;89;90"
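Each entry in the architecture list names a GPU compute capability (75 is 7.5, 80 is 8.0, and so on). As an illustration of what such a list typically expands to, the sketch below turns it into nvcc `-gencode` options. The helper names are hypothetical, and the mapping is an assumption: how this project's CMake files actually consume `CUDA_ARCH` is not shown in this guide.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a semicolon-separated list such as "75;80;86" into its entries.
std::vector<std::string> split_arch_list(const std::string& list) {
    std::vector<std::string> archs;
    std::stringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ';')) {
        if (!item.empty()) {
            archs.push_back(item);
        }
    }
    return archs;
}

// Builds one nvcc -gencode option per architecture (assumed mapping).
std::string gencode_flags(const std::string& list) {
    std::string flags;
    for (const std::string& arch : split_arch_list(list)) {
        if (!flags.empty()) {
            flags += ' ';
        }
        flags += "-gencode arch=compute_" + arch + ",code=sm_" + arch;
    }
    return flags;
}
```

Every extra architecture adds another compiled copy of each kernel, so the longer list trades build time and binary size for wider GPU coverage.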
Testing
Run All Tests
ctest --output-on-failure
Run Specific Test
./tests/tiny_llm_test --gtest_filter="W8A16MatmulTest.*"
Debug CUDA Kernels
# Enable CUDA launch blocking for debugging
CUDA_LAUNCH_BLOCKING=1 ./tests/tiny_llm_test
# CUDA memcheck (legacy tool; removed in CUDA 12, use compute-sanitizer there)
cuda-memcheck ./tests/tiny_llm_test
# Compute sanitizer (CUDA 11.1+)
compute-sanitizer --tool memcheck ./tests/tiny_llm_test
Profiling
# Nsight Compute
ncu -o profile.ncu-rep ./test_kernel
# Nsight Systems
nsys profile -o profile.qdrep ./test_app
Code Style
C++ Style Guide
We follow the Google C++ Style Guide with modifications:
| Rule | Convention |
| Naming | CamelCase for classes, snake_case for functions/variables |
| Members | trailing underscore for private members: member_ |
| Constants | kCamelCase for constants, SCREAMING_SNAKE for macros |
| Indent | 4 spaces (no tabs) |
| Line length | 100 characters |
Example
class MyClass {
public:
    explicit MyClass(int size);
    void doSomething(const std::string& input);
    int getSize() const { return size_; }

private:
    int size_;
    std::vector<float> data_;
};

namespace tiny_llm {

constexpr int kDefaultGroupSize = 128;

Result<float> computeValue(int input) {
    if (input < 0) {
        return Result<float>::err("Negative input");
    }
    return Result<float>::ok(std::sqrt(input));
}

}  // namespace tiny_llm
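The example above depends on a `Result<T>` type that this guide does not define. To make the `ok`/`err` pattern concrete, here is a minimal stand-in that lets the same `computeValue` compile on its own. It is a sketch under assumptions: the `sketch` namespace and the accessors `succeeded()`, `value()`, and `error()` are hypothetical, and the project's real `Result` may look different.

```cpp
#include <cmath>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the project's Result<T>; the real type may differ.
template <typename T>
class Result {
public:
    static Result ok(T value) { return Result(true, std::move(value), ""); }
    static Result err(std::string message) {
        return Result(false, T{}, std::move(message));
    }

    bool succeeded() const { return ok_; }
    const T& value() const { return value_; }          // meaningful only on success
    const std::string& error() const { return error_; }  // empty on success

private:
    Result(bool ok, T value, std::string error)
        : ok_(ok), value_(std::move(value)), error_(std::move(error)) {}

    bool ok_;
    T value_;
    std::string error_;
};

// Same logic as the style example: reject negative input, otherwise take the root.
Result<float> computeValue(int input) {
    if (input < 0) {
        return Result<float>::err("Negative input");
    }
    return Result<float>::ok(std::sqrt(static_cast<float>(input)));
}

}  // namespace sketch
```

A caller would check `succeeded()` before reading `value()`, and read `error()` otherwise. Returning the failure as a value keeps the error path visible in the function signature.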
Use clang-format with the project’s .clang-format file:
# Format a file
clang-format -i src/myfile.cpp
# Format all source files
find src tests kernels \( -name "*.cpp" -o -name "*.h" -o -name "*.cu" -o -name "*.cuh" \) -print0 \
    | xargs -0 clang-format -i
Contributing
Workflow
- Fork & Clone
git clone https://github.com/your-username/tiny-llm.git
cd tiny-llm
- Create Branch
git checkout -b feature/my-feature
- Make Changes
  - Write code
  - Add tests
  - Update documentation
- Commit
git commit -m "feat: add new feature"
- Push & PR
git push origin feature/my-feature
Then create a Pull Request on GitHub.
Follow Conventional Commits:
<type>(<scope>): <description>

[optional body]

[optional footer]
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, semicolons)
- refactor: Code refactoring
- perf: Performance improvements
- test: Adding or correcting tests
- ci: CI/CD changes
- chore: Maintenance tasks
Examples:
feat(attention): add multi-head attention kernel

Implement optimized attention_decode kernel with online
softmax for improved numerical stability.

fix(kvcache): correct scale dimension calculation

The scale tensor used an incorrect dimension calculation,
causing memory corruption in W8A16 matmul.
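The header format can be checked mechanically, for example in a commit hook or CI step. The sketch below validates the first line of a commit message with `std::regex`. The function name `is_conventional_header` is hypothetical, and two details are assumptions rather than project rules: the scope is treated as optional (as in the Conventional Commits specification), and scopes are limited to lowercase letters, digits, `_`, and `-`.

```cpp
#include <regex>
#include <string>

// Returns true if the first line of a commit message matches
// "<type>(<scope>): <description>", with the scope optional (assumption).
bool is_conventional_header(const std::string& header) {
    // The type list mirrors the types named in this guide.
    static const std::regex pattern(
        R"(^(feat|fix|docs|style|refactor|perf|test|ci|chore)(\([-a-z0-9_]+\))?: .+$)");
    return std::regex_match(header, pattern);
}
```

With this pattern, `feat(attention): add multi-head attention kernel` is accepted, while a header with no type prefix or no `: ` separator is rejected.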
PR Checklist
Review Process
- All PRs require at least one review
- CI must pass (format check, build, tests)
- Address review feedback
- Squash commits if requested