CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Philosophy: OpenSpec Spec-Driven Development

This project follows the OpenSpec Spec-Driven Development (SDD) framework. All code implementations must treat the specification documents in the /openspec/specs/ directory as the single source of truth.

OpenSpec Commands

Use these slash commands for the development workflow:

Command         Purpose
/opsx:explore   Think through ideas, investigate problems
/opsx:propose   Create a change with proposal, design, and tasks
/opsx:apply     Implement tasks from a change
/opsx:verify    Verify that the implementation matches the specs
/opsx:archive   Archive a completed change
/opsx:status    Show the status of changes and specs

Specification Documents

  • /openspec/specs/product/: Product feature definitions and acceptance criteria
    • gemm-optimization-requirements.md: Core requirements R1-R9
    • implementation-plan.md: Implementation roadmap
  • /openspec/specs/architecture/: Technical design documents (RFCs)
    • 0001-core-architecture.md: System architecture design
    • 0002-memory-pool.md: Memory pool design
    • 0003-quantization.md: INT8 quantization
    • 0004-stream-manager.md: CUDA stream management
    • 0005-auto-tuner.md: Auto-tuning system
    • 0006-logger-config-profiler.md: Infrastructure components
    • 0007-half-precision-gemm.md: FP16 support
    • 0008-batch-gemm.md: Batched operations
  • /openspec/specs/api/: API interface definitions
  • /openspec/specs/data/: Data schemas and model definitions
  • /openspec/specs/testing/: BDD test specifications

See AGENTS.md for complete AI workflow instructions.

Build Commands

# Debug build with tests (recommended for development)
cmake --preset default
cmake --build --preset default

# Release build without tests
cmake --preset release
cmake --build --preset release

# CI build (Release + tests)
cmake --preset ci
cmake --build --preset ci

Test Commands

# Run tests (requires NVIDIA GPU)
ctest --preset default

Tests require an NVIDIA GPU with CUDA support, so they cannot run in CI; the ci preset only builds them.

Code Style

  • Follow .clang-format (Google-based, 4-space indent, 100 column limit)
  • Use clang-format --style=file -i <file> to format

Naming Conventions

class ClassName;                   // PascalCase
void function_name();              // snake_case
int variable_name;                 // snake_case
const int CONSTANT_NAME = 256;     // UPPER_SNAKE_CASE (value is illustrative)
int member_variable_;              // snake_case with trailing underscore

// CUDA specifics
__global__ void my_kernel();       // snake_case
template <int BLOCK_SIZE>          // UPPER_SNAKE_CASE for template parameters
__global__ void tiled_kernel();    // hypothetical kernel taking BLOCK_SIZE
__shared__ float s_data[256];      // s_ prefix for shared memory
float r_sum = 0.0f;                // r_ prefix for registers
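
To show how these conventions read together in host code, here is a minimal sketch. TileConfig, MAX_TILE_SIZE, and padded_extent are hypothetical names invented for illustration and are not taken from the repository; only the casing rules come from the list above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example type; not part of the repository.
class TileConfig {  // PascalCase class name
public:
    static constexpr std::size_t MAX_TILE_SIZE = 64;  // UPPER_SNAKE_CASE constant

    explicit TileConfig(std::size_t tile_size) : tile_size_(tile_size) {
        // Guard against a zero tile size, which would divide by zero below.
        assert(tile_size_ > 0 && tile_size_ <= MAX_TILE_SIZE);
    }

    // snake_case function: rounds `extent` up to a multiple of the tile size.
    std::size_t padded_extent(std::size_t extent) const {
        const std::size_t tile_count = (extent + tile_size_ - 1) / tile_size_;  // snake_case local
        return tile_count * tile_size_;
    }

    std::size_t tile_size() const { return tile_size_; }

private:
    std::size_t tile_size_;  // trailing underscore marks a member variable
};
```

With a tile size of 16, `TileConfig(16).padded_extent(100)` rounds 100 up to 112, the next multiple of 16.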

Commit Style

Follow Conventional Commits:

  • feat: new features
  • fix: bug fixes
  • docs: documentation
  • perf: performance improvements
  • refactor: code refactoring
  • test: test changes

Environment Requirements

  • CUDA Toolkit 11.0+ (compiling for architecture 86 requires 11.1+; 89 and 90 require 11.8+)
  • CMake 3.18+ (the cmake --build --preset and ctest --preset commands above require CMake 3.20+)
  • C++17 compiler
  • Target GPU architectures: 75, 80, 86, 89, 90 (Turing through Hopper)
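
The architecture numbers above are compute capabilities written as major * 10 + minor. As a quick reference, the sketch below maps each listed target to its NVIDIA architecture family. The helper names (TARGET_ARCHITECTURES, architecture_family, is_target_architecture) are hypothetical and do not come from the repository.

```cpp
#include <array>
#include <string_view>

// Compute capabilities from the list above, written as major * 10 + minor.
constexpr std::array<int, 5> TARGET_ARCHITECTURES = {75, 80, 86, 89, 90};

// Hypothetical helper: returns the NVIDIA architecture family for a listed
// compute capability, or "unsupported" for anything outside the target list.
constexpr std::string_view architecture_family(int compute_capability) {
    switch (compute_capability) {
        case 75: return "Turing";
        case 80:
        case 86: return "Ampere";
        case 89: return "Ada Lovelace";
        case 90: return "Hopper";
        default: return "unsupported";
    }
}

// Hypothetical helper: true if the capability is one of the listed targets.
constexpr bool is_target_architecture(int compute_capability) {
    for (int arch : TARGET_ARCHITECTURES) {
        if (arch == compute_capability) return true;
    }
    return false;
}

// Compile-time checks of the mapping.
static_assert(architecture_family(90) == "Hopper");
static_assert(!is_target_architecture(70));
```

Because both helpers are constexpr, the mapping can be checked at compile time and needs no GPU or CUDA runtime.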

MIT License | A learning project for the CUDA community