Project Status

CuFlash-Attn is maintained as a stable v0.3.0 baseline and an archive-ready reference implementation. The current work focuses on governance cleanup, documentation quality, workflow simplification, and bug fixes rather than feature expansion.

What this project covers

From-scratch CUDA C++ implementation of FlashAttention
Forward and backward passes for float and half
Supported head_dim values: 32, 64, 128
Public C++ API plus C ABI for Python ctypes integration
OpenSpec-tracked design and verification rules
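
As a rough illustration of the coverage listed above, a caller could guard its inputs before handing them to the library through ctypes. This is only a sketch: the helper name `check_config` and the dtype strings are hypothetical and not part of the project's API; only the head_dim values and the float/half pair come from the list above.

```python
# Hypothetical caller-side guard; the supported values mirror the list above.
SUPPORTED_HEAD_DIMS = (32, 64, 128)
SUPPORTED_DTYPES = ("float", "half")


def check_config(head_dim: int, dtype: str) -> None:
    """Raise ValueError if the configuration is outside the documented coverage."""
    if head_dim not in SUPPORTED_HEAD_DIMS:
        raise ValueError(
            f"head_dim={head_dim} is not supported; expected one of {SUPPORTED_HEAD_DIMS}"
        )
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(
            f"dtype={dtype!r} is not supported; expected one of {SUPPORTED_DTYPES}"
        )


# A supported combination passes silently.
check_config(64, "half")
```

Failing early on the Python side keeps an unsupported shape from reaching the C ABI at all.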

Maintenance posture

This repository is intentionally optimized for:

clarity over breadth: no speculative feature growth
stable integration surface: examples, docs, and ABI stay aligned
lightweight engineering: preset-based builds, focused CI, bilingual docs
handoff readiness: contributors and follow-up models can continue from explicit specs and control docs

Development workflow

The preferred workflow is:

Read the relevant files in openspec/specs/
Capture changes through an OpenSpec change when scope or behavior shifts
Build with CMake presets only
Run the verification that the environment supports (GPU tests need a CUDA device)
Request review before landing non-trivial changes
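
The preset-only build step can be sketched as a small driver that assembles the standard CMake commands. The preset name `release` is an assumption for illustration (`cmake --list-presets` shows the names the repository actually defines), and the sketch further assumes that configure, build, and test presets share one name.

```python
import subprocess
from typing import List


def preset_commands(preset: str) -> List[List[str]]:
    """Return the configure, build, and test commands for one preset name."""
    return [
        ["cmake", "--preset", preset],             # configure
        ["cmake", "--build", "--preset", preset],  # build
        ["ctest", "--preset", preset],             # run tests
    ]


def run_preset(preset: str) -> None:
    """Run each step in order and stop at the first failure."""
    for cmd in preset_commands(preset):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "release" is a hypothetical preset name; only print the commands here.
    for cmd in preset_commands("release"):
        print(" ".join(cmd))
```

Calling `run_preset("release")` would execute the three steps; the example only prints them so it is safe to run on a machine without the toolkit.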

Validation boundaries

Local CUDA builds require a working toolkit and nvcc
GPU tests are skipped automatically on systems without a CUDA device
Documentation and workflow cleanup can be validated without a GPU
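
The boundaries above can be read as a simple decision rule. The sketch below is illustrative and not the project's own tooling: the stage labels are made up, and using the presence of `nvidia-smi` on the PATH as a sign of a CUDA device is only a heuristic assumption.

```python
import shutil
from typing import List


def validation_plan(has_nvcc: bool, has_cuda_device: bool) -> List[str]:
    """Map the local environment to the validation stages it can run."""
    plan = ["docs-and-workflow"]       # never needs a GPU
    if has_nvcc:
        plan.append("cuda-build")      # needs a working toolkit and nvcc
        if has_cuda_device:
            plan.append("gpu-tests")   # skipped when no device is present
    return plan


def detect_plan() -> List[str]:
    """Probe the PATH for nvcc and nvidia-smi (heuristic) and build the plan."""
    has_nvcc = shutil.which("nvcc") is not None
    has_cuda_device = shutil.which("nvidia-smi") is not None
    return validation_plan(has_nvcc, has_cuda_device)


if __name__ == "__main__":
    print(detect_plan())
```

On a machine with neither tool installed, the plan contains only the documentation and workflow stage.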
