OpenSpec System: Behavioral Specification
YOLO-Toys uses OpenSpec, a specification system based on Gherkin syntax, to define behavioral contracts. This article explores how this system improves documentation quality and development workflow.
Problem Statement
Traditional documentation suffers from:
- Drift: Code changes but docs don't update
- Ambiguity: Natural language is imprecise
- Fragmentation: Specs scattered across READMEs, wikis, issues
- No enforcement: Nothing verifies docs match implementation
The challenge: How do we create living documentation that stays synchronized with code?
Theoretical Foundation
Gherkin Syntax
Gherkin is a domain-specific language for behavioral specification, popularized by Cucumber:
Feature: Feature Name
As a [role]
I want [feature]
So that [benefit]
Scenario: Scenario Name
Given [precondition]
When [action]
Then [expected result]
Key characteristics:
- Structured: Machine-parseable format
- Readable: Non-technical stakeholders can understand
- Testable: Can be executed as automated tests
- Living: Version-controlled alongside code
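To make the "machine-parseable" point concrete, here is a minimal sketch of reading steps out of Gherkin text with a regular expression. It is not how Cucumber or pytest-bdd parse feature files; the names `Step` and `parse_steps` are hypothetical, and the sketch ignores tables, tags, and doc strings.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Gherkin step keywords; "And"/"But" continue the previous step type.
STEP_RE = re.compile(r"^(Given|When|Then|And|But)\s+(.*)$")


@dataclass
class Step:
    keyword: str  # Given / When / Then (And/But resolved to the previous keyword)
    text: str


def parse_steps(source: str) -> list[Step]:
    """Extract steps from Gherkin text, resolving And/But to the preceding keyword."""
    steps: list[Step] = []
    for raw in source.splitlines():
        match = STEP_RE.match(raw.strip())
        if not match:
            continue  # Feature/Scenario headers, table rows, blank lines
        keyword, text = match.groups()
        if keyword in ("And", "But") and steps:
            keyword = steps[-1].keyword
        steps.append(Step(keyword, text))
    return steps


if __name__ == "__main__":
    example = """
    Scenario: Scenario Name
      Given a precondition
      When an action happens
      Then a result is expected
      And a second result is expected
    """
    for step in parse_steps(example):
        print(step.keyword, "|", step.text)
```

Because every step starts with a fixed keyword, even this small amount of code recovers the Given/When/Then structure, which is what lets test runners bind steps to functions.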
OpenSpec Conventions
OpenSpec adapts Gherkin for API specification:
## Purpose
Define the [domain] contract.
---
### Requirement: Requirement Name
The system MUST [behavioral constraint].
#### Scenario: Scenario Name
Given: [precondition]
When: [action]
Then: [expected result]
Implementation Deep Dive
Directory Structure
openspec/
├── config.yaml              # OpenSpec configuration
├── specs/
│   ├── api/
│   │   ├── spec.md          # REST API specification
│   │   ├── openapi.yaml     # OpenAPI schema
│   │   └── websocket.md     # WebSocket protocol
│   ├── domain/
│   │   └── spec.md          # Domain model specification
│   ├── testing/
│   │   ├── spec.md          # Testing strategy
│   │   └── rest-api.feature # Gherkin test scenarios
│   └── product/
│       └── spec.md          # Product requirements
└── changes/
    ├── archive/             # Completed changes
    └── active/              # Work in progress
API Specification Example
## Purpose
Define the REST API contract for the YOLO-Toys inference platform.
---
### Requirement: Health Check Endpoint
The system MUST provide a `/health` endpoint that returns service status.
#### Scenario: Service is healthy
Given: The FastAPI application is running
When: A GET request is made to `/health`
Then: Response status is 200 with `{ "status": "ok", "version": "...", "device": "..." }`
---
### Requirement: Inference Endpoint
The system MUST provide a `/infer` endpoint for detection, segmentation, and pose tasks.
#### Scenario: Successful object detection
Given: A valid image file and model_id for detection
When: A POST request is made to `/infer` with the image
Then: Response contains `width`, `height`, `task: "detect"`, `detections` array
#### Scenario: Invalid image format
Given: An invalid file (not an image)
When: A POST request is made to `/infer`
Then: Response status is 400 with error detail
Domain Specification Example
## Purpose
Define the core architecture patterns for YOLO-Toys.
---
### Requirement: Handler Pattern (Strategy)
The system MUST implement all model inference through a unified `BaseHandler` interface.
#### Scenario: Load and cache model
Given: A model_id is requested for inference
When: The handler's `load()` method is called
Then: Model is loaded, cached, and returned with optional processor
#### Scenario: Execute inference
Given: A model is loaded and cached
When: The handler's `infer()` method is called with an image
Then: Results are returned in the standard format for the task type
Gherkin Test Scenarios
Feature: REST API Inference
As a user
I want to perform inference on images via REST API
So that I can detect objects in my images
Background:
Given the server is running on port 8000
And the default model is yolov8n.pt
Scenario: Successful detection inference
Given I have a valid image file "test.jpg"
When I send a POST request to "/infer" with:
| field | value |
| file | test.jpg |
| model | yolov8n.pt |
| conf | 0.25 |
Then the response status should be 200
And the response should contain "width"
And the response should contain "height"
And the response should contain "task" with value "detect"
And the response should contain "detections" as an array
Scenario Outline: Open-vocabulary detection
Given I have a valid image file "test.jpg"
When I send a POST request to "/infer" with:
| field | value |
| file | test.jpg |
| model | <model> |
| text_queries | "cat, dog" |
Then the response status should be 200
Examples:
| model |
| google/owlvit-base-patch32 |
| IDEA-Research/grounding-dino-tiny |
Specification as Documentation
Dual Purpose
OpenSpec files serve as both:
- Documentation: Humans read markdown specs
- Tests: Tools execute Gherkin scenarios
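The markdown side of this dual purpose is also machine-readable, because the spec files follow fixed heading conventions. The sketch below assumes exactly the layout shown in the examples above (`### Requirement:`, `#### Scenario:`, and `Given:`/`When:`/`Then:` lines); `parse_spec` is a hypothetical helper, not part of any OpenSpec tooling.

```python
from __future__ import annotations

import re

REQ_RE = re.compile(r"^###\s+Requirement:\s*(.+)$")
SCN_RE = re.compile(r"^####\s+Scenario:\s*(.+)$")
STEP_RE = re.compile(r"^(Given|When|Then):\s*(.+)$")


def parse_spec(markdown: str) -> list[dict]:
    """Turn an OpenSpec-style markdown file into a list of requirement dicts."""
    requirements: list[dict] = []
    scenario = None  # the scenario currently being filled, if any
    for raw in markdown.splitlines():
        line = raw.strip()
        if (m := REQ_RE.match(line)):
            requirements.append({"name": m.group(1), "statement": "", "scenarios": []})
            scenario = None
        elif (m := SCN_RE.match(line)) and requirements:
            scenario = {"name": m.group(1), "steps": {}}
            requirements[-1]["scenarios"].append(scenario)
        elif (m := STEP_RE.match(line)) and scenario is not None:
            scenario["steps"][m.group(1)] = m.group(2)
        elif line.startswith("The system MUST") and requirements and scenario is None:
            # The normative sentence directly under a Requirement heading.
            requirements[-1]["statement"] = line
    return requirements
```

A structure like this could feed a rendered requirements index or a coverage report listing which scenarios have matching tests.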
Documentation Generation
Workflow Integration
Development Workflow
Change Management
OpenSpec includes a change management system:
openspec/changes/
├── active/
│   └── 2026-05-15-add-sam-support/
│       ├── proposal.md
│       ├── design.md
│       └── tasks.md
└── archive/
    └── 2026-04-24-normalize-project/
        ├── proposal.md
        └── tasks.md
Each change follows a structured process:
- Propose: Create proposal with rationale
- Design: Document technical approach
- Tasks: Break down implementation steps
- Archive: Move to archive when complete
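The final step, moving a finished change from `active/` to `archive/`, is simple enough to script. The following is a minimal sketch under stated assumptions: `archive_change` is a hypothetical helper rather than an OpenSpec command, and treating `proposal.md` and `tasks.md` as required (with `design.md` optional) is inferred only from the directory listing above.

```python
from pathlib import Path
import shutil

# Assumption: these two files must exist before a change can be archived.
REQUIRED_FILES = ("proposal.md", "tasks.md")


def archive_change(root: Path, change_name: str) -> Path:
    """Move <root>/changes/active/<change_name> to <root>/changes/archive/."""
    source = root / "changes" / "active" / change_name
    target = root / "changes" / "archive" / change_name
    if not source.is_dir():
        raise FileNotFoundError(f"no active change named {change_name!r}")
    if target.exists():
        raise FileExistsError(f"{change_name!r} is already archived")
    missing = [name for name in REQUIRED_FILES if not (source / name).is_file()]
    if missing:
        raise ValueError(f"cannot archive, missing files: {missing}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

Calling `archive_change(Path("openspec"), "2026-05-15-add-sam-support")` would relocate that change directory once its files are in place.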
Trade-offs
What We Gained
| Benefit | Description |
|---|---|
| Living Documentation | Specs version-controlled with code |
| Testability | Gherkin scenarios executable as tests |
| Precision | Structured format reduces ambiguity |
| Traceability | Requirements linked to implementation |
| Reviewability | Changes visible in PRs |
What We Sacrificed
| Cost | Description |
|---|---|
| Learning Curve | Gherkin syntax takes practice |
| Maintenance Overhead | Specs need updating with code |
| Tooling Complexity | Need pytest-bdd, parsers |
Alternative Considered: Traditional Docs
We considered using only markdown documentation:
# API Reference
## POST /infer
Accepts an image file and returns detection results.
Parameters:
- `model`: Model ID (default: yolov8n.pt)
- `conf`: Confidence threshold
...
Why rejected:
- No structural enforcement
- Easy for docs to drift from implementation
- No automatic verification
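Automatic verification can start small. As an illustration only, the sketch below compares the endpoints named in a spec's MUST statements against a set of implemented paths. It assumes endpoints appear as backticked paths, as in the API specification example; `documented_endpoints` and `find_drift` are hypothetical names, and how the `implemented` set is gathered (for instance from a web framework's route table) is left open.

```python
from __future__ import annotations

import re

# Matches backticked URL paths such as `/health` or `/infer`.
ENDPOINT_RE = re.compile(r"`(/[A-Za-z0-9_\-/{}]*)`")


def documented_endpoints(spec_markdown: str) -> set[str]:
    """Collect backticked paths from lines that state a MUST requirement."""
    paths: set[str] = set()
    for line in spec_markdown.splitlines():
        if "MUST" in line:
            paths.update(ENDPOINT_RE.findall(line))
    return paths


def find_drift(spec_markdown: str, implemented: set[str]) -> dict[str, list[str]]:
    """Report endpoints present on only one side of the spec/code boundary."""
    documented = documented_endpoints(spec_markdown)
    return {
        "missing_in_code": sorted(documented - implemented),
        "missing_in_spec": sorted(implemented - documented),
    }
```

Run in CI, a non-empty result on either side would flag drift before it reaches the main branch.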
Best Practices
Writing Good Scenarios
Focus on behavior, not implementation:
# Good: What the system does
Then the response contains "detections" as an array
# Bad: How it does it
Then the YOLOHandler is called with the image
Use scenario outlines for variations:
Scenario Outline: Detection with different models
When I send a request with model "<model>"
Then the response status is 200
Examples:
| model |
| yolov8n.pt |
| yolov8s.pt |
| yolov8m.pt |
Keep scenarios independent:
- Each scenario should run in isolation
- No dependency between scenarios
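Scenario outlines fit the independence rule well, because each Examples row becomes its own self-contained scenario. The sketch below shows that expansion conceptually by substituting `<name>` placeholders. It is an illustration, not the mechanism any particular test runner uses, and `expand_outline` is a hypothetical name.

```python
from __future__ import annotations

import re

PLACEHOLDER_RE = re.compile(r"<(\w+)>")


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Return one concrete list of steps per Examples row.

    A placeholder with no matching column raises KeyError, which surfaces
    typos in the outline instead of silently leaving "<name>" in a step.
    """
    def fill(step: str, row: dict[str, str]) -> str:
        return PLACEHOLDER_RE.sub(lambda m: row[m.group(1)], step)

    return [[fill(step, row) for step in steps] for row in examples]


if __name__ == "__main__":
    outline = [
        'When I send a request with model "<model>"',
        "Then the response status is 200",
    ]
    rows = [{"model": "yolov8n.pt"}, {"model": "yolov8s.pt"}, {"model": "yolov8m.pt"}]
    for scenario in expand_outline(outline, rows):
        print(scenario)
```

Since every expanded scenario carries its own complete set of steps, none of them needs state left behind by another.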
Organizing Specifications
- One spec per domain: API, Domain, Testing, etc.
- Group by feature: Health, Inference, Models
- Use consistent terminology: Match codebase naming
Integration with Testing
pytest-bdd Integration
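# test_api.py
from pytest_bdd import given, parsers, scenario, then, when

# `client` (an HTTP test client) and `load_test_image` are project helpers
# assumed to be importable here; they are not defined in this snippet.
# Background steps and the remaining Then steps are omitted for brevity.

@scenario("rest-api.feature", "Successful detection inference")
def test_detection_inference():
    pass

# Step text must match the feature file, including the quoted file name.
@given(parsers.parse('I have a valid image file "{name}"'), target_fixture="valid_image")
def valid_image(name):
    return load_test_image(name)

# target_fixture exposes the return value to later steps as `response`.
@when(parsers.parse('I send a POST request to "{path}" with:'), target_fixture="response")
def send_inference_request(valid_image, path):
    return client.post(path, files={"file": valid_image})

@then(parsers.parse("the response status should be {status:d}"))
def check_status(response, status):
    assert response.status_code == status
Summary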
The OpenSpec system provides YOLO-Toys with:
- Living documentation that evolves with code
- Executable specifications for verification
- Structured change management
- Traceability from requirements to implementation
The key insight is that documentation should not live apart from code; it should be a first-class artifact held to the same rigor as the implementation.