ADR-002: Centralized Registry over Distributed Metadata

Status	Date	Decision Makers
Accepted	2024-01-15	Architecture Team

Context

YOLO-Toys needs to:

Know what models are available
Map model IDs to handlers
Display model metadata in API responses
Infer categories for unknown models

We had several options for organizing this information:

Central registry constant
Metadata distributed across handlers
Configuration files (YAML, JSON)
External database

Decision

We adopted a Centralized Registry Pattern where:

MODEL_REGISTRY is a single dict constant
ModelCategory enum encapsulates all categories
Category inference logic lives in ModelCategory.infer_from_id()

python

from typing import Any

# Single source of truth
MODEL_REGISTRY: dict[str, dict[str, Any]] = {
    "yolov8n.pt": {
        "category": ModelCategory.YOLO_DETECT,
        "name": "YOLOv8 Nano",
        "description": "Ultra-lightweight detection model",
        "speed": "extremely fast",
        "accuracy": "medium",
    },
    ...
}
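
The registry entry above refers to ModelCategory without showing it. The following is a minimal sketch, assuming the enum mixes in `str` so that entries serialize directly into API responses. The member values `yolo_detect` and `hf_detr` are taken from the category strings used elsewhere in this ADR; the `str` mixin and the two-entry registry are assumptions for illustration, not the project's confirmed definition.

```python
import json
from enum import Enum
from typing import Any


class ModelCategory(str, Enum):
    """Assumed subset of categories; the real enum may define more."""

    YOLO_DETECT = "yolo_detect"
    HF_DETR = "hf_detr"


# Illustrative registry with two entries (fields trimmed for brevity).
MODEL_REGISTRY: dict[str, dict[str, Any]] = {
    "yolov8n.pt": {
        "category": ModelCategory.YOLO_DETECT,
        "name": "YOLOv8 Nano",
    },
    "facebook/detr-resnet-50": {
        "category": ModelCategory.HF_DETR,
        "name": "DETR ResNet-50",
    },
}

# Because the enum subclasses str, the standard json module can encode
# an entry without a custom encoder.
payload = json.dumps(MODEL_REGISTRY["yolov8n.pt"])
decoded = json.loads(payload)
```

With a plain `Enum` (no `str` mixin), `json.dumps` would raise `TypeError`, so an API layer would need to convert categories explicitly.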

Alternatives Considered

Alternative 1: Distributed Metadata

Store metadata alongside each handler:

python

class YOLOHandler(BaseHandler):
    MODELS = {
        "yolov8n.pt": {"name": "YOLOv8 Nano", ...},
        "yolov8s.pt": {"name": "YOLOv8 Small", ...},
    }

    @classmethod
    def get_models(cls):
        return cls.MODELS

class DETRHandler(BaseHandler):
    MODELS = {
        "facebook/detr-resnet-50": {...},
    }

# To get all models
all_models = {}
for handler in [YOLOHandler, DETRHandler, ...]:
    all_models.update(handler.get_models())

Pros:

Co-locates model info with inference logic
Handler authors own their models

Cons:

Hard to enumerate all models (need to import all handlers)
Category information scattered
Difficult to maintain consistency
Circular import risks
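
One way the consistency problem can surface: if two handlers declare the same model ID, the `update()` loop above silently keeps whichever handler comes last. The sketch below illustrates this with a hypothetical `ExperimentalHandler` and a hypothetical `merge_models` helper that detects the collision; neither name comes from the project.

```python
class BaseHandler:
    MODELS: dict[str, dict[str, str]] = {}

    @classmethod
    def get_models(cls) -> dict[str, dict[str, str]]:
        return cls.MODELS


class YOLOHandler(BaseHandler):
    MODELS = {"yolov8n.pt": {"name": "YOLOv8 Nano"}}


class ExperimentalHandler(BaseHandler):  # hypothetical second owner
    MODELS = {"yolov8n.pt": {"name": "Nano (experimental)"}}


# Naive merge: the later handler overwrites the earlier one without warning.
naive: dict[str, dict[str, str]] = {}
for handler in [YOLOHandler, ExperimentalHandler]:
    naive.update(handler.get_models())


def merge_models(handlers) -> dict[str, dict[str, str]]:
    """Merge per-handler metadata, raising on duplicate model IDs."""
    merged: dict[str, dict[str, str]] = {}
    for handler in handlers:
        for model_id, meta in handler.get_models().items():
            if model_id in merged:
                raise ValueError(f"duplicate model ID: {model_id}")
            merged[model_id] = meta
    return merged
```

A single dict literal avoids this class of conflict because each model ID has exactly one declaration site, which makes an accidental duplicate easy to spot in review.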

Alternative 2: Configuration Files

Store metadata in YAML/JSON:

yaml

# models.yaml
models:
  yolov8n.pt:
    category: yolo_detect
    name: YOLOv8 Nano
    handler: YOLOHandler
  facebook/detr-resnet-50:
    category: hf_detr
    name: DETR ResNet-50
    handler: DETRHandler

Pros:

Easy to edit without touching code
Can be validated with schema
Supports runtime updates (reload config)

Cons:

Separation from code (drift risk)
No type safety
Need to load and parse at startup
Harder to review changes in version control (config diffs are detached from the code they affect)
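
To illustrate the "no type safety" and "load and parse at startup" costs, here is a rough sketch of the validation a config-file approach would need. It uses JSON from the standard library rather than YAML so the example stays self-contained. The `load_registry` function and the two-member `ModelCategory` enum are assumptions for illustration.

```python
import json
from enum import Enum
from typing import Any


class ModelCategory(str, Enum):
    """Assumed subset of categories; the real enum may define more."""

    YOLO_DETECT = "yolo_detect"
    HF_DETR = "hf_detr"


def load_registry(text: str) -> dict[str, dict[str, Any]]:
    """Parse config text and convert category strings to enum members.

    Raises ValueError when a category is missing or unknown, which a
    registry written as Python code would reject before the app starts.
    """
    raw = json.loads(text)["models"]
    registry: dict[str, dict[str, Any]] = {}
    for model_id, meta in raw.items():
        try:
            category = ModelCategory(meta["category"])
        except (KeyError, ValueError) as exc:
            raise ValueError(f"{model_id}: invalid or missing category") from exc
        registry[model_id] = {**meta, "category": category}
    return registry


CONFIG = """
{"models": {"yolov8n.pt": {"category": "yolo_detect", "name": "YOLOv8 Nano"}}}
"""
registry = load_registry(CONFIG)
```

A typo such as `"yolo_detekt"` in the file would only be caught when this loader runs, whereas `ModelCategory.YOLO_DETEKT` in Python code fails at import time and is flagged by type checkers.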

Alternative 3: External Database

Store metadata in PostgreSQL/Redis:

sql

CREATE TABLE models (
    id VARCHAR PRIMARY KEY,
    category VARCHAR,
    name VARCHAR,
    handler VARCHAR,
    ...
);

Pros:

Dynamic updates without restart
Query flexibility
Can store usage statistics

Cons:

Operational complexity
Network dependency at startup
Overkill for static metadata
Need migration management

Alternative 4: Automatic Discovery

Scan the local filesystem and query the Hugging Face Hub:

python

from pathlib import Path

def discover_models():
    models = {}
    # Scan .pt files
    for path in Path("models").glob("*.pt"):
        models[path.name] = infer_metadata(path)
    # Query HuggingFace API
    for model_id in get_popular_vision_models():
        models[model_id] = fetch_hf_metadata(model_id)
    return models

Pros:

Always up-to-date
No manual maintenance

Cons:

Startup latency (network calls)
Non-deterministic results
Hard to customize metadata
Requires network access

Consequences

Positive

Single Source of Truth: All metadata in one place
Type Safety: Enum for categories, type-annotated dicts
Predictable: Same models available every run
Simple: Dict lookup, no external dependencies
Version Controlled: Changes tracked in git

Negative

Centralized Changes: Adding models requires code change
Memory Overhead: Registry always in memory
No Runtime Updates: Adding models requires restart

Mitigations

Centralized Changes: Simple dict structure, PRs easy to review
Memory Overhead: Negligible (few KB)
No Runtime Updates: Ad-hoc .pt files are still supported via category inference (ModelCategory.infer_from_id())

Implementation Notes

Registry Structure

python

# Schema sketch (not runnable): each value shows the expected type of the field
MODEL_REGISTRY: dict[str, dict[str, Any]] = {
    model_id: {
        "category": ModelCategory,    # Required: enum value
        "name": str,                  # Display name
        "description": str,           # Human description
        "speed": str,                 # Qualitative speed
        "accuracy": str,              # Qualitative accuracy
    }
}
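
The schema above could also be expressed as a `TypedDict` and checked in a unit test. The sketch below assumes that only `category` is required, since it is the only field marked as such. `ModelInfo` and `validate_registry` are hypothetical names, not part of the project.

```python
from enum import Enum
from typing import Any, TypedDict


class ModelCategory(str, Enum):
    """Assumed subset of categories; the real enum may define more."""

    YOLO_DETECT = "yolo_detect"


class ModelInfo(TypedDict, total=False):
    """Field names mirror the schema; total=False marks them optional."""

    category: ModelCategory
    name: str
    description: str
    speed: str
    accuracy: str


ALLOWED_KEYS = frozenset({"category", "name", "description", "speed", "accuracy"})


def validate_registry(registry: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the registry is valid."""
    problems: list[str] = []
    for model_id, meta in registry.items():
        if not isinstance(meta.get("category"), ModelCategory):
            problems.append(f"{model_id}: category must be a ModelCategory")
        unknown = set(meta) - ALLOWED_KEYS
        if unknown:
            problems.append(f"{model_id}: unknown keys {sorted(unknown)}")
    return problems
```

Running such a check in CI would catch misspelled keys that `dict[str, Any]` alone cannot detect.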

Category Inference Strategy

python

@classmethod
def infer_from_id(cls, model_id: str, registry: dict | None = None):
    # Priority 1: Explicit registry lookup
    if registry and model_id in registry:
        return registry[model_id]["category"]

    # Priority 2: File extension patterns
    if model_id.endswith(".pt"):
        return cls._infer_yolo_variant(model_id)

    # Priority 3: String patterns
    # Priority 4: Fallback heuristics
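
The excerpt above leaves priorities 3 and 4 unspecified. The following runnable sketch fills them with placeholder rules to show how the priority chain fits together. The `-seg` suffix rule, the `"detr"` substring rule, and the `YOLO_SEGMENT` and `UNKNOWN` members are all assumptions for illustration and may not match the project's actual heuristics.

```python
from enum import Enum
from typing import Any, Optional


class ModelCategory(str, Enum):
    YOLO_DETECT = "yolo_detect"
    YOLO_SEGMENT = "yolo_segment"  # hypothetical member
    HF_DETR = "hf_detr"
    UNKNOWN = "unknown"  # hypothetical fallback member

    @classmethod
    def infer_from_id(
        cls, model_id: str, registry: Optional[dict[str, dict[str, Any]]] = None
    ) -> "ModelCategory":
        # Priority 1: an explicit registry entry always wins.
        if registry and model_id in registry:
            return registry[model_id]["category"]
        # Priority 2: .pt files are treated as YOLO weights.
        if model_id.endswith(".pt"):
            return cls._infer_yolo_variant(model_id)
        # Priority 3 (assumed rule): substring match on the model ID.
        if "detr" in model_id.lower():
            return cls.HF_DETR
        # Priority 4 (assumed rule): give up explicitly instead of guessing.
        return cls.UNKNOWN

    @classmethod
    def _infer_yolo_variant(cls, model_id: str) -> "ModelCategory":
        # Assumed convention: segmentation weights end in "-seg.pt".
        stem = model_id[: -len(".pt")]
        if stem.endswith("-seg"):
            return cls.YOLO_SEGMENT
        return cls.YOLO_DETECT
```

Under these assumptions, an unregistered file such as `my_weights.pt` resolves to `YOLO_DETECT` without any registry change, which is how ad-hoc weights remain usable.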

API Usage

python

# List all models
models = get_available_models()
# → {"yolo_detect": {"name": "...", "models": [...]}, ...}

# Get specific model
info = get_model_info("yolov8n.pt")
# → {"category": ..., "name": ..., ...}

# Resolve handler
registry = HandlerRegistry(device)
handler = registry.get_handler("yolov8n.pt")
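
For readers who want to see how the first two calls might be built on top of the registry, here is a minimal stand-in. It is a sketch, not the project's implementation: the `CATEGORY_NAMES` mapping, the shape of each item in `"models"`, and the `registry` parameter are assumptions chosen to match the response shape shown in the comments above.

```python
from enum import Enum
from typing import Any, Optional


class ModelCategory(str, Enum):
    """Assumed subset of categories; the real enum may define more."""

    YOLO_DETECT = "yolo_detect"
    HF_DETR = "hf_detr"


# Hypothetical display names for each category.
CATEGORY_NAMES = {
    ModelCategory.YOLO_DETECT: "YOLO Detection",
    ModelCategory.HF_DETR: "DETR",
}

MODEL_REGISTRY: dict[str, dict[str, Any]] = {
    "yolov8n.pt": {"category": ModelCategory.YOLO_DETECT, "name": "YOLOv8 Nano"},
    "yolov8s.pt": {"category": ModelCategory.YOLO_DETECT, "name": "YOLOv8 Small"},
    "facebook/detr-resnet-50": {
        "category": ModelCategory.HF_DETR,
        "name": "DETR ResNet-50",
    },
}


def get_available_models(
    registry: dict[str, dict[str, Any]] = MODEL_REGISTRY,
) -> dict[str, dict[str, Any]]:
    """Group registry entries by category value."""
    grouped: dict[str, dict[str, Any]] = {}
    for model_id, meta in registry.items():
        category = meta["category"]
        group = grouped.setdefault(
            category.value, {"name": CATEGORY_NAMES[category], "models": []}
        )
        group["models"].append({"id": model_id, "name": meta["name"]})
    return grouped


def get_model_info(
    model_id: str, registry: dict[str, dict[str, Any]] = MODEL_REGISTRY
) -> Optional[dict[str, Any]]:
    """Return a copy of the entry so callers cannot mutate the registry."""
    meta = registry.get(model_id)
    return dict(meta) if meta is not None else None
```

Both functions are plain dict operations with no I/O, which reflects the "Simple" and "Predictable" consequences listed earlier.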
