
Configuration Reference

YOLO-Toys uses environment variables for all configuration, following the 12-Factor App methodology. Configuration is managed through Pydantic Settings for type safety and validation.

Core Server Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| PORT | int | 8000 | Server port |
| HOST | str | 0.0.0.0 | Bind address |
| LOG_LEVEL | str | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| WORKERS | int | 1 | Number of worker processes |

Example

bash
export PORT=8080
export LOG_LEVEL=DEBUG

Model Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| MODEL_NAME | str | yolov8s.pt | Default model for /infer |
| DEVICE | str | auto | Device (auto, cpu, cuda, mps) |
| SKIP_WARMUP | bool | false | Skip model warmup on startup |

Device Selection Logic

auto → CUDA if available → MPS if available → CPU
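The fallback order above can be sketched as a small pure function. This is an illustrative sketch, not the project's actual code: `resolve_device` and its parameters are hypothetical names, and the availability flags are passed in so the example runs without PyTorch installed. In a real deployment they would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Resolve the DEVICE setting to a concrete device name (hypothetical helper)."""
    # An explicit choice (cpu, cuda, mps) is passed through unchanged.
    if requested != "auto":
        return requested
    # auto: prefer CUDA, then MPS, then fall back to CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


device = resolve_device("auto", cuda_available=False, mps_available=True)
```

Keeping the availability checks outside the function makes the selection order easy to unit-test on machines without a GPU.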

Inference Parameters

| Variable | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| CONF_THRESHOLD | float | 0.25 | 0.0-1.0 | Confidence threshold |
| IOU_THRESHOLD | float | 0.45 | 0.0-1.0 | IoU threshold for NMS |
| MAX_DET | int | 300 | 1-1000 | Maximum detections per image |
| IMGSZ | int | 640 | 32-4096 | Inference image size |
| HALF | bool | false | - | Use FP16 inference |

Per-Request Override

json
{
  "model": "yolov8n.pt",
  "image": "<base64>",
  "conf": 0.5,
  "iou": 0.4,
  "max_det": 100
}
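A request body like the one above could be assembled as follows. This is a minimal sketch under stated assumptions: `build_infer_payload` is a hypothetical helper, the image is assumed to be sent as plain base64 text (not a data URI), and omitted fields are assumed to fall back to the server-side defaults from the table.

```python
import base64
import json


def build_infer_payload(image_bytes, model=None, conf=None, iou=None, max_det=None):
    """Build a JSON-serializable body for /infer (hypothetical helper)."""
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    overrides = {"model": model, "conf": conf, "iou": iou, "max_det": max_det}
    # Include only the overrides that were explicitly provided.
    payload.update({key: value for key, value in overrides.items() if value is not None})
    return payload


body = json.dumps(build_infer_payload(b"raw image bytes", model="yolov8n.pt", conf=0.5))
```

The resulting `body` string can be sent with any HTTP client as a JSON POST.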

Cache Settings

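| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| MODEL_CACHE_MAXSIZE | int | 10 | Maximum cached models |
| MODEL_CACHE_TTL | int | 3600 | Cache TTL in seconds |
| MODEL_MEMORY_THRESHOLD | float | 0.85 | Memory usage threshold (0-1); above it, the least recently used model is evicted before a new one is loaded |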

Cache Behavior

Request → Cache Check
         ├─ Hit  → Return cached model
         └─ Miss → Check memory
                   ├─ Under threshold → Load model
                   └─ Over threshold  → Evict LRU → Load model
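The flow above can be sketched with an `OrderedDict` used as an LRU structure. This is an illustrative sketch, not the project's implementation: `ModelCache`, `loader`, `memory_usage`, and `clock` are hypothetical names, and the way TTL and the size limit interact with the memory check is an assumption.

```python
import time
from collections import OrderedDict


class ModelCache:
    """LRU model cache with TTL and a memory guard (hypothetical sketch)."""

    def __init__(self, loader, memory_usage, maxsize=10, ttl=3600.0,
                 memory_threshold=0.85, clock=time.monotonic):
        self._loader = loader              # callable: model name -> model object
        self._memory_usage = memory_usage  # callable: () -> used fraction in [0, 1]
        self._maxsize = maxsize
        self._ttl = ttl
        self._memory_threshold = memory_threshold
        self._clock = clock
        self._entries = OrderedDict()      # name -> (loaded_at, model), LRU first

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            # Hit: mark as most recently used and return the cached model.
            self._entries.move_to_end(name)
            return entry[1]
        # Miss (or expired entry): drop any stale copy, then check memory.
        self._entries.pop(name, None)
        if self._entries and self._memory_usage() > self._memory_threshold:
            self._entries.popitem(last=False)  # over threshold: evict LRU
        while self._entries and len(self._entries) >= self._maxsize:
            self._entries.popitem(last=False)  # keep within the size limit
        model = self._loader(name)
        self._entries[name] = (now, model)
        return model
```

Injecting `memory_usage` and `clock` as callables keeps the eviction logic testable without loading real models or waiting for entries to expire.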

Concurrency Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| MAX_CONCURRENCY | int | 4 | Max concurrent inferences |
| REQUEST_TIMEOUT | int | 60 | Request timeout in seconds |

Semaphore-Based Control

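python
import asyncio

class InferenceService:  # illustrative name; the semaphore is managed internally
    def __init__(self, max_concurrency: int = 4):  # value of MAX_CONCURRENCY
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def infer_with_limit(self, *args, **kwargs):
        # Waits here while MAX_CONCURRENCY inferences are already running
        async with self._semaphore:
            return await self._infer_internal(*args, **kwargs)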
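To see the effect of the limit, the following self-contained sketch runs several simulated inferences through a semaphore and records the peak number running at once. `run_limited` and `fake_infer` are hypothetical names, and `asyncio.sleep` stands in for real model inference.

```python
import asyncio


async def run_limited(n_tasks: int, limit: int) -> int:
    """Run n_tasks fake inferences and return the peak concurrency observed."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def fake_infer():
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for model inference
            active -= 1

    await asyncio.gather(*(fake_infer() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(n_tasks=10, limit=4))
```

The observed peak never exceeds `limit`, which is the guarantee MAX_CONCURRENCY is meant to provide.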

Upload Limits

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| MAX_UPLOAD_MB | int | 10 | Max upload size in MB |

Enforced At

  1. FastAPI request size limit
  2. Image validation middleware
  3. Memory guard before processing
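Each of the checkpoints above ultimately compares a payload size against MAX_UPLOAD_MB. A minimal sketch of that comparison follows; `within_upload_limit` is a hypothetical helper, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the source does not say which definition the server uses.

```python
def within_upload_limit(num_bytes: int, max_upload_mb: int = 10) -> bool:
    """Return True if a payload of num_bytes fits within the limit (hypothetical helper)."""
    # Assumption: 1 MB is 1024 * 1024 bytes.
    return num_bytes <= max_upload_mb * 1024 * 1024


ok = within_upload_limit(5 * 1024 * 1024)
too_big = not within_upload_limit(11 * 1024 * 1024)
```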

Security Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| RATE_LIMIT_RPM | int | 60 | Requests per minute per IP |
| CORS_ORIGINS | str | "" | Comma-separated CORS origins |

CORS Configuration

bash
# Single origin
export CORS_ORIGINS="https://example.com"

# Multiple origins
export CORS_ORIGINS="https://example.com,https://api.example.com"

# Allow all (development only!)
export CORS_ORIGINS="*"
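Because CORS_ORIGINS is a single comma-separated string, it has to be split into a list before a CORS middleware can use it. The sketch below shows one way to do that; `parse_cors_origins` is a hypothetical helper, and trimming whitespace and dropping empty entries are assumptions about the desired behavior.

```python
from typing import List


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value into a list (hypothetical helper)."""
    # Trim whitespace around each origin and skip empty entries.
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


origins = parse_cors_origins("https://example.com, https://api.example.com")
```

With this helper, the default empty string yields an empty list, and `"*"` yields a single wildcard entry.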

BLIP-Specific Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| BLIP_MAX_TOKENS | int | 20 | Max tokens for captioning |
| BLIP_VQA_MAX_TOKENS | int | 50 | Max tokens for VQA |

Monitoring Settings

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| METRICS_ENABLED | bool | true | Enable Prometheus metrics |
| METRICS_PORT | int | 9090 | Metrics server port |

Prometheus Endpoints

| Endpoint | Purpose |
| --- | --- |
| /metrics | Prometheus metrics |
| /health | Health check + system info |
| /system/stats | Detailed system statistics |
| /system/cache/clear | Clear model cache (POST) |

Environment File Example

Create a .env file in your project root:

env
# Server
PORT=8000
LOG_LEVEL=INFO

# Model
MODEL_NAME=yolov8s.pt
DEVICE=auto

# Inference
CONF_THRESHOLD=0.25
IOU_THRESHOLD=0.45
MAX_DET=300

# Cache
MODEL_CACHE_MAXSIZE=10
MODEL_CACHE_TTL=3600
MODEL_MEMORY_THRESHOLD=0.85

# Concurrency
MAX_CONCURRENCY=4
REQUEST_TIMEOUT=60

# Upload
MAX_UPLOAD_MB=10

# Security
RATE_LIMIT_RPM=60
CORS_ORIGINS=https://example.com

# Monitoring
METRICS_ENABLED=true

Configuration in Docker

Dockerfile

dockerfile
ENV PORT=8000
ENV DEVICE=auto
ENV MODEL_NAME=yolov8s.pt

docker-compose.yml

yaml
services:
  yolo-toys:
    image: yolo-toys:latest
    environment:
      - PORT=8000
      - DEVICE=cuda
      - MODEL_NAME=yolov8x.pt
      - CONF_THRESHOLD=0.3
    ports:
      - "8000:8000"

Configuration in Kubernetes

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: yolo-toys-config
data:
  PORT: "8000"
  DEVICE: "cuda"
  MODEL_NAME: "yolov8s.pt"
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: yolo-toys
          envFrom:
            - configMapRef:
                name: yolo-toys-config

Validation

All settings are validated at startup:

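python
from pydantic import validator  # v1-style decorator; Pydantic v2 prefers field_validator
from pydantic_settings import BaseSettings  # Pydantic v2; in v1: from pydantic import BaseSettings

class AppSettings(BaseSettings):
    conf_threshold: float = 0.25  # populated from CONF_THRESHOLD

    @validator("conf_threshold")
    def validate_conf(cls, v):
        # Reject values outside the documented 0.0-1.0 range
        if not 0 <= v <= 1:
            raise ValueError("conf_threshold must be in [0, 1]")
        return v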

Invalid configuration will prevent startup with a clear error message.


Released under the MIT License.