Architecture Atlas
The Architecture Atlas explains why YOLO-Toys looks the way it does: thin routes, a central manager, explicit handler dispatch, and normalization boundaries that let heterogeneous model families share one service contract.
Read this chapter when you need the system map, the request lifecycle, and the decision boundaries that make extension safe.
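As a rough illustration of the shape described above (thin entry point, central manager, explicit handler dispatch, one normalized result), here is a minimal sketch. Every name in it (`Detection`, `HandlerRegistry`, `ModelManager`, `run_yolo`, `run_caption`) is hypothetical and chosen for illustration; none is taken from the YOLO-Toys codebase, and the handlers return canned data instead of running a model.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Detection:
    """Hypothetical shared result shape that every handler normalizes into."""
    label: str
    score: float


Handler = Callable[[bytes], List[Detection]]


def run_yolo(image: bytes) -> List[Detection]:
    # Stand-in for a detector whose raw output is a list of tuples.
    raw = [("person", 0.91)]
    return [Detection(label=label, score=score) for label, score in raw]


def run_caption(image: bytes) -> List[Detection]:
    # Stand-in for a captioning model whose raw output is a dict.
    raw = {"text": "a person", "confidence": 0.8}
    return [Detection(label=raw["text"], score=raw["confidence"])]


class HandlerRegistry:
    """Maps a model-family name to exactly one handler (explicit dispatch)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, family: str, handler: Handler) -> None:
        self._handlers[family] = handler

    def resolve(self, family: str) -> Handler:
        try:
            return self._handlers[family]
        except KeyError:
            raise KeyError(f"no handler registered for family {family!r}") from None


class ModelManager:
    """Central entry point: routes stay thin and only call infer()."""

    def __init__(self, registry: HandlerRegistry) -> None:
        self._registry = registry

    def infer(self, family: str, image: bytes) -> dict:
        handler = self._registry.resolve(family)
        detections = handler(image)  # family-specific logic stays in the handler
        return {"model": family, "detections": [asdict(d) for d in detections]}
```

In this sketch, adding a model family means writing one handler and registering it; the manager and the response shape do not change.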
- System overview: Read the service as a layered runtime, not a flat list of endpoints.
- Request lifecycle: Trace a request from ingress through cache lookup, handler dispatch, and result shaping.
- Execution boundaries: See how model-specific logic stays localized inside handler implementations.
- Middleware stack: Security, metrics, timeout, rate limit, compression, and CORS in layered order.
- Config injection: How Pydantic settings flow through adapter classes into the runtime.
- Model cache: LRU + TTL hybrid caching with memory-pressure eviction and thread safety.
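To make the model cache summary concrete, the following is a minimal sketch of an LRU + TTL cache guarded by a lock. It is an assumption-laden illustration, not the project's implementation: the class name `LruTtlCache`, its parameters, and the `shrink` method (a stand-in for memory-pressure eviction, which a real runtime would trigger from a memory monitor) are all hypothetical.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class LruTtlCache:
    """Hypothetical LRU + TTL cache; entries expire by age or by recency."""

    def __init__(
        self,
        max_entries: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            item = self._entries.get(key)
            if item is None:
                return None
            stored_at, value = item
            if self._clock() - stored_at > self._ttl:
                # TTL expired: drop the entry and report a miss.
                del self._entries[key]
                return None
            # Mark as most recently used.
            self._entries.move_to_end(key)
            return value

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._entries[key] = (self._clock(), value)
            self._entries.move_to_end(key)
            # Evict least recently used entries beyond capacity.
            while len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)

    def shrink(self, keep: int) -> int:
        """Stand-in for memory-pressure eviction: keep only the `keep`
        most recently used entries and return how many were evicted."""
        with self._lock:
            evicted = 0
            while len(self._entries) > keep:
                self._entries.popitem(last=False)
                evicted += 1
            return evicted
```

A caller under memory pressure could invoke `shrink(1)` to keep only the most recently used model loaded, while ordinary traffic relies on the capacity and TTL checks in `put` and `get`.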
Questions this chapter answers
- Why not expose one endpoint per model family?
- Why centralize model resolution through the registry?
- Where does normalization happen, and what does it cost?
- How does the runtime stay extensible without becoming opaque?
- How does the middleware stack order reflect production concerns?
- Why is the cache operationally aware rather than just time-based?
Recommended path
- Start with System Overview
- Continue to Request Lifecycle
- Read Handler Topology for execution boundaries
- Read Middleware Stack for operational layers
- Read Config Injection for settings flow
- Read Model Cache for caching strategy
- Finish with the ADR set to understand intentional trade-offs