Status: Accepted
Created: 2024
Last Updated: 2024
Supersedes: None
Overview
Design a memory pool subsystem for the Mini-Inference Engine to reduce CUDA memory allocation overhead, prevent memory fragmentation, and provide deterministic memory management for inference workloads.
Motivation
Frequent cudaMalloc/cudaFree calls cause:
High overhead (~10-50 µs per allocation)
Memory fragmentation when tensor lifetimes vary
    #include <cstddef>
    #include <vector>

    class MemoryPool {
    public:
        explicit MemoryPool(size_t capacity_bytes);
        ~MemoryPool();

        // Non-copyable, movable
        MemoryPool(const MemoryPool&) = delete;
        MemoryPool& operator=(const MemoryPool&) = delete;
        MemoryPool(MemoryPool&&) noexcept;
        MemoryPool& operator=(MemoryPool&&) noexcept;

        // Allocation
        float* allocate(size_t bytes);

        // Deallocation
        void deallocate(float* ptr);

        // Statistics
        size_t capacity() const;
        size_t used() const;
        size_t free_space() const;
        float utilization() const;
        size_t fragmentation_ratio() const;

        // Diagnostics
        void print_memory_map() const;
        void validate_integrity() const;

    private:
        struct Block;
        Block* find_best_fit(size_t bytes);
        void coalesce_adjacent_blocks();
        void split_block(Block* block, size_t bytes);

        void* pool_ptr_;             // CUDA memory pool base
        size_t capacity_;
        size_t used_;
        std::vector<Block> blocks_;  // Free list (sorted by address)
    };
Allocation Algorithm
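Best-fit: Find the smallest free block that satisfies the request
Splitting: If the chosen block is significantly larger than the request, split it into an allocated block and a remaining free block
Coalescing: Merge adjacent free blocks on deallocation
Alignment: Align all allocations to 256 bytes, the minimum alignment cudaMalloc guarantees, so that device memory accesses can be coalesced (unrelated to the free-block coalescing above)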
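The four steps above can be sketched on the host without touching CUDA. The following is a minimal illustration, not the engine's implementation: `OffsetAllocator`, `align_up`, and `kAlignment` are hypothetical names, the allocator tracks byte offsets into the pool instead of device pointers, and it splits whenever any remainder is left, because this document does not state the "significantly larger" threshold.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical host-side model: manages offsets into a pool, not device memory.
constexpr std::size_t kAlignment = 256;

// Round n up to the next multiple of kAlignment.
constexpr std::size_t align_up(std::size_t n) {
    return (n + kAlignment - 1) / kAlignment * kAlignment;
}

class OffsetAllocator {
public:
    explicit OffsetAllocator(std::size_t capacity) {
        // Round capacity down so every block size stays a multiple of kAlignment.
        const std::size_t usable = capacity / kAlignment * kAlignment;
        if (usable > 0) free_[0] = usable;
    }

    // Best-fit: pick the smallest free block that can hold the aligned request.
    std::optional<std::size_t> allocate(std::size_t bytes) {
        if (bytes == 0) return std::nullopt;
        const std::size_t need = align_up(bytes);
        auto best = free_.end();
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= need &&
                (best == free_.end() || it->second < best->second)) {
                best = it;
            }
        }
        if (best == free_.end()) return std::nullopt;  // no block large enough

        const std::size_t offset = best->first;
        const std::size_t remaining = best->second - need;
        free_.erase(best);
        // Splitting (assumption: split on any remainder, no minimum threshold).
        if (remaining > 0) free_[offset + need] = remaining;
        used_[offset] = need;
        return offset;
    }

    // Returns false if offset is not the start of a live allocation.
    bool deallocate(std::size_t offset) {
        auto it = used_.find(offset);
        if (it == used_.end()) return false;
        std::size_t size = it->second;
        used_.erase(it);

        // Coalescing: absorb the free block that starts right after this one.
        auto next = free_.lower_bound(offset);
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            next = free_.erase(next);
        }
        // Coalescing: grow the free block that ends right before this one.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += size;
                return true;
            }
        }
        free_[offset] = size;
        return true;
    }

    std::size_t free_block_count() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size, sorted by address
    std::map<std::size_t, std::size_t> used_;  // offset -> size of live allocations
};
```

In a real pool, the returned offset would be added to the base pointer obtained from a single up-front cudaMalloc. The linear scan keeps the sketch short; a size-ordered index would make best-fit lookup faster at the cost of a second container to maintain.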
Error Handling
| Condition | Behavior |
| --- | --- |
| Request > capacity | Returns nullptr |
| No contiguous block large enough | Returns nullptr |
| Double free | Throws std::runtime_error |
| Invalid pointer | Throws std::invalid_argument |
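One way to tell a double free apart from an invalid pointer is to remember which base addresses are live and which were released. The sketch below assumes that approach; `AllocationLedger`, `on_allocate`, and `on_deallocate` are hypothetical names, not part of the `MemoryPool` interface, and the capacity-related nullptr cases are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_set>

// Hypothetical bookkeeping helper: records pointer states, owns no memory.
class AllocationLedger {
public:
    // Record that ptr was just handed out by the pool.
    void on_allocate(const void* ptr) {
        freed_.erase(ptr);  // the address may be a reused, previously freed block
        live_.insert(ptr);
    }

    // Validate a deallocation request and update the ledger.
    void on_deallocate(const void* ptr) {
        if (live_.erase(ptr) == 1) {
            freed_.insert(ptr);
            return;
        }
        if (freed_.count(ptr) != 0) {
            throw std::runtime_error("MemoryPool: double free");
        }
        throw std::invalid_argument("MemoryPool: pointer not owned by this pool");
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::unordered_set<const void*> live_;   // currently allocated base addresses
    std::unordered_set<const void*> freed_;  // released and not yet reused
};
```

Because std::invalid_argument derives from std::logic_error and not from std::runtime_error, callers can catch the two cases separately. A limitation of this sketch: after free blocks are merged, a stale entry in `freed_` can point into the middle of a larger block, so a production version would likely prune those entries during coalescing.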
Memory Budget
| Scenario | Pool Size | Notes |
| --- | --- | --- |
| MNIST inference (batch=1) | 4 MB | Minimal workload |
| MNIST inference (batch=256) | 128 MB | Production workload |
| Large models | 512 MB+ | Configurable |
Testing Strategy
Allocation correctness: Sequential, interleaved, random sizes
Coalescing: Verify adjacent free blocks merge
Fragmentation: Measure fragmentation ratio under various patterns
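This document does not define how the fragmentation ratio is computed. As an assumption for test purposes, the sketch below uses one common definition: one minus the largest free block divided by the total free space, so 0 means all free space is contiguous and values near 1 mean it is scattered. The free function `fragmentation_ratio` taking a list of free block sizes is hypothetical and is not the `MemoryPool::fragmentation_ratio()` member.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed metric: 1 - largest_free_block / total_free_bytes.
// Returns 0.0 when there is no free space at all.
double fragmentation_ratio(const std::vector<std::size_t>& free_block_sizes) {
    const std::size_t total = std::accumulate(
        free_block_sizes.begin(), free_block_sizes.end(), std::size_t{0});
    if (total == 0) return 0.0;
    const std::size_t largest =
        *std::max_element(free_block_sizes.begin(), free_block_sizes.end());
    return 1.0 - static_cast<double>(largest) / static_cast<double>(total);
}
```

A fragmentation test could allocate a run of equal-sized blocks, free every other one, and check that the ratio rises; freeing the remaining blocks should then bring it back to 0 once coalescing has merged everything into a single block.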