Initial Implementation
Summary
Initial implementation of Mini-ImagePipe framework completed.
Status: Completed
Date Archived: 2026-04-23
Completed Work
1. Set up project structure and core infrastructure
- 1.1 Create CMake project structure with CUDA support
- Set up CMakeLists.txt with CUDA language support
- Configure include directories and library targets
- Requirements: Project setup
- 1.2 Implement ImageBuffer and KernelConfig data structures
- Create include/types.h with ImageBuffer, KernelConfig, PipelineConfig structs
- Requirements: Data Models
- 1.3 Implement IOperator base interface
- Create include/operator.h with abstract IOperator class
- Requirements: Component Interfaces
2. Implement Memory Manager
- 2.1 Implement MemoryManager singleton with pinned memory allocation
- Create src/memory_manager.cu with cudaHostAlloc/cudaFree
- Implement allocatePinned, freePinned, allocateDevice, freeDevice
- Implement async copy functions with CUDA streams
- Requirements: 7.1, 7.2
- 2.2 Implement memory pool for pinned memory reuse
- Add MemoryPool struct with free block tracking
- Implement block reuse logic in allocate/free
- Requirements: 7.4
- 2.3 Implement fallback to pageable memory
- Add fallback logic when cudaHostAlloc fails
- Log warning on fallback
- Requirements: 7.3
- 2.4 Implement shutdown and cleanup
- Free all tracked allocations on shutdown
- Requirements: 7.5
- 2.5 Write property test for memory pool reuse
- Property 17: Memory Pool Reuse
- Validates: Requirements 7.4
- 2.6 Write property test for memory cleanup
- Property 18: Memory Cleanup
- Validates: Requirements 7.5
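The block reuse in 2.2 and the cleanup in 2.4 can be illustrated with a small host-only sketch. This is not the project's MemoryManager: the class name PinnedPool and its methods are hypothetical, and std::malloc/std::free stand in for the CUDA pinned-memory calls so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <unordered_map>

// Hypothetical stand-in for a pinned-memory pool (names are assumptions).
// std::malloc/std::free replace the CUDA host allocation calls here.
class PinnedPool {
public:
    // Reuse the smallest free block that fits, otherwise allocate a new one.
    void* allocate(std::size_t bytes) {
        assert(bytes > 0);
        auto it = free_.lower_bound(bytes);
        if (it != free_.end()) {
            void* p = it->second;
            free_.erase(it);
            return p;
        }
        void* p = std::malloc(bytes);
        if (p != nullptr) sizes_[p] = bytes;  // track every block for shutdown()
        return p;
    }

    // Return a block to the pool instead of releasing it to the system.
    // Releasing the same block twice is not guarded against in this sketch.
    void release(void* p) {
        auto it = sizes_.find(p);
        if (it != sizes_.end()) free_.emplace(it->second, p);
    }

    // Free every tracked allocation, whether it is in use or pooled.
    void shutdown() {
        for (auto& kv : sizes_) std::free(kv.first);
        sizes_.clear();
        free_.clear();
    }

    std::size_t trackedBlocks() const { return sizes_.size(); }

private:
    std::multimap<std::size_t, void*> free_;        // block size -> free block
    std::unordered_map<void*, std::size_t> sizes_;  // block -> block size
};
```

A request for 512 bytes after releasing a 1024-byte block returns the same pointer, which is the behavior Property 17 is meant to check. After shutdown() no blocks remain tracked, which corresponds to Property 18.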
3. Checkpoint - Memory Manager
- Ensure all tests pass, ask the user if questions arise.
4. Implement Color Conversion Operator
- 4.1 Implement ColorConvertOperator class
- Create src/operators/color_convert.cu
- Implement RGB_TO_GRAY, BGR_TO_RGB, RGBA_TO_RGB conversions
- Use luminance formula Y = 0.299*R + 0.587*G + 0.114*B
- Requirements: 4.1, 4.2, 4.3
- 4.2 Implement alpha channel preservation
- Preserve alpha channel during conversions when present
- Requirements: 4.4
- 4.3 Write property test for RGB to Grayscale formula
- Property 8: RGB to Grayscale Formula
- Validates: Requirements 4.2
- 4.4 Write property test for BGR to RGB channel swap
- Property 9: BGR to RGB Channel Swap
- Validates: Requirements 4.3
- 4.5 Write property test for alpha channel preservation
- Property 10: Alpha Channel Preservation
- Validates: Requirements 4.4
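As a CPU reference for the conversions in task 4, the luminance formula and the channel swap can be sketched as below. The function names, the interleaved 8-bit layout, and the round-to-nearest step are assumptions for illustration; the actual kernels live in the .cu file and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Convert interleaved RGB (3 bytes per pixel) to 8-bit grayscale
// using Y = 0.299*R + 0.587*G + 0.114*B.
std::vector<std::uint8_t> rgbToGray(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> gray(rgb.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        float y = 0.299f * rgb[3 * i] + 0.587f * rgb[3 * i + 1] + 0.114f * rgb[3 * i + 2];
        gray[i] = static_cast<std::uint8_t>(y + 0.5f);  // round to nearest (assumed)
    }
    return gray;
}

// Swap the first and third channel of every pixel (BGR <-> RGB).
// With 4 channels the fourth (alpha) byte is copied through unchanged.
std::vector<std::uint8_t> swapRedBlue(const std::vector<std::uint8_t>& src, int channels) {
    assert(channels == 3 || channels == 4);
    const std::size_t step = static_cast<std::size_t>(channels);
    assert(src.size() % step == 0);
    std::vector<std::uint8_t> dst(src);
    for (std::size_t i = 0; i + 2 < dst.size(); i += step) {
        std::swap(dst[i], dst[i + 2]);
    }
    return dst;
}
```

Applying swapRedBlue twice returns the original data, which is one way a property test for the channel swap could be phrased.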
5. Implement Resize Operator
- 5.1 Implement ResizeOperator class with bilinear interpolation
- Create src/operators/resize.cu
- Implement bilinear interpolation kernel
- Requirements: 3.1
- 5.2 Implement nearest-neighbor interpolation
- Add NEAREST mode to resize kernel
- Requirements: 3.2
- 5.3 Implement coordinate mapping and arbitrary scale factors
- Compute src coordinates from dst coordinates
- Support both upscaling and downscaling
- Requirements: 3.3, 3.4
- 5.4 Write property test for resize coordinate mapping
- Property 6: Resize Coordinate Mapping
- Validates: Requirements 3.1, 3.2, 3.3
- 5.5 Write property test for arbitrary scale factors
- Property 7: Resize Arbitrary Scale Factors
- Validates: Requirements 3.4
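The coordinate mapping and the two interpolation modes in task 5 can be sketched on the CPU as follows. The pixel-center alignment used in mapToSource is an assumption (the plan only says source coordinates are computed from destination coordinates), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Map a destination pixel index to a continuous source coordinate.
// Assumes pixel-center alignment; the project may use another convention.
float mapToSource(int dst, int dstSize, int srcSize) {
    assert(dstSize > 0 && srcSize > 0);
    float scale = static_cast<float>(srcSize) / static_cast<float>(dstSize);
    return (static_cast<float>(dst) + 0.5f) * scale - 0.5f;
}

// NEAREST mode: index of the closest source pixel, clamped to the image.
int nearestIndex(int dst, int dstSize, int srcSize) {
    int i = static_cast<int>(std::floor(mapToSource(dst, dstSize, srcSize) + 0.5f));
    return std::max(0, std::min(i, srcSize - 1));
}

// Bilinear sample of a single-channel, row-major image at (x, y).
float bilinearSample(const std::vector<float>& img, int w, int h, float x, float y) {
    assert(w > 0 && h > 0 && static_cast<int>(img.size()) == w * h);
    x = std::max(0.0f, std::min(x, static_cast<float>(w - 1)));
    y = std::max(0.0f, std::min(y, static_cast<float>(h - 1)));
    int x0 = static_cast<int>(x);
    int y0 = static_cast<int>(y);
    int x1 = std::min(x0 + 1, w - 1);
    int y1 = std::min(y0 + 1, h - 1);
    float fx = x - static_cast<float>(x0);
    float fy = y - static_cast<float>(y0);
    float top = img[y0 * w + x0] * (1.0f - fx) + img[y0 * w + x1] * fx;
    float bottom = img[y1 * w + x0] * (1.0f - fx) + img[y1 * w + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

The same mapping handles upscaling and downscaling because only the ratio srcSize / dstSize changes. With equal sizes it reduces to the identity.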
6. Implement Sobel Edge Detection Operator
- 6.1 Implement SobelOperator class
- Create src/operators/sobel.cu
- Implement 3x3 Sobel kernels for Gx and Gy
- Use shared memory for efficient access
- Requirements: 2.1, 2.3
- 6.2 Implement gradient magnitude computation
- Compute magnitude as sqrt(Gx² + Gy²)
- Output single-channel result
- Requirements: 2.2, 2.4
- 6.3 Write property test for Sobel gradient computation
- Property 4: Sobel Gradient Computation
- Validates: Requirements 2.1, 2.2
- 6.4 Write property test for Sobel single-channel output
- Property 5: Sobel Single-Channel Output
- Validates: Requirements 2.4
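A CPU reference for the Sobel computation in task 6 might look like the sketch below, which a property test could compare against the GPU output. The function name and the clamped border handling are assumptions, since the plan does not state how Sobel treats image borders. Shared-memory tiling is omitted because it only affects performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Gradient magnitude sqrt(Gx^2 + Gy^2) for a single-channel, row-major image.
// Border pixels read clamped neighbors (an assumption for this sketch).
std::vector<float> sobelMagnitude(const std::vector<float>& img, int w, int h) {
    assert(w > 0 && h > 0 && static_cast<int>(img.size()) == w * h);
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float gx = 0.0f;
            float gy = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = std::max(0, std::min(x + dx, w - 1));
                    int sy = std::max(0, std::min(y + dy, h - 1));
                    float v = img[sy * w + sx];
                    gx += static_cast<float>(kx[dy + 1][dx + 1]) * v;
                    gy += static_cast<float>(ky[dy + 1][dx + 1]) * v;
                }
            }
            out[y * w + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

The output has one value per pixel regardless of how the input was produced, matching the single-channel requirement in 6.2.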
7. Checkpoint - Basic Operators
- Ensure all tests pass, ask the user if questions arise.
8. Implement Gaussian Blur Operator
- 8.1 Implement GaussianBlurOperator class with separable filter
- Create src/operators/gaussian_blur.cu
- Generate 1D Gaussian kernels for horizontal and vertical passes
- Support kernel sizes 3x3, 5x5, 7x7
- Requirements: 1.1, 1.2
- 8.2 Implement shared memory with halo regions
- Load tile + halo into shared memory
- Handle boundary with reflection padding
- Requirements: 1.3, 1.4
- 8.3 Implement multi-channel support
- Support 1, 3, and 4 channel images
- Requirements: 1.5
- 8.4 Write property test for Gaussian blur multi-channel support
- Property 1: Gaussian Blur Multi-Channel Support
- Validates: Requirements 1.1, 1.5
- 8.5 Write property test for separable filter equivalence
- Property 2: Separable Filter Equivalence
- Validates: Requirements 1.2
- 8.6 Write property test for reflection padding boundary handling
- Property 3: Reflection Padding Boundary Handling
- Validates: Requirements 1.4
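The 1D kernel generation and reflection padding from task 8 can be sketched on the CPU as below. The sigma parameter, the function names, and the reflection variant (edge pixel not repeated, so index -1 maps to 1) are assumptions; the plan does not specify them.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel of odd length `size`.
std::vector<float> gaussianKernel1D(int size, float sigma) {
    assert(size >= 3 && size % 2 == 1 && sigma > 0.0f);
    const int r = size / 2;
    std::vector<float> k(static_cast<std::size_t>(size));
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += k[i + r];
    }
    for (float& v : k) v /= sum;  // weights sum to 1 so brightness is preserved
    return k;
}

// Reflect an out-of-range index back into [0, n): -1 -> 1, n -> n - 2.
int reflectIndex(int i, int n) {
    assert(n > 1);
    while (i < 0 || i >= n) {
        i = (i < 0) ? -i : 2 * (n - 1) - i;
    }
    return i;
}

// One 1D pass with reflection padding. A separable 2D blur applies this
// to every row and then to every column of the row result.
std::vector<float> convolve1D(const std::vector<float>& signal, const std::vector<float>& k) {
    const int n = static_cast<int>(signal.size());
    const int r = static_cast<int>(k.size()) / 2;
    std::vector<float> out(signal.size(), 0.0f);
    for (int i = 0; i < n; ++i) {
        for (int j = -r; j <= r; ++j) {
            out[i] += k[j + r] * signal[reflectIndex(i + j, n)];
        }
    }
    return out;
}
```

Running the row pass and then the column pass costs 2k multiplications per pixel instead of k*k for a full 2D kernel, which is the usual motivation for the separable form.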
9. Checkpoint - All Operators
- Ensure all tests pass, ask the user if questions arise.
10. Implement Task Graph
- 10.1 Implement TaskNode and TaskGraph classes
- Create src/task_graph.cpp
- Implement addTask, addDependency methods
- Track node states (PENDING, READY, RUNNING, COMPLETED, FAILED)
- Requirements: 5.1
- 10.2 Implement cycle detection
- Use DFS-based cycle detection in addDependency
- Reject edges that would create cycles
- Requirements: 5.1
- 10.3 Implement topological sorting
- Implement getTopologicalOrder using Kahn’s algorithm
- Implement getReadyTasks for scheduler
- Requirements: 5.6
- 10.4 Write property test for DAG cycle detection
- Property 11: DAG Cycle Detection
- Validates: Requirements 5.1
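The cycle check in 10.2 and the ordering in 10.3 can be illustrated together with a minimal sketch. The class below is hypothetical and is not the project's TaskGraph: tasks are plain integer ids, node states are omitted, and the method signatures are assumed.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical minimal task graph; an edge from -> to means "to depends on from".
class TaskGraphSketch {
public:
    int addTask() {
        adj_.emplace_back();
        return static_cast<int>(adj_.size()) - 1;
    }

    // Reject the edge (and leave the graph unchanged) if it would close a cycle,
    // i.e. if `from` is already reachable from `to`.
    bool addDependency(int from, int to) {
        assert(valid(from) && valid(to));
        if (from == to || reaches(to, from)) return false;
        adj_[from].push_back(to);
        return true;
    }

    // Kahn's algorithm: repeatedly emit nodes whose in-degree has dropped to zero.
    std::vector<int> topologicalOrder() const {
        std::vector<int> indegree(adj_.size(), 0);
        for (const auto& out : adj_) {
            for (int v : out) ++indegree[v];
        }
        std::queue<int> ready;
        for (int v = 0; v < static_cast<int>(adj_.size()); ++v) {
            if (indegree[v] == 0) ready.push(v);
        }
        std::vector<int> order;
        while (!ready.empty()) {
            int u = ready.front();
            ready.pop();
            order.push_back(u);
            for (int v : adj_[u]) {
                if (--indegree[v] == 0) ready.push(v);
            }
        }
        return order;
    }

private:
    bool valid(int v) const { return v >= 0 && v < static_cast<int>(adj_.size()); }

    // Iterative DFS: is `dst` reachable from `src`?
    bool reaches(int src, int dst) const {
        std::vector<bool> seen(adj_.size(), false);
        std::vector<int> stack{src};
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            if (u == dst) return true;
            if (seen[u]) continue;
            seen[u] = true;
            for (int v : adj_[u]) stack.push_back(v);
        }
        return false;
    }

    std::vector<std::vector<int>> adj_;  // adjacency list of outgoing edges
};
```

Because every edge is checked on insertion, the graph is always acyclic and topologicalOrder() always returns all tasks.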
11. Implement DAG Scheduler
- 11.1 Implement DAGScheduler class with CUDA streams
- Create src/scheduler.cu
- Create configurable number of CUDA streams
- Implement stream assignment for tasks
- Requirements: 6.1, 6.4
- 11.2 Implement dependency-based execution
- Execute tasks in topological order
- Respect all dependency constraints
- Trigger dependents when task completes
- Requirements: 5.2, 5.4
- 11.3 Implement CUDA event synchronization
- Insert events for cross-stream dependencies
- Synchronize all streams on completion
- Requirements: 6.2, 6.5
- 11.4 Implement error propagation
- Mark failed tasks and halt dependents
- Invoke error callback on failure
- Requirements: 5.5
- 11.5 Write property test for dependency ordering
- Property 12: Dependency Ordering
- Validates: Requirements 5.2, 5.4, 5.6
- 11.6 Write property test for error propagation
- Property 13: Error Propagation
- Validates: Requirements 5.5
- 11.7 Write property test for stream assignment and synchronization
- Property 14: Stream Assignment and Synchronization
- Validates: Requirements 6.1, 6.2
- 11.8 Write property test for stream synchronization on completion
- Property 15: Stream Synchronization on Completion
- Validates: Requirements 6.5
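The dependency-based execution in 11.2 and the error propagation in 11.4 can be sketched without a GPU as below. Everything here is an assumption for illustration: the Task struct, the round-robin stream policy, and the choice to leave halted dependents in PENDING. A CUDA version would additionally enforce cross-stream dependencies with events, for example via cudaEventRecord and cudaStreamWaitEvent.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class State { PENDING, COMPLETED, FAILED };  // subset of the states in 10.1

// Hypothetical task record for this sketch.
struct Task {
    std::function<bool()> run;   // returns false on failure
    std::vector<int> deps;       // indices of tasks that must complete first
    State state = State::PENDING;
    int stream = -1;             // index of the (simulated) stream
};

// Run tasks that are already in topological order (every dependency index
// is smaller than the task's own index). Returns the number completed.
int runAll(std::vector<Task>& tasks, int numStreams,
           const std::function<void(int)>& onError) {
    assert(numStreams > 0);
    int completed = 0;
    for (int i = 0; i < static_cast<int>(tasks.size()); ++i) {
        Task& t = tasks[i];
        t.stream = i % numStreams;  // round-robin assignment (assumed policy)
        bool depsOk = true;
        for (int d : t.deps) {
            assert(d >= 0 && d < i);
            if (tasks[d].state != State::COMPLETED) depsOk = false;
        }
        if (!depsOk) continue;  // halted: an upstream task failed or was halted
        if (t.run()) {
            t.state = State::COMPLETED;
            ++completed;
        } else {
            t.state = State::FAILED;
            if (onError) onError(i);  // report the failing task id
        }
    }
    return completed;
}
```

A failure stops only the tasks downstream of it. Independent branches of the graph still run to completion, which is what the assertion on an unrelated task in a property test would verify.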
12. Checkpoint - Scheduler
- Ensure all tests pass, ask the user if questions arise.
13. Implement Pipeline Builder
- 13.1 Implement Pipeline class
- Create src/pipeline.cpp
- Implement addOperator, connect methods
- Wire to TaskGraph and DAGScheduler
- Requirements: 8.1
- 13.2 Implement automatic intermediate buffer allocation
- Allocate buffers based on operator output dimensions
- Manage buffer lifecycle
- Requirements: 8.2
- 13.3 Implement shared output for multiple dependents
- Ensure single execution for nodes with multiple dependents
- Share output buffer reference
- Requirements: 8.3
- 13.4 Implement runtime parameter configuration
- Add setParameter method for runtime updates
- Apply without graph reconstruction
- Requirements: 8.4
- 13.5 Implement batch processing
- Implement executeBatch for multiple frames
- Requirements: 8.5
- 13.6 Write property test for pipeline topology and buffer management
- Property 19: Pipeline Topology and Buffer Management
- Validates: Requirements 8.1, 8.2
- 13.7 Write property test for no redundant computation
- Property 20: No Redundant Computation
- Validates: Requirements 8.3
- 13.8 Write property test for runtime parameter configuration
- Property 21: Runtime Parameter Configuration
- Validates: Requirements 8.4
- 13.9 Write property test for batch processing
- Property 22: Batch Processing
- Validates: Requirements 8.5
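The shared-output behavior in 13.3 can be illustrated with a small host-only sketch in which each node runs at most once and hands the same buffer to all of its dependents. The class, the Buffer alias, and the method signatures are hypothetical, and the graph is assumed to be acyclic. The real Pipeline delegates execution to TaskGraph and DAGScheduler.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;
using BufferPtr = std::shared_ptr<const Buffer>;
using OpFn = std::function<Buffer(const std::vector<BufferPtr>&)>;

// Hypothetical pipeline node: an operator plus its cached output.
struct Node {
    OpFn op;
    std::vector<int> inputs;  // upstream node ids
    BufferPtr output;         // set on first execution, shared afterwards
    int runCount = 0;
};

class PipelineSketch {
public:
    int addOperator(OpFn op) {
        Node n;
        n.op = std::move(op);
        nodes_.push_back(std::move(n));
        return static_cast<int>(nodes_.size()) - 1;
    }

    void connect(int from, int to) {
        assert(valid(from) && valid(to));
        nodes_[to].inputs.push_back(from);
    }

    // Evaluate node `id`, running each upstream node at most once.
    BufferPtr execute(int id) {
        assert(valid(id));
        if (nodes_[id].output) return nodes_[id].output;
        std::vector<BufferPtr> in;
        for (int src : nodes_[id].inputs) in.push_back(execute(src));
        nodes_[id].output = std::make_shared<const Buffer>(nodes_[id].op(in));
        ++nodes_[id].runCount;
        return nodes_[id].output;
    }

    int runCount(int id) const { return nodes_[id].runCount; }

private:
    bool valid(int id) const { return id >= 0 && id < static_cast<int>(nodes_.size()); }
    std::vector<Node> nodes_;
};
```

Counting executions per node, as runCount does here, is one simple way to express the "no redundant computation" property as a test.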
14. Final Checkpoint
- Ensure all tests pass, ask the user if questions arise.
15. Integration and wiring
- 15.1 Create example pipeline demonstrating all operators
- Create examples/demo_pipeline.cpp
- Chain Resize → ColorConvert → GaussianBlur → Sobel
- Requirements: 8.1
- 15.2 Write integration tests for end-to-end pipeline
- Test complete pipeline execution
- Verify output correctness
- Requirements: All
Notes
- All tasks including property tests are required for comprehensive coverage
- Each task references specific requirements for traceability
- Checkpoints ensure incremental validation
- Property tests validate universal correctness properties
- Unit tests validate specific examples and edge cases
- CUDA code files use the .cu extension, C++ files use .cpp