Third Round Optimization (2026-03-10)
Code quality, performance, and documentation improvements.
Changes
Performance
- Sobel kernel: Moved per-thread local Sobel weight arrays to `__constant__` memory, which eliminates redundant local memory allocation across all threads
- Gaussian blur: Replaced `#define TILE_SIZE` / `#define MAX_KERNEL_RADIUS` with `static constexpr` (type-safe, no namespace pollution)
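
The macro-to-`static constexpr` change can be illustrated with a minimal host-side sketch. The namespace name, the numeric values, and the `PADDED_TILE` / `paddedTileBytes()` helpers are assumptions made for illustration; the changelog does not state the real values or how the constants are used.

```cpp
#include <cstddef>

namespace gaussian_blur_sketch {

// Hypothetical values; the project's actual constants are not given here.
static constexpr int TILE_SIZE = 16;
static constexpr int MAX_KERNEL_RADIUS = 7;

// Tile width plus a halo of MAX_KERNEL_RADIUS on each side.
// Unlike a #define, this is a typed, scoped constant.
static constexpr int PADDED_TILE = TILE_SIZE + 2 * MAX_KERNEL_RADIUS;

// Bytes needed to hold one padded tile of floats (for example, a shared-memory tile).
constexpr std::size_t paddedTileBytes() {
    return static_cast<std::size_t>(PADDED_TILE) * PADDED_TILE * sizeof(float);
}

// Compile-time check that the derived constant matches the values above.
static_assert(PADDED_TILE == 30, "padded tile should be TILE_SIZE plus two halos");

}  // namespace gaussian_blur_sketch
```

Because the constants live in a namespace, a second translation unit can define its own `TILE_SIZE` without clashing, which a preprocessor macro cannot guarantee.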
Bug Fixes
- Pipeline `findInputOutputNodes()`: Fixed bug where manually-set input nodes (via `setInput()`) with dependencies were cleared on each `execute()` call; they are now preserved
- Pipeline `execute()`: Merged two redundant validation loops (null operator check + input validity) into a single coherent pass
- DAGScheduler destructor: Added null checks before `cudaStreamDestroy` / `cudaEventDestroy` to prevent undefined behavior if creation failed
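
The `findInputOutputNodes()` fix can be sketched roughly as follows. This is a simplified model, not the project's code: `PipelineSketch`, its integer node ids, and its members are hypothetical, and only the names `setInput()` and `findInputOutputNodes()` come from the entry above.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical miniature of a pipeline graph, for illustration only.
class PipelineSketch {
public:
    // Registers a node together with the ids of the nodes it depends on.
    void addNode(int id, std::vector<int> deps = {}) {
        deps_[id] = std::move(deps);
    }

    // Explicitly marks a node as an input, even if it has dependencies.
    void setInput(int id) { manualInputs_.insert(id); }

    // Recomputes the input set; assumed to run at the start of each execute().
    void findInputOutputNodes() {
        inputs_.clear();
        for (const auto& entry : deps_) {
            if (entry.second.empty()) {
                inputs_.insert(entry.first);  // auto-detected: no dependencies
            }
        }
        // Re-apply manual inputs so the clear() above does not discard them.
        inputs_.insert(manualInputs_.begin(), manualInputs_.end());
    }

    bool isInput(int id) const { return inputs_.count(id) != 0; }

private:
    std::map<int, std::vector<int>> deps_;
    std::set<int> manualInputs_;  // survives across recomputations
    std::set<int> inputs_;        // rebuilt on every call
};
```

Keeping the manual selections in a separate set means repeated recomputation is idempotent: calling `findInputOutputNodes()` twice yields the same input set as calling it once.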
Code Quality
- MemoryManager: Simplified redundant tracking by replacing the separate `pinnedSizes_` / `pinnedFlags_` / `deviceSizes_` maps with unified `MemoryBlock`-based `activePinnedAllocs_` / `activeDeviceAllocs_` maps
- CMake: Added MSVC-compatible compile options via `$<CXX_COMPILER_ID:MSVC>` generator expressions (`/O2`, `/W4`)
- CMake: Added `testPresets` to `CMakePresets.json`; `ctest --preset default` now works as documented
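
One way the unified `MemoryBlock`-based tracking might look is sketched below. The fields of `MemoryBlock`, the `AllocTrackerSketch` class, and its methods are assumptions for illustration; only the map names are taken from the entry above, and no real CUDA allocation is performed.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical record: one entry holds everything known about an allocation,
// instead of spreading size and flags across parallel maps.
struct MemoryBlock {
    std::size_t size;
    bool pinned;
};

class AllocTrackerSketch {
public:
    // Records an allocation in the table that matches its kind.
    void track(void* ptr, std::size_t size, bool pinned) {
        auto& table = pinned ? activePinnedAllocs_ : activeDeviceAllocs_;
        table[ptr] = MemoryBlock{size, pinned};
    }

    // Forgets an allocation; returns false if the pointer was not tracked.
    bool release(void* ptr) {
        return activePinnedAllocs_.erase(ptr) + activeDeviceAllocs_.erase(ptr) > 0;
    }

    std::size_t pinnedBytes() const { return total(activePinnedAllocs_); }
    std::size_t deviceBytes() const { return total(activeDeviceAllocs_); }

private:
    using Table = std::unordered_map<void*, MemoryBlock>;

    // Sums the sizes of all blocks currently tracked in one table.
    static std::size_t total(const Table& table) {
        std::size_t sum = 0;
        for (const auto& kv : table) sum += kv.second.size;
        return sum;
    }

    Table activePinnedAllocs_;
    Table activeDeviceAllocs_;
};
```

With a single record per pointer, size and flags cannot drift out of sync the way entries in separate maps can when one update is missed.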
CI
- ci.yml: Added `ctest` step after build (`continue-on-error`, since tests require a GPU)
Documentation
- README.md: Fixed wrong include path (`pipeline/pipeline.h` → `pipeline.h`) and outdated API in usage example
- README.md: Added CI/Docs badges, operator table, GPU architecture table, expanded project structure with per-file descriptions, architecture diagram, engineering quality section
- index.md: Fixed incorrect build paths in quick start section (`build/release/` → `build-release/`)
- .gitignore: Added `.cache/` directory