Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
For the complete changelog, see GitHub CHANGELOG.md.
Unreleased
Added
- Comprehensive bilingual documentation (English & Chinese)
- New documentation site structure under docs/
- API reference documentation
- Architecture design documentation
- Performance optimization guide
- Development contribution guide
2.0.1 - 2026-04-16
Security
- Fixed 5 moderate npm vulnerabilities (esbuild, vite, vitest)
- CVE-2024-XXXX: Regular expression denial of service in esbuild
- CVE-2024-XXXX: Path traversal vulnerability in vite
- Updated vitest from v1.6.1 to v4.1.4
- Updated @vitest/coverage-v8 from v1.6.1 to v4.1.4
Changed
- Updated @webgpu/types from 0.1.68 to 0.1.69
- Updated @types/node to latest v20.x
2.0.0 - 2026-03-09
Added
- Zero-copy Tensor.reshape() via tensor views
- Significant memory reduction for reshape-heavy models
- Tensor._createView() static method for buffer sharing
- Tensor._isView flag to track view tensors
- GPUContext.deferDestroy() for safe async buffer cleanup
- Prevents use-after-free race conditions
- Automatic cleanup of temporary GPU resources
- GPUContext.sync() and GPUContext.waitForSubmittedWork() methods
- Better control over GPU synchronization
- FlattenOperator for CNN-to-FC layer transitions
- DenseOperator for fully connected layers
- Conv2dBiasReLUOperator - fused kernel optimization
- 3x memory traffic reduction vs sequential execution
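
The deferred-destruction idea behind GPUContext.deferDestroy() can be illustrated without a GPU. The following is a minimal sketch under stated assumptions, not the library's implementation: `DeferredDestroyer`, `Destroyable`, and `flush()` are hypothetical names, and the caller is assumed to invoke `flush()` only after submitted GPU work has completed (in WebGPU, for example, after awaiting `GPUQueue.onSubmittedWorkDone()`).

```typescript
// Hypothetical sketch of deferred destruction; all names are illustrative.
interface Destroyable {
  destroy(): void;
}

class DeferredDestroyer {
  private pending: Destroyable[] = [];

  // Queue a resource instead of destroying it while in-flight work may still use it.
  deferDestroy(resource: Destroyable): void {
    this.pending.push(resource);
  }

  // Destroy everything queued so far and return how many resources were released.
  // Assumption: the caller has already waited for submitted GPU work to finish.
  flush(): number {
    const batch = this.pending;
    this.pending = [];
    for (const resource of batch) {
      resource.destroy();
    }
    return batch.length;
  }
}
```

Because nothing is destroyed until `flush()` runs, a temporary buffer queued in the middle of an operator stays valid for any command that was already submitted against it.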
Fixed
- [Critical] Tensor.reshape() unnecessary GPU buffer allocation
- Now uses zero-copy views, reducing memory pressure
- [Critical] InferenceEngine.infer() GPU memory leak
- Intermediate tensors now properly destroyed after use
- [Critical] Race conditions in operator temp buffer cleanup
- Now uses deferDestroy() pattern consistently
Changed
- Tensor.destroy() is now a no-op for view tensors
- All operators use deferred buffer destruction pattern
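
Why destroy() has to be a no-op for views follows from buffer sharing: a view and its source point at the same storage, so only the owner may release it. The sketch below illustrates that under stated assumptions. `FakeBuffer` and `SketchTensor` are hypothetical stand-ins and do not reproduce the real Tensor class or its `_createView()` signature.

```typescript
// Hypothetical stand-in for a GPU buffer; destroy() only flips a flag.
class FakeBuffer {
  data: Float32Array;
  destroyed = false;

  constructor(data: Float32Array) {
    this.data = data;
  }

  destroy(): void {
    this.destroyed = true;
  }
}

function elementCount(shape: readonly number[]): number {
  return shape.reduce((product, dim) => product * dim, 1);
}

class SketchTensor {
  buffer: FakeBuffer;
  shape: readonly number[];
  isView: boolean;

  constructor(buffer: FakeBuffer, shape: readonly number[], isView: boolean) {
    if (elementCount(shape) !== buffer.data.length) {
      throw new Error("shape does not match buffer length");
    }
    this.buffer = buffer;
    this.shape = shape;
    this.isView = isView;
  }

  // Zero-copy reshape: the new tensor shares the buffer and only changes shape metadata.
  reshape(shape: readonly number[]): SketchTensor {
    return new SketchTensor(this.buffer, shape, true);
  }

  // A view does not own the buffer, so destroying it must leave the storage alone.
  destroy(): void {
    if (!this.isView) {
      this.buffer.destroy();
    }
  }
}
```

In this sketch the owning tensor must outlive its views; destroying the owner first would leave the views pointing at released storage.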
Performance
| Operation | Before | After | Improvement |
|---|---|---|---|
| Flatten | Full buffer copy | Zero overhead | ~100% |
| Fused Conv2d+Bias+ReLU | 6 mem ops | 2 mem ops | 3× |
| Memory leaks | Persistent | Properly freed | Stable |
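
One way to read the "6 mem ops" versus "2 mem ops" row is as full-tensor reads and writes: three separate passes each read and write a whole tensor, while a fused pass reads once and writes once. The sketch below counts those passes on the CPU. It is an illustration of that reading only, and it assumes a per-element scale in place of a real convolution to stay short; `Traffic`, `sequentialPasses`, and `fusedPass` are hypothetical names.

```typescript
// Counts full-array reads and writes; a stand-in for global memory traffic.
interface Traffic {
  reads: number;
  writes: number;
}

// Three separate passes, each reading one full array and writing another.
// Assumption: `x * weight` stands in for the convolution.
function sequentialPasses(input: Float32Array, weight: number, bias: number, traffic: Traffic): Float32Array {
  const conv = input.map((x) => x * weight);
  traffic.reads += 1;
  traffic.writes += 1;
  const biased = conv.map((x) => x + bias);
  traffic.reads += 1;
  traffic.writes += 1;
  const activated = biased.map((x) => Math.max(x, 0));
  traffic.reads += 1;
  traffic.writes += 1;
  return activated;
}

// One fused pass: intermediates stay in local variables and never touch an array.
function fusedPass(input: Float32Array, weight: number, bias: number, traffic: Traffic): Float32Array {
  const output = input.map((x) => Math.max(x * weight + bias, 0));
  traffic.reads += 1;
  traffic.writes += 1;
  return output;
}
```

Both functions compute the same values up to floating-point rounding; only the number of trips through memory differs, which gives the 6 to 2 ratio.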
1.0.0 - 2026-03-09
Added
- Standardized CI workflow with permissions and concurrency configuration
- Node.js 20 validation pipeline (lint, test, build)
- Automated security scanning for dependencies
Changed
- lint script now runs TypeScript type checking
- Updated CI triggers for better resource efficiency
0.9.0 - 2026-03-10
Added
- GitHub Pages deployment workflow
- Bilingual documentation support
- Chinese documentation (README.zh-CN.md)
- Architecture diagram in index.md
- SEO metadata in _config.yml
Changed
- Refined Pages workflow path triggers for documentation files only
- Optimized Jekyll build exclusions
0.8.0 - 2026-01-08
Added
- tests/setup.ts - WebGPU globals for Node.js test environment
- tests/tsconfig.json - separate TypeScript config for tests
- @vitest/coverage-v8 for V8 coverage reporting
Fixed
- Vitest failing in Node.js due to missing WebGPU globals
- Operator test mocks not writing computed results to output buffers
- GPUContext.test.ts failing when navigator is read-only
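
The read-only navigator problem can be worked around by redefining the property rather than assigning to it. The following is a minimal sketch of that approach, not the contents of tests/setup.ts: `GpuStub` and `installGpuStub` are hypothetical, the stub covers only one method, and the sketch assumes the existing `navigator` property is configurable.

```typescript
// Hypothetical minimal stub; a real setup file would mock far more of the WebGPU API.
interface GpuStub {
  requestAdapter(): Promise<null>;
}

// Install `navigator.gpu` on `target` even when `navigator` only has a getter.
// Plain assignment fails for a getter-only property, so the property is redefined.
// Note: this replaces the whole navigator object; copy over other fields if tests need them.
function installGpuStub(target: object, gpu: GpuStub): void {
  Object.defineProperty(target, "navigator", {
    value: { gpu },
    configurable: true,
    writable: true,
  });
}
```

A setup file could then call `installGpuStub(globalThis, stub)` before any test imports code that reads `navigator.gpu`.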
Changed
- All operator tests have properly mocked WebGPU buffer lifecycle
- 54 tests passing (expanded from 47)
0.1.0 - 2025-02-13
Added
Initial project infrastructure
- MIT LICENSE
- .editorconfig for consistent code formatting
- TypeScript 5.x with strict mode
- Vitest for testing with coverage
Core Components
- GPUContext - WebGPU device management with error handling
- Tensor - GPU buffer management with layout support (NCHW/NHWC)
- Custom error classes for all failure modes
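
The difference between the NCHW and NHWC layouts comes down to how a four-dimensional index maps to a linear buffer offset. The sketch below shows the standard row-major formulas for both. `Layout`, `Dims`, and `linearOffset` are hypothetical names used for illustration and are not taken from the Tensor class.

```typescript
type Layout = "NCHW" | "NHWC";

// Sizes of the batch, channel, height, and width dimensions.
interface Dims {
  n: number;
  c: number;
  h: number;
  w: number;
}

// Row-major linear offset of element (n, c, h, w) for the given layout.
function linearOffset(layout: Layout, dims: Dims, n: number, c: number, h: number, w: number): number {
  if (layout === "NCHW") {
    // Width varies fastest, then height, then channel.
    return ((n * dims.c + c) * dims.h + h) * dims.w + w;
  }
  // NHWC: channel varies fastest, then width, then height.
  return ((n * dims.h + h) * dims.w + w) * dims.c + c;
}
```

For a 1x3x2x2 tensor, stepping to channel 1 at position (0, 0) moves 4 elements in NCHW but only 1 element in NHWC, which is why the two layouts suit different memory access patterns.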
Neural Network Operators
- ReLUOperator - Element-wise activation
- SoftmaxOperator - Numerically stable probability distribution
- MaxPoolOperator - 2D max pooling
- Conv2dOperator - Direct convolution
- DenseOperator - Matrix multiplication
- FlattenOperator - Tensor reshaping
Inference Engine
- InferenceEngine - Complete inference pipeline
- ModelLoader - JSON model loading
Testing Framework
- Property-based testing with fast-check
- 100+ iterations per property test
- CPU reference implementations for validation
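
To make the testing approach concrete, the sketch below pairs a numerically stable softmax CPU reference with a property check that runs for a fixed number of iterations. It is an illustration under stated assumptions, not the project's test code: a hand-rolled loop with a seeded generator stands in for fast-check so the example has no dependencies, and `softmaxReference`, `seededRandom`, and `checkSoftmaxProperties` are hypothetical names.

```typescript
// CPU reference: subtract the maximum before exponentiating so exp() cannot overflow.
function softmaxReference(logits: readonly number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((total, e) => total + e, 0);
  return exps.map((e) => e / sum);
}

// Small seeded generator (mulberry32-style) so a failing run can be reproduced.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Property: for random logits, outputs are finite probabilities that sum to 1.
function checkSoftmaxProperties(iterations: number, seed: number): boolean {
  const rand = seededRandom(seed);
  for (let i = 0; i < iterations; i++) {
    const length = 1 + Math.floor(rand() * 16);
    // Large magnitudes on purpose: a naive exp(x) would overflow here.
    const logits = Array.from({ length }, () => (rand() - 0.5) * 2000);
    const probs = softmaxReference(logits);
    const sum = probs.reduce((total, p) => total + p, 0);
    const valid = probs.every((p) => Number.isFinite(p) && p >= 0 && p <= 1);
    if (!valid || Math.abs(sum - 1) > 1e-9) {
      return false;
    }
  }
  return true;
}
```

In a GPU test, the same reference would also serve as the expected value against which an operator's output is compared within a tolerance.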
Examples
- MNIST digit classification demo
- Performance benchmark demo
Performance Features
- Hand-written WGSL compute shaders
- Kernel fusion support architecture
- Im2Col utilities for experimentation
Stats
- ~2,000 lines of source code
- ~1,100 lines of tests
- 7 neural network operators
- 13 property-based tests
- Zero runtime dependencies