Release v0.2.0

Release Date: 2025-03-15

🌐 English

Overview

Mini-Inference Engine v0.2.0 is a major refactoring release that adds optimized GEMM kernels and new infrastructure components.

🚀 Major GEMM Optimizations

Memory coalescing optimization for improved bandwidth utilization
Double buffering technique for latency hiding
Register blocking optimization, reaching ~70% of cuBLAS performance
Kernel fusion (GEMM + Bias + ReLU)
Vectorized memory loads with float4
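
To make the kernel fusion item concrete, here is a minimal CPU reference for what a fused GEMM + Bias + ReLU computes. It is only a sketch: the function name `gemm_bias_relu_ref`, the row-major layout, and the per-column bias are assumptions made for illustration and are not taken from the engine's API. The point is that the bias add and the ReLU are applied in the same pass that finishes each output element, so the result is written once instead of being re-read by separate bias and activation passes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for C = ReLU(A * B + bias), row-major storage.
// A is M x K, B is K x N, bias has N entries (one per output column),
// and the returned C is M x N.
std::vector<float> gemm_bias_relu_ref(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      const std::vector<float>& bias,
                                      std::size_t M, std::size_t N,
                                      std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && bias.size() == N);
    std::vector<float> C(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Accumulate the dot product of row i of A and column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Fused epilogue: bias add and ReLU before the single store.
            C[i * N + j] = std::max(acc + bias[j], 0.0f);
        }
    }
    return C;
}
```

A reference like this is mainly useful for checking a GPU kernel's output element by element on small matrices.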

🔧 New Features

Half-precision (FP16) GEMM support
Batched GEMM operations
Memory pool with caching
Stream manager for concurrency
Configuration system
Logging system
INT8 quantization support
Auto-tuner for kernel selection
Performance profiler
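
As an illustration of the "memory pool with caching" idea, the sketch below keeps released blocks in per-size free lists and hands them back on the next request of the same size. The class name `CachingPool` and its methods are hypothetical, and `std::malloc`/`std::free` stand in for device allocation calls so the example runs on the host. The engine's actual pool may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical caching pool: released blocks are cached by exact size
// and reused, so repeated allocations of the same size skip the allocator.
class CachingPool {
public:
    CachingPool() = default;
    CachingPool(const CachingPool&) = delete;
    CachingPool& operator=(const CachingPool&) = delete;

    // Frees only the blocks currently sitting in the cache.
    ~CachingPool() {
        for (auto& entry : free_blocks_) {
            for (void* p : entry.second) std::free(p);
        }
    }

    // Returns a cached block of exactly `bytes` if one exists,
    // otherwise falls back to the underlying allocator.
    void* allocate(std::size_t bytes) {
        assert(bytes > 0);
        auto it = free_blocks_.find(bytes);
        if (it != free_blocks_.end() && !it->second.empty()) {
            void* p = it->second.back();
            it->second.pop_back();
            ++cache_hits_;
            return p;
        }
        return std::malloc(bytes);
    }

    // Returns a block to the cache instead of freeing it.
    // `bytes` must match the size passed to allocate().
    void release(void* p, std::size_t bytes) {
        free_blocks_[bytes].push_back(p);
    }

    std::size_t cache_hits() const { return cache_hits_; }

private:
    std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
    std::size_t cache_hits_ = 0;
};
```

Exact-size matching keeps the sketch short. A production pool would more likely round sizes up to buckets so that near-equal requests can share cached blocks.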

📊 Performance

Kernel	Performance relative to cuBLAS
Coalesced GEMM	~30%
Double Buffer GEMM	~40%
Optimized GEMM	~70%
Fused GEMM	~80%
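
The release notes do not say how these percentages were measured. One common way to derive such a figure, shown here as an assumption and not as the project's methodology, is to convert each kernel's run time into GFLOP/s using the 2·M·N·K operation count of a GEMM and divide by the cuBLAS figure for the same problem size. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// GFLOP/s for C = A * B with A (M x K) and B (K x N):
// one multiply and one add per (i, j, k) triple gives 2*M*N*K FLOPs.
double gemm_gflops(std::size_t M, std::size_t N, std::size_t K,
                   double millis) {
    assert(millis > 0.0);
    const double flops = 2.0 * static_cast<double>(M) *
                         static_cast<double>(N) * static_cast<double>(K);
    // FLOPs / (ms * 1e-3 s/ms) / 1e9 = FLOPs / (ms * 1e6).
    return flops / (millis * 1.0e6);
}

// Throughput of a kernel expressed as a percentage of a baseline.
double percent_of_baseline(double kernel_gflops, double baseline_gflops) {
    assert(baseline_gflops > 0.0);
    return 100.0 * kernel_gflops / baseline_gflops;
}
```

For a fixed problem size the ratio reduces to baseline time divided by kernel time, so a kernel that takes 2.0 ms where the baseline takes 1.4 ms is running at about 70% of the baseline.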

Breaking Changes

None. All APIs are backward compatible with v0.1.0.


🔗 Links

v1.1.0 v1.0.0 v0.1.0