System Architecture Design
CompressKit uses a layered architecture to keep the code maintainable, testable, and consistent across language implementations.
Architecture Overview
Layer Descriptions
1. CLI Layer
Each algorithm ships as a separate binary with a unified interface:
bash
# Contract: <binary> <encode|decode> <input> <output>
./huffman_go encode input.bin input.huf
./huffman_rust decode input.huf decoded.bin
./arithmetic_cpp encode input.bin input.aenc
Design highlight: a unified launcher shared by all binaries reduces per-binary boilerplate by 94%.
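The contract above can be sketched as a small argument parser. This is only an illustration of how a shared launcher might validate `<encode|decode> <input> <output>`; the names `Mode`, `Invocation`, and `ParseArgs` are hypothetical and not taken from the CompressKit source.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is the operation requested on the command line (hypothetical type).
type Mode int

const (
	Encode Mode = iota
	Decode
)

// Invocation holds the parsed arguments of one CLI call (hypothetical type).
type Invocation struct {
	Mode   Mode
	Input  string
	Output string
}

// ParseArgs validates args (without the binary name) against the
// contract <encode|decode> <input> <output>.
func ParseArgs(args []string) (Invocation, error) {
	if len(args) != 3 {
		return Invocation{}, errors.New("usage: <binary> <encode|decode> <input> <output>")
	}
	var mode Mode
	switch args[0] {
	case "encode":
		mode = Encode
	case "decode":
		mode = Decode
	default:
		return Invocation{}, fmt.Errorf("unknown mode %q", args[0])
	}
	return Invocation{Mode: mode, Input: args[1], Output: args[2]}, nil
}

func main() {
	// A fixed argument list stands in for os.Args[1:] in this sketch.
	inv, err := ParseArgs([]string{"encode", "input.bin", "input.huf"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("mode=%d input=%s output=%s\n", inv.Mode, inv.Input, inv.Output)
}
```

Keeping the parsing in one place means each algorithm binary only has to supply its encode and decode functions.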
2. Buffer Layer
Stateless convenience wrapper for simple use cases:
go
// Go example
encoder := huffman.NewBufferedEncoder()
output, err := encoder.Encode(input)
rust
// Rust example
let encoder = huffman::BufferedEncoder::new();
let output = encoder.encode(&input)?;
Features:
- Each call is independent
- Automatic buffer management
- Simplified error handling
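One way to get call-level independence is to build a fresh streaming encoder inside every `Encode` call. The sketch below assumes that approach; the `Streamer` interface and the `copyStreamer` stand-in codec are hypothetical, and the real Buffer Layer may be organized differently.

```go
package main

import "fmt"

// Streamer is a hypothetical minimal streaming interface.
type Streamer interface {
	Process(chunk []byte) error
	Finish() ([]byte, error)
}

// copyStreamer is a stand-in codec that stores its input unchanged.
type copyStreamer struct{ buf []byte }

func (c *copyStreamer) Process(chunk []byte) error {
	c.buf = append(c.buf, chunk...)
	return nil
}

func (c *copyStreamer) Finish() ([]byte, error) { return c.buf, nil }

// BufferedEncoder holds only a factory, never a live stream,
// so no state can leak from one call to the next.
type BufferedEncoder struct {
	newStream func() Streamer
}

// Encode runs one complete stream lifecycle per call.
func (e BufferedEncoder) Encode(input []byte) ([]byte, error) {
	s := e.newStream()
	if err := s.Process(input); err != nil {
		return nil, err
	}
	return s.Finish()
}

func main() {
	enc := BufferedEncoder{newStream: func() Streamer { return &copyStreamer{} }}
	a, _ := enc.Encode([]byte("first"))
	b, _ := enc.Encode([]byte("second"))
	fmt.Printf("%s %s\n", a, b)
}
```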
3. Streaming Layer
Core state machine implementation supporting incremental processing:
go
encoder := huffman.NewStreamingEncoder()
// Incremental processing
encoder.Process(chunk1)
encoder.Process(chunk2)
encoder.Process(chunk3)
// Finish and get result
output, err := encoder.Finish()
Features:
- 5-state finite state machine
- Transactional error handling
- Flush and reset support
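The features above can be illustrated with a toy state machine. The source states only that there are five states, so the state names below (`Idle`, `Active`, `Flushed`, `Finished`, `Failed`), the error values, and the buffer limit are all assumptions made for this sketch, and the encoder is a pass-through rather than a real codec. "Transactional" is interpreted here as: a rejected call changes nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// State names are hypothetical; the source only says there are five.
type State int

const (
	Idle     State = iota // no input accepted yet
	Active                // input buffered, more may follow
	Flushed               // buffered output handed out, stream still open
	Finished              // Finish called; only Reset is valid
	Failed                // reserved for unrecoverable errors (unused in this sketch)
)

var (
	ErrBadState = errors.New("operation not valid in current state")
	ErrTooLarge = errors.New("chunk would exceed buffer limit")
)

// StreamingEncoder is a pass-through stand-in: it buffers bytes unchanged.
type StreamingEncoder struct {
	state State
	buf   []byte
	limit int
}

func NewStreamingEncoder(limit int) *StreamingEncoder {
	return &StreamingEncoder{limit: limit}
}

// Process validates first and mutates last, so a rejected chunk
// leaves both state and buffer exactly as they were.
func (e *StreamingEncoder) Process(chunk []byte) error {
	if e.state == Finished || e.state == Failed {
		return ErrBadState
	}
	if len(e.buf)+len(chunk) > e.limit {
		return ErrTooLarge
	}
	e.buf = append(e.buf, chunk...)
	e.state = Active
	return nil
}

// Flush hands out what is buffered and keeps the stream open.
func (e *StreamingEncoder) Flush() ([]byte, error) {
	if e.state != Active {
		return nil, ErrBadState
	}
	out := e.buf
	e.buf, e.state = nil, Flushed
	return out, nil
}

// Finish returns the remaining output and closes the stream.
func (e *StreamingEncoder) Finish() ([]byte, error) {
	if e.state == Finished || e.state == Failed {
		return nil, ErrBadState
	}
	out := e.buf
	e.buf, e.state = nil, Finished
	return out, nil
}

// Reset returns the encoder to Idle so it can be reused.
func (e *StreamingEncoder) Reset() {
	e.buf, e.state = nil, Idle
}

func main() {
	enc := NewStreamingEncoder(16)
	_ = enc.Process([]byte("chunk1"))
	_ = enc.Process([]byte("chunk2"))
	out, err := enc.Finish()
	fmt.Printf("%q %v\n", out, err)
}
```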
4. Algorithm Core
Implementations of four compression algorithms:
| Algorithm | File | Core Functions |
|---|---|---|
| Huffman | huffman/encode.go | encodeBlock(), buildTree() |
| Arithmetic | arithmetic/encode.go | encodeSymbol(), normalize() |
| Range | range/encode.go | encodeSymbol(), shiftBytes() |
| RLE | rle/encode.go | encodeRun() |
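As an illustration of the simplest entry in the table, the sketch below shows one possible shape for `encodeRun()`. The on-disk layout used here, a `(count, value)` byte pair with runs capped at 255, is an assumption for the example and is not CompressKit's documented RLE format; `encode` and `decode` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// encodeRun appends one (count, value) pair to dst.
// The pair layout is an assumption made for this sketch.
func encodeRun(dst []byte, value byte, count int) []byte {
	return append(dst, byte(count), value)
}

// encode splits input into runs of at most 255 identical bytes.
func encode(input []byte) []byte {
	var out []byte
	for i := 0; i < len(input); {
		j := i
		for j < len(input) && input[j] == input[i] && j-i < 255 {
			j++
		}
		out = encodeRun(out, input[i], j-i)
		i = j
	}
	return out
}

// decode expands (count, value) pairs back into the original bytes.
func decode(data []byte) ([]byte, error) {
	if len(data)%2 != 0 {
		return nil, errors.New("truncated run pair")
	}
	var out []byte
	for i := 0; i < len(data); i += 2 {
		for n := 0; n < int(data[i]); n++ {
			out = append(out, data[i+1])
		}
	}
	return out, nil
}

func main() {
	enc := encode([]byte("aaab"))
	dec, err := decode(enc)
	fmt.Printf("%v %q %v\n", enc, dec, err)
}
```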
5. Shared Utilities
Cross-algorithm shared infrastructure:
| Module | Function |
|---|---|
| codec | Encoder/Decoder interface definitions |
| errors | Unified error types and codes |
| bits | Bit writer/reader |
| frequency | Frequency table processing |
| buffer | Buffer management |
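To make the role of the `bits` module concrete, here is a minimal bit writer. It is a sketch only: the `BitWriter` type, its method names, the MSB-first bit order, and the zero padding of the final byte are assumptions, since the source does not specify them.

```go
package main

import "fmt"

// BitWriter packs bits MSB-first into a byte slice (assumed bit order).
type BitWriter struct {
	buf   []byte
	cur   byte // partially filled byte
	nbits uint // number of bits currently held in cur (0..7)
}

// WriteBits writes the low n bits of v, most significant first (n <= 32).
func (w *BitWriter) WriteBits(v uint32, n uint) {
	for i := n; i > 0; i-- {
		bit := byte(v>>(i-1)) & 1
		w.cur = w.cur<<1 | bit
		w.nbits++
		if w.nbits == 8 {
			w.buf = append(w.buf, w.cur)
			w.cur, w.nbits = 0, 0
		}
	}
}

// Bytes returns a copy of the output, padding the final partial
// byte with zero bits on the right.
func (w *BitWriter) Bytes() []byte {
	out := append([]byte(nil), w.buf...)
	if w.nbits > 0 {
		out = append(out, w.cur<<(8-w.nbits))
	}
	return out
}

func main() {
	var w BitWriter
	w.WriteBits(0b101, 3) // a 3-bit code
	w.WriteBits(0b11, 2)  // a 2-bit code
	fmt.Printf("%08b\n", w.Bytes())
}
```

Variable-length codes such as Huffman codes rarely end on a byte boundary, which is why the padding rule has to be fixed and shared by every implementation.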
Binary Format Specification
Common Structure
| Magic (4 bytes) | Header | Payload |
Frequency Table Format
The format is identical across all language implementations:
- Order: symbols 0-255 (byte values), symbol 256 (EOF)
- Byte order: Little-Endian
- Total size: 4 bytes (symbol count) + 257 × 4 bytes = 1032 bytes
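The layout above can be sketched with Go's `encoding/binary` package. The function names are hypothetical, and the sketch assumes that the 4-byte symbol count comes first and holds 257, and that each frequency is an unsigned 32-bit integer.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	numSymbols = 257              // byte values 0-255 plus EOF (256)
	tableSize  = 4 + numSymbols*4 // 1032 bytes
)

// MarshalFreqTable writes the symbol count followed by 257
// little-endian 32-bit frequencies, in symbol order.
func MarshalFreqTable(freq *[numSymbols]uint32) []byte {
	out := make([]byte, tableSize)
	binary.LittleEndian.PutUint32(out[0:4], numSymbols)
	for i, f := range freq {
		binary.LittleEndian.PutUint32(out[4+4*i:], f)
	}
	return out
}

// UnmarshalFreqTable is the inverse; it rejects wrong sizes and counts.
func UnmarshalFreqTable(data []byte) (*[numSymbols]uint32, error) {
	if len(data) != tableSize {
		return nil, errors.New("frequency table must be exactly 1032 bytes")
	}
	if binary.LittleEndian.Uint32(data[0:4]) != numSymbols {
		return nil, errors.New("unexpected symbol count")
	}
	var freq [numSymbols]uint32
	for i := range freq {
		freq[i] = binary.LittleEndian.Uint32(data[4+4*i:])
	}
	return &freq, nil
}

func main() {
	var freq [numSymbols]uint32
	freq['a'] = 3
	freq[256] = 1 // EOF symbol
	data := MarshalFreqTable(&freq)
	fmt.Println(len(data))
}
```

Writing each field with an explicit byte order, rather than copying in-memory structs, is what keeps the table identical across languages and CPU architectures.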
Security Boundaries
| Limit | Value | Purpose |
|---|---|---|
| Max input size | 4 GiB | Prevent frequency overflow and decompression bomb attacks |
| Max output size (decode only) | 1 GiB | Prevent decompression bomb attacks |
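A sketch of how these limits might be enforced follows. The constant names, error values, and the `BoundedOutput` type are hypothetical, and the sketch assumes the limits are inclusive, which the source does not state. The output check runs before any bytes are appended, so a decompression bomb is stopped before memory is allocated for it.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	MaxInputSize  = 4 << 30 // 4 GiB
	MaxOutputSize = 1 << 30 // 1 GiB, decode only
)

var (
	ErrInputTooLarge  = errors.New("input exceeds size limit")
	ErrOutputTooLarge = errors.New("decoded output exceeds size limit")
)

// CheckInputSize rejects inputs larger than MaxInputSize.
func CheckInputSize(size int64) error {
	if size > MaxInputSize {
		return ErrInputTooLarge
	}
	return nil
}

// BoundedOutput collects decoded bytes and refuses to grow past limit.
// It satisfies io.Writer.
type BoundedOutput struct {
	limit int64
	data  []byte
}

func NewBoundedOutput(limit int64) *BoundedOutput {
	return &BoundedOutput{limit: limit}
}

// Write checks the limit before appending, so a rejected write
// leaves the collected data unchanged.
func (b *BoundedOutput) Write(p []byte) (int, error) {
	if int64(len(b.data))+int64(len(p)) > b.limit {
		return 0, ErrOutputTooLarge
	}
	b.data = append(b.data, p...)
	return len(p), nil
}

func main() {
	out := NewBoundedOutput(8) // tiny limit for demonstration; a decoder would pass MaxOutputSize
	_, err1 := out.Write([]byte("12345678"))
	_, err2 := out.Write([]byte("9"))
	fmt.Println(err1, err2)
}
```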
Deep Module Design
CompressKit follows the Deep Module principle:
Deep Module = Simple interface + Complex implementation
BufferedEncoder.Encode(input) → output
↓
Hidden complexity:
- State machine management
- Buffer expansion
- Error handling
- Bit alignment
Further Reading
- Streaming API - 5-state FSM details and complete API documentation
- Cross-Language Testing - Conformance verification