Range Coder
Range coding is an integer-based implementation equivalent to arithmetic coding. It uses integer interval operations instead of floating point, making it more suitable for production systems while achieving the same compression ratios.
How It Works
Range coding maintains an interval [low, low + range) using fixed-width integers. Unlike arithmetic coding's bit output, range coding outputs bytes, which significantly improves I/O efficiency.
void encode(const vector<uint8_t>& data,
const uint32_t cumFreq[257]) {
uint64_t low = 0;
uint64_t range = MAX_RANGE;
uint64_t total = cumFreq[256];
for (uint8_t symbol : data) {
uint32_t symLow = cumFreq[symbol];
uint32_t symHigh = cumFreq[symbol + 1];
range /= total;
low += symLow * range;
range *= (symHigh - symLow);
// Renormalize
while (range < MIN_RANGE) {
output_byte(low >> 56);
low <<= 8;
range <<= 8;
}
}
// Output final bytes
flush(low);
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
func (rc *RangeCoder) Encode(data []byte, cumFreq [257]uint32) error {
const maxRange uint64 = 1 << 32
const minRange uint64 = 1 << 24
total := cumFreq[256]
for _, symbol := range data {
symLow := cumFreq[symbol]
symHigh := cumFreq[symbol+1]
rc.range /= uint64(total)
rc.low += uint64(symLow) * rc.range
rc.range *= uint64(symHigh - symLow)
for rc.range < minRange {
rc.outputByte(byte(rc.low >> 56))
rc.low <<= 8
rc.range <<= 8
}
}
rc.flush()
return nil
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
pub fn encode(&mut self, data: &[u8], cum_freq: &[u32; 257]) {
const MAX_RANGE: u64 = 1u64 << 32;
const MIN_RANGE: u64 = 1u64 << 24;
let total = cum_freq[256];
for &symbol in data {
let sym_low = cum_freq[symbol as usize];
let sym_high = cum_freq[symbol as usize + 1];
self.range /= total as u64;
self.low += (sym_low as u64) * self.range;
self.range *= (sym_high - sym_low) as u64;
while self.range < MIN_RANGE {
self.output((self.low >> 56) as u8);
self.low <<= 8;
self.range <<= 8;
}
}
self.flush();
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Arithmetic vs Range Coder
| Aspect | Arithmetic | Range Coder |
|---|---|---|
| Arithmetic | Floating point | Fixed-width integers |
| Output unit | Bits | Bytes |
| I/O efficiency | Lower | Higher |
| Compression | Nearly identical | Nearly identical |
| Patent status | Had historical patents | No restrictions |
| Production use | Academic | Industry standard |
Complexity
| Aspect | Complexity | Notes |
|---|---|---|
| Time (encode) | O(n) | Similar to arithmetic |
| Time (decode) | O(n) | Byte-level I/O faster |
| Space | O(σ) | Cumulative frequency table |
| Precision | Fixed | 64-bit integers |
Performance Characteristics
| Language | Text Compression | Speed | Memory |
|---|---|---|---|
| C++ | 1.90× | 58 MiB/s | Low |
| Go | 1.90× | 50 MiB/s | Low |
| Rust | 1.90× | 64 MiB/s | Low |
Use Cases
- ✅ Production systems — Most widely deployed entropy coder
- ✅ Balanced workloads — Good speed and compression
- ✅ Video codecs — H.264, HEVC use range coding
- ✅ Compression tools — Used in modern archivers
Library API
Go
import "github.com/LessUp/compress-kit/algorithms/range/go/rangecoder"
// Create encoder with frequency table
freq := rangecoder.BuildFrequencyTable(data)
cumFreq := rangecoder.BuildCumulative(freq)
// Encode
encoded, err := rangecoder.Encode(data, cumFreq)
// Decode
decoded, err := rangecoder.Decode(encoded, cumFreq, len(data))2
3
4
5
6
7
8
9
10
11
Rust
use rangecoder;
let freq = rangecoder::build_frequency_table(&data);
let cum_freq = rangecoder::cumulative(&freq);
let encoded = rangecoder::encode(&data, &cum_freq)?;
let decoded = rangecoder::decode(&encoded, &cum_freq, data.len())?;2
3
4
5
6
7
Further Reading
- Arithmetic Coding — Floating point equivalent
- Benchmarks — Performance comparison
- OpenSpec Architecture Specs
Known Limitations
Performance Issue with Large Files
The current Range Coder implementation has a known decode performance issue for files larger than 500 KiB. The decode operation may become significantly slower or appear to hang.
Workaround: For testing purposes, use files smaller than 100 KiB. This is reflected in the CI pipeline which uses 100 KiB test files for Range Coder verification.
Status: This is a known issue that is documented for future improvement. The encode operation works correctly for all file sizes.