fq-compressor
fq-compressor is a high-performance FASTQ compression tool. It combines Assembly-based Compression (ABC) strategies with production-grade engineering to deliver high compression ratios, fast parallel processing, and native random access.
Key Features
| Feature | Description |
|---|---|
| Extreme Compression | Assembly-based read reordering and consensus generation, aiming to approach theoretical compression limits |
| Hybrid Quality Compression | Statistical Context Mixing (SCM) for quality scores, balancing ratio and speed |
| Parallel Powerhouse | Built on Intel oneTBB with a scalable Producer-Consumer pipeline |
| Random Access | Native block-based format (similar to BGZF) gives direct access to any part of the file without decompressing the rest |
| Standard Compliant | Written in C++23, using Modern CMake, Conan 2.x, and GitHub Actions CI/CD |
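To illustrate how a block-based format can support random access, here is a minimal sketch of a block index lookup. It does not reflect fq-compressor's actual archive layout: `BlockIndexEntry`, its fields, and `find_block` are hypothetical names, and the sketch assumes the index is sorted by the first read in each block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical index entry: one per independently compressed block.
struct BlockIndexEntry {
    std::uint64_t first_read;   // index of the first read stored in the block
    std::uint64_t read_count;   // number of reads stored in the block
    std::uint64_t file_offset;  // byte offset of the block within the archive
};

// Returns the position of the block that holds `read_id`, or std::nullopt
// if no block covers it. Precondition: `index` is sorted by first_read.
std::optional<std::size_t> find_block(const std::vector<BlockIndexEntry>& index,
                                      std::uint64_t read_id) {
    assert(std::is_sorted(index.begin(), index.end(),
                          [](const BlockIndexEntry& a, const BlockIndexEntry& b) {
                              return a.first_read < b.first_read;
                          }));
    // Find the first block that starts after read_id, then step back one.
    auto it = std::upper_bound(index.begin(), index.end(), read_id,
                               [](std::uint64_t id, const BlockIndexEntry& e) {
                                   return id < e.first_read;
                               });
    if (it == index.begin()) return std::nullopt;
    --it;
    if (read_id >= it->first_read + it->read_count) return std::nullopt;
    return static_cast<std::size_t>(it - index.begin());
}
```

With such an index, a reader would seek to `file_offset` of the returned block and decompress only that block, which is what keeps lookup cost independent of the archive size.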
Why fq-compressor?
Traditional FASTQ compressors treat reads as independent strings and rely on general-purpose compression algorithms. fq-compressor takes a fundamentally different approach:
- Reads are fragments of a genome — we exploit the biological redundancy by reordering and assembling reads before compression.
- Each data stream gets a specialized compressor — sequences use ABC, quality scores use SCM, and identifiers use tokenization + delta encoding.
- The archive format is designed for real-world use — independent blocks enable random access, parallel decompression, and fault isolation.
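As a rough illustration of the identifier stream mentioned above, the sketch below tokenizes read identifiers into digit and non-digit runs and encodes each identifier relative to the previous one. It is an assumption about how tokenization + delta encoding can work in general, not fq-compressor's implementation: `tokenize`, `delta_encode`, `Token`, and `Kind` are hypothetical names, and the identifiers in the usage note are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Split an identifier into maximal runs of digits and non-digits.
std::vector<std::string> tokenize(std::string_view id) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < id.size()) {
        const bool digit = std::isdigit(static_cast<unsigned char>(id[i])) != 0;
        std::size_t j = i;
        while (j < id.size() &&
               (std::isdigit(static_cast<unsigned char>(id[j])) != 0) == digit) {
            ++j;
        }
        tokens.emplace_back(id.substr(i, j - i));
        i = j;
    }
    return tokens;
}

enum class Kind { Same, Delta, Literal };

struct Token {
    Kind kind;
    long long delta;   // meaningful only when kind == Kind::Delta
    std::string text;  // meaningful only when kind == Kind::Literal
};

// A token is delta-encoded only if it is a short number without leading
// zeros, so the original text can be rebuilt from its numeric value.
bool delta_safe(const std::string& s) {
    const bool all_digits = !s.empty() && std::all_of(s.begin(), s.end(), [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    });
    return all_digits && s.size() <= 18 && (s.size() == 1 || s[0] != '0');
}

// Encode `cur` against `prev`, token by token.
std::vector<Token> delta_encode(const std::vector<std::string>& prev,
                                const std::vector<std::string>& cur) {
    std::vector<Token> out;
    for (std::size_t k = 0; k < cur.size(); ++k) {
        if (k < prev.size() && prev[k] == cur[k]) {
            out.push_back({Kind::Same, 0, {}});
        } else if (k < prev.size() && delta_safe(prev[k]) && delta_safe(cur[k])) {
            out.push_back({Kind::Delta, std::stoll(cur[k]) - std::stoll(prev[k]), {}});
        } else {
            out.push_back({Kind::Literal, 0, cur[k]});
        }
    }
    return out;
}
```

For two consecutive made-up identifiers such as `@SIM:1:FCX:1:15:6329:1045` and `@SIM:1:FCX:1:15:6330:1047`, every token is marked `Same` except the two coordinates, which become small deltas. Streams dominated by `Same` markers and small integers are much easier for an entropy coder to compress than the raw text.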
Performance at a Glance
| Compiler | Compression Speed | Decompression Speed | Compression Ratio |
|---|---|---|---|
| GCC | 11.30 MB/s | 60.10 MB/s | 3.97x |
| Clang | 11.90 MB/s | 62.30 MB/s | 3.97x |
Tested on an Intel Core i7-9700 @ 3.00 GHz (8 cores) with 2.27M Illumina reads (511 MB uncompressed).
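To put these figures in context, the sketch below derives the implied archive size and wall-clock times from the table. It assumes that both throughput figures are measured against the uncompressed size, which the benchmark note does not state; `Estimate` and `estimate` are hypothetical helper names.

```cpp
#include <cassert>

struct Estimate {
    double compressed_mb;  // implied archive size
    double compress_s;     // implied compression time in seconds
    double decompress_s;   // implied decompression time in seconds
};

// Assumption: throughput is expressed in MB of uncompressed data per second.
Estimate estimate(double input_mb, double ratio, double comp_mbps, double decomp_mbps) {
    assert(ratio > 0 && comp_mbps > 0 && decomp_mbps > 0);
    return {input_mb / ratio, input_mb / comp_mbps, input_mb / decomp_mbps};
}
```

Under that assumption, `estimate(511.0, 3.97, 11.30, 60.10)` for the GCC row gives an archive of about 128.7 MB, roughly 45 seconds to compress, and roughly 8.5 seconds to decompress.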
Get Started
- Installation — build from source
- Quick Start — compress your first FASTQ file
- CLI Reference — all commands and options
- Architecture — how it works under the hood