FQC Archive Format

The FQC (FASTQ Compressed) format is a domain-specific archive optimized for genomic FASTQ data.

Features

  • Block-based storage: Independent compressed blocks
  • Columnar streams: Separate ID/Sequence/Quality compression
  • Random access: The block index locates any read range without scanning the archive; only the blocks covering the range are decompressed
  • Integrity verification: CRC32C checksums
  • Reorder support: Original order restoration

File Structure

┌─────────────────────────────────────────────┐
│ FQC Archive                                 │
├─────────────────────────────────────────────┤
│ Fixed Header (256 bytes)                    │
│   - Magic: "FQCC"                           │
│   - Version: 0.1.0                          │
│   - Format flags                            │
│   - Block count                             │
│   - Total reads/bases                       │
├─────────────────────────────────────────────┤
│ Compression Parameters                      │
├─────────────────────────────────────────────┤
│ Block 0                                     │
│   ┌─ Block Header (64 bytes)                │
│   ├─ ID Stream (zstd)                       │
│   ├─ Seq Stream (ABC+zstd)                  │
│   └─ Qual Stream (SCM+zstd)                 │
├─────────────────────────────────────────────┤
│ Block 1 ...                                 │
├─────────────────────────────────────────────┤
│ Block N-1                                   │
├─────────────────────────────────────────────┤
│ Footer                                      │
│   - Block Index Table                       │
│   - Reorder Map (optional)                  │
│   - Footer checksum                         │
└─────────────────────────────────────────────┘

Fixed Header

struct FQCHeader {
    char magic[4] = {'F', 'Q', 'C', 'C'};
    uint16_t version_major = 0;
    uint16_t version_minor = 1;
    uint16_t version_patch = 0;
    uint16_t format_flags;
    uint32_t block_count;
    uint64_t total_reads;
    uint64_t total_bases;
    uint8_t reserved[224];
};
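
As a minimal sketch of how a reader might validate this header, the code below decodes a 256-byte buffer field by field. It assumes little-endian integers and no implicit padding, neither of which is stated above; `read_le` and `parse_header` are hypothetical helpers, not part of any FQC API. The struct is repeated without default member values so the sketch stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirror of the FQCHeader fields above. Every field already sits on its
// natural alignment boundary, so common ABIs add no implicit padding.
struct FQCHeader {
    char magic[4];
    uint16_t version_major;
    uint16_t version_minor;
    uint16_t version_patch;
    uint16_t format_flags;
    uint32_t block_count;
    uint64_t total_reads;
    uint64_t total_bases;
    uint8_t reserved[224];
};
static_assert(sizeof(FQCHeader) == 256, "fixed header must be 256 bytes");

// Read an unsigned integer of type T stored little-endian at p
// (the byte order is an assumption of this sketch).
template <typename T>
T read_le(const uint8_t* p) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v = static_cast<T>(v | (static_cast<T>(p[i]) << (8 * i)));
    return v;
}

// Hypothetical helper: decode a 256-byte buffer into `out`.
// Returns false if the magic bytes are not "FQCC".
bool parse_header(const uint8_t* buf, FQCHeader& out) {
    if (std::memcmp(buf, "FQCC", 4) != 0) return false;
    std::memcpy(out.magic, buf, 4);
    out.version_major = read_le<uint16_t>(buf + 4);
    out.version_minor = read_le<uint16_t>(buf + 6);
    out.version_patch = read_le<uint16_t>(buf + 8);
    out.format_flags  = read_le<uint16_t>(buf + 10);
    out.block_count   = read_le<uint32_t>(buf + 12);
    out.total_reads   = read_le<uint64_t>(buf + 16);
    out.total_bases   = read_le<uint64_t>(buf + 24);
    std::memcpy(out.reserved, buf + 32, sizeof out.reserved);
    return true;
}
```

Decoding field by field rather than casting the buffer to the struct keeps the reader independent of host byte order and compiler padding rules.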

Format Flags

Bit  Name                  Description
0    PAIRED_END            Paired-end data
1    REORDERED             Reads were reordered
2    ORIG_ORDER_PRESERVED  Can restore original order
3    HAS_METADATA          Contains metadata block
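
The flag bits can be tested with plain masks. The sketch below takes its enumerator names and bit positions from the table; `has_flag` and `can_restore_original_order` are hypothetical helpers, and the rule combining REORDERED with ORIG_ORDER_PRESERVED is an assumption about how the two flags interact.

```cpp
#include <cassert>
#include <cstdint>

// Bit masks for the format_flags field, one per row of the table above.
enum FormatFlag : uint16_t {
    PAIRED_END           = 1u << 0,
    REORDERED            = 1u << 1,
    ORIG_ORDER_PRESERVED = 1u << 2,
    HAS_METADATA         = 1u << 3,
};

// True if the given flag bit is set in format_flags.
inline bool has_flag(uint16_t format_flags, FormatFlag f) {
    return (format_flags & f) != 0;
}

// Assumed policy: the original order is available if the reads were never
// reordered, or if they were reordered and ORIG_ORDER_PRESERVED is set.
inline bool can_restore_original_order(uint16_t format_flags) {
    return !has_flag(format_flags, REORDERED) ||
           has_flag(format_flags, ORIG_ORDER_PRESERVED);
}
```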

Block Format

Block Header

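struct BlockHeader {
    uint32_t block_id;
    uint32_t flags;
    uint64_t original_size;
    uint64_t compressed_size;
    uint32_t read_count;
    uint64_t base_count;
    uint32_t crc32c_data;
    uint32_t crc32c_uncompressed;
    // These fields occupy 48 bytes with natural alignment (44 if packed).
    // The 64-byte header size in the file structure diagram implies
    // trailing reserved bytes that are not listed here.
};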
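
To illustrate the integrity check, here is a dependency-free CRC32C (Castagnoli) routine with a comparison against a stored checksum. Exactly which bytes `crc32c_data` and `crc32c_uncompressed` cover is not specified above, so `block_data_is_intact` is a hypothetical helper that checks whatever buffer the caller supplies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC32C using the reflected Castagnoli polynomial 0x82F63B78.
// Slow but portable; real readers usually use a lookup table or the
// hardware CRC32C instructions.
uint32_t crc32c(const uint8_t* data, std::size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical check: compare a buffer's checksum with the stored value.
bool block_data_is_intact(const uint8_t* data, std::size_t len, uint32_t expected) {
    return crc32c(data, len) == expected;
}
```

The standard check value for the ASCII string "123456789" is 0xE3069283, which is a convenient self-test for any CRC32C implementation.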

Block Data

Each block contains three compressed streams:

Stream    Encoding              Content
ID        Token + Delta + Zstd  Read identifiers
Sequence  ABC + Zstd            Delta from consensus
Quality   SCM + Zstd            Context-mixed scores

Block Index Table

struct BlockIndexEntry {
    uint64_t offset;          // File offset
    uint64_t size;            // Total block size
    uint32_t read_count;      // Number of reads
    uint64_t first_read_id;   // Global read ID
};
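
A reader can use these entries to find the block holding a given read. The sketch below binary-searches on `first_read_id`. It assumes the table is sorted by that field and that each block covers a contiguous range of global read IDs; `find_block` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockIndexEntry {
    uint64_t offset;          // File offset
    uint64_t size;            // Total block size
    uint32_t read_count;      // Number of reads
    uint64_t first_read_id;   // Global read ID
};

// Returns the position in `index` of the block containing `read_id`,
// or -1 if no block covers it.
long find_block(const std::vector<BlockIndexEntry>& index, uint64_t read_id) {
    // Find the first entry whose first_read_id is greater than read_id.
    auto it = std::upper_bound(
        index.begin(), index.end(), read_id,
        [](uint64_t id, const BlockIndexEntry& e) { return id < e.first_read_id; });
    if (it == index.begin()) return -1;  // read_id precedes the first block
    --it;                                // candidate block
    if (read_id - it->first_read_id >= it->read_count) return -1;  // past its last read
    return static_cast<long>(it - index.begin());
}
```

The lookup costs O(log N) comparisons for N blocks, after which the reader seeks to `offset` and decompresses only that block.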

Reorder Map

Maps compressed order to original order:

struct ReorderMap {
    uint32_t version = 1;
    uint64_t read_count;
    uint32_t compression_type;
    uint8_t compressed_mapping[];  // Delta-encoded varints
};
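
The exact varint and delta conventions are not given above. The sketch below assumes unsigned LEB128 varints, zigzag-encoded signed deltas, a starting value of 0, and a `compressed_mapping` payload that has already been decompressed according to `compression_type`. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one unsigned LEB128 varint at buf[pos] and advance pos.
// Returns false on truncated or overlong input.
bool read_varint(const std::vector<uint8_t>& buf, std::size_t& pos, uint64_t& out) {
    out = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) return false;
        uint8_t byte = buf[pos++];
        out |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // high bit clear: last byte
    }
    return false;
}

// Undo zigzag encoding: 0, 1, 2, 3, ... maps to 0, -1, 1, -2, ...
int64_t unzigzag(uint64_t v) {
    return static_cast<int64_t>(v >> 1) ^ -static_cast<int64_t>(v & 1);
}

// Rebuild the mapping: result[i] is the original index of the i-th stored read.
// Returns false on malformed input or if bytes are left over.
bool decode_reorder_map(const std::vector<uint8_t>& buf, uint64_t read_count,
                        std::vector<uint64_t>& result) {
    result.clear();
    std::size_t pos = 0;
    int64_t prev = 0;
    for (uint64_t i = 0; i < read_count; ++i) {
        uint64_t raw = 0;
        if (!read_varint(buf, pos, raw)) return false;
        prev += unzigzag(raw);            // accumulate the signed delta
        if (prev < 0) return false;       // indices cannot be negative
        result.push_back(static_cast<uint64_t>(prev));
    }
    return pos == buf.size();
}
```

Under these assumptions the mapping 2, 0, 1, 3 has deltas +2, -2, +1, +2 and encodes to the four bytes 4, 3, 2, 4.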

Version History

Version  Date        Changes
0.1.0    2026-04-16  Initial release

Implementation Notes

Block Size

  • Default: ~10 MB uncompressed per block
  • Smaller blocks: Finer-grained random access, more per-block overhead
  • Larger blocks: Better compression, more data decompressed per lookup
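
The overhead side of this trade-off can be estimated with simple arithmetic. The sketch below counts only the fixed per-block cost: the 64-byte block header from the file structure diagram plus one index entry, assumed here to take 32 bytes. It ignores alignment padding and any loss in compression ratio, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of blocks needed for total_bytes of uncompressed input.
constexpr uint64_t block_count_for(uint64_t total_bytes, uint64_t block_bytes) {
    return (total_bytes + block_bytes - 1) / block_bytes;  // ceiling division
}

// Fixed cost in bytes: a 64-byte block header plus an assumed 32-byte
// index entry for every block.
constexpr uint64_t fixed_overhead_bytes(uint64_t total_bytes, uint64_t block_bytes) {
    return block_count_for(total_bytes, block_bytes) * (64 + 32);
}
```

For 1,000,000,000 bytes of input, 10,000,000-byte blocks give 100 blocks and 9,600 bytes of fixed overhead; 1,000,000-byte blocks give 1,000 blocks and 96,000 bytes.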

Compression Levels

Mode      Zstd Level  Use Case
fast      1           Speed critical
balanced  3           Default
best      19          Storage critical
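
A tool could translate the mode names into Zstd levels with a small lookup. `zstd_level_for_mode` is a hypothetical helper, and treating the mode as a string is an assumption about how it is supplied.

```cpp
#include <cassert>
#include <string>

// Zstd level for each mode in the table above; -1 for an unknown mode.
int zstd_level_for_mode(const std::string& mode) {
    if (mode == "fast") return 1;
    if (mode == "balanced") return 3;
    if (mode == "best") return 19;
    return -1;
}
```

The returned value can be passed as the `compressionLevel` argument of `ZSTD_compress` from `zstd.h`.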

Memory Mapping

Block headers are aligned to 4096-byte boundaries (a common page size) so that a block can be memory-mapped starting at its header.
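
A writer honoring this alignment pads each block's start offset up to the next multiple of 4096. A minimal sketch, with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kBlockAlignment = 4096;  // alignment stated above

// Round offset up to the next multiple of kBlockAlignment.
// The mask trick is valid because the alignment is a power of two.
constexpr uint64_t align_up(uint64_t offset) {
    return (offset + kBlockAlignment - 1) & ~(kBlockAlignment - 1);
}

// Padding bytes to insert before the next block header.
constexpr uint64_t padding_before_block(uint64_t offset) {
    return align_up(offset) - offset;
}
```

For example, a block that would otherwise start right after the 256-byte fixed header needs 3840 bytes of padding to begin at offset 4096.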