FQC format and random access
The .fqc container is a block archive, not a raw byte stream wrapper. Its layout is defined in include/fqc/format/fqc_format.h and implemented by include/fqc/format/fqc_writer.h and include/fqc/format/fqc_reader.h.
File layout
At a high level the file is written in this order:
magic + version
-> global header
-> block payloads
-> optional reorder map
-> block index
-> file footerThat order matters. The payload region stays append-only during compression, and the block index plus footer are written only after the writer knows every block offset and size.
What the format stores per block
A block groups a bounded run of reads and keeps its logical streams separate:
- identifiers
- sequences
- qualities
- auxiliary data such as variable read lengths
include/fqc/format/fqc_reader.h exposes this as BlockData, while include/fqc/format/fqc_writer.h exposes BlockPayload. The split keeps codec choices local to each stream and lets readers skip data they do not need.
Global metadata and flags
include/fqc/format/fqc_format.h also carries the archive-wide flags:
- paired-end vs single-end layout
- quality and ID handling modes
- preserve-order vs reordered archive layout
- reorder-map presence
- streaming mode
Those flags tell the reader whether original order can be reconstructed and which decoding path is valid.
What “random access” means here
In this project, random access is block-addressable lookup. The reader opens the archive, reads the footer and index once, then uses index entries to seek directly to the block that owns a target read range. It does not need to replay every earlier block.
That is why FQCReader::open() loads metadata first and why FQCReader::readBlock() accepts a block id directly. The cost is effectively constant with respect to archive length once the index is loaded: metadata read, one seek, then decode only the needed block payloads.
Role of the reorder map
Reordering improves compression locality, but it would destroy original read order unless the mapping were persisted. include/fqc/format/reorder_map.h stores both directions:
- original id -> archive id
- archive id -> original id
The map is delta- and varint-encoded so it remains small enough to keep with the archive. When decompression wants original order, the reverse map tells the writer how to restore it. When targeted lookup starts from original read coordinates, the forward map tells the reader which archive positions own those reads.
Why the format stays block-based
The project wants three things at once: domain-specific compression, bounded memory, and direct retrieval. A single monolithic entropy stream would improve some ratios but would make direct seeking expensive. A block container is the compromise: each block is self-contained enough to decode independently, while the footer and index make the archive globally navigable.
Code anchors
- Format definitions:
include/fqc/format/fqc_format.h - Writer path:
include/fqc/format/fqc_writer.h,src/format/fqc_writer.cpp - Reader path:
include/fqc/format/fqc_reader.h,src/format/fqc_reader.cpp - Reorder metadata:
include/fqc/format/reorder_map.h,src/format/reorder_map.cpp