
System design

This section maps the conceptual story to the codebase and archive layout. Use it when you need to understand how fq-compressor is split across parsing, analysis, block compression, format materialization, and targeted retrieval.

System blueprint

The archive is built as a chain of explicit contracts.

fq-compressor is easier to audit when each phase has a clear boundary: ingest, analysis, block transforms, archive materialization, and selective retrieval.


01

Ingest

Plain and compressed FASTQ streams enter through the parser and stream adapters.

io/fastq_parser + io/compressed_stream

02

Analyze

Global statistics establish reorder intent, chunk sizing, and memory discipline.

algo/global_analyzer + common/memory_budget

03

Compress

Block-local transforms split sequence, ID, and quality responsibilities across dedicated codecs.

algo/block_compressor + quality/id streams

04

Store

The FQC writer stores blocks, checksums, reorder metadata, and the lookup structures that retrieval needs later.

format/fqc_writer + format/index tables

05

Retrieve

Readers can verify, decode a range, or restore the original read order without replaying the whole archive.

format/fqc_reader + pipeline/decompressor
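
The first three phases above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the project's implementation: `FastqRecord`, `GlobalStats`, `BlockStreams`, and every function name are hypothetical, the chunk-sizing rule is an invented placeholder, and no actual codec is applied. It only shows where the boundaries between ingest, analysis, and block-local stream separation could sit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type, for illustration only.
struct FastqRecord {
    std::string id;
    std::string sequence;
    std::string quality;
};

// Ingest: read four-line FASTQ records from any std::istream.
std::vector<FastqRecord> parse_fastq(std::istream& in) {
    std::vector<FastqRecord> records;
    std::string header, sequence, separator, quality;
    while (std::getline(in, header)) {
        if (header.empty()) continue;  // tolerate blank lines between records
        if (header[0] != '@') throw std::runtime_error("expected '@' header line");
        if (!std::getline(in, sequence) || !std::getline(in, separator) ||
            !std::getline(in, quality)) {
            throw std::runtime_error("truncated FASTQ record");
        }
        if (separator.empty() || separator[0] != '+')
            throw std::runtime_error("expected '+' separator line");
        if (sequence.size() != quality.size())
            throw std::runtime_error("sequence and quality lengths differ");
        records.push_back(FastqRecord{header.substr(1), sequence, quality});
    }
    return records;
}

// Analyze: global statistics that later phases can size themselves against.
struct GlobalStats {
    std::size_t read_count = 0;
    std::size_t total_bases = 0;
    std::size_t max_read_length = 0;
};

GlobalStats analyze(const std::vector<FastqRecord>& records) {
    GlobalStats stats;
    for (const FastqRecord& r : records) {
        ++stats.read_count;
        stats.total_bases += r.sequence.size();
        stats.max_read_length = std::max(stats.max_read_length, r.sequence.size());
    }
    return stats;
}

// Assumed sizing rule: budget divided by a worst-case per-read cost
// (sequence bytes plus quality bytes), never less than one read per block.
std::size_t reads_per_block(const GlobalStats& stats, std::size_t budget_bytes) {
    const std::size_t per_read = 2 * stats.max_read_length;
    if (per_read == 0) return 1;
    return std::max<std::size_t>(budget_bytes / per_read, 1);
}

// Compress (placeholder): separate IDs, sequences, and qualities per block
// so that a dedicated codec could handle each stream.
struct BlockStreams {
    std::vector<std::string> ids;
    std::vector<std::string> sequences;
    std::vector<std::string> qualities;
};

std::vector<BlockStreams> split_into_blocks(const std::vector<FastqRecord>& records,
                                            std::size_t block_reads) {
    assert(block_reads > 0);
    std::vector<BlockStreams> blocks;
    for (std::size_t i = 0; i < records.size(); ++i) {
        if (i % block_reads == 0) blocks.emplace_back();  // start a new block
        BlockStreams& block = blocks.back();
        block.ids.push_back(records[i].id);
        block.sequences.push_back(records[i].sequence);
        block.qualities.push_back(records[i].quality);
    }
    return blocks;
}
```

Each function takes only the output of the previous phase, which is the property the phase boundaries are meant to preserve.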

Why the block matters

The block is the smallest unit that still carries compression leverage, checksum scope, and direct lookup.

Where integrity lives

Checksums and verify flows live at the archive boundary, which keeps retrieval semantics inspectable.
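
To make the block's role as both checksum scope and lookup unit concrete, here is a hedged sketch of an in-memory archive with a per-block index. The layout, the names `BlockIndexEntry`, `Archive`, `append_block`, `find_block`, and `verify_block`, and the choice of FNV-1a as a stand-in checksum are all assumptions for illustration; the actual FQC format and its checksum algorithm are defined by the project's format headers, not by this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a 64-bit, used here only as a stand-in checksum.
std::uint64_t fnv1a64(const std::uint8_t* data, std::size_t size) {
    std::uint64_t hash = 14695981039346656037ULL;
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= data[i];
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hypothetical index entry: where a block lives, which reads it holds,
// and the checksum that covers exactly that block's payload.
struct BlockIndexEntry {
    std::uint64_t offset;
    std::uint64_t size;
    std::uint64_t first_read;
    std::uint64_t read_count;
    std::uint64_t checksum;
};

struct Archive {
    std::vector<std::uint8_t> bytes;
    std::vector<BlockIndexEntry> index;
};

constexpr std::size_t kNoBlock = static_cast<std::size_t>(-1);

// Store: append a payload and record its index entry.
void append_block(Archive& archive, const std::string& payload, std::uint64_t read_count) {
    const std::uint64_t first_read =
        archive.index.empty()
            ? 0
            : archive.index.back().first_read + archive.index.back().read_count;
    const auto* raw = reinterpret_cast<const std::uint8_t*>(payload.data());
    archive.index.push_back(BlockIndexEntry{archive.bytes.size(), payload.size(),
                                            first_read, read_count,
                                            fnv1a64(raw, payload.size())});
    archive.bytes.insert(archive.bytes.end(), raw, raw + payload.size());
}

// Retrieve: binary-search the index for the block that holds a given read.
std::size_t find_block(const Archive& archive, std::uint64_t read_index) {
    auto it = std::upper_bound(
        archive.index.begin(), archive.index.end(), read_index,
        [](std::uint64_t value, const BlockIndexEntry& e) { return value < e.first_read; });
    if (it == archive.index.begin()) return kNoBlock;
    --it;
    if (read_index >= it->first_read + it->read_count) return kNoBlock;
    return static_cast<std::size_t>(it - archive.index.begin());
}

// Verify: recompute the checksum over one block without touching the others.
bool verify_block(const Archive& archive, std::size_t block) {
    assert(block < archive.index.size());
    const BlockIndexEntry& e = archive.index[block];
    if (e.offset + e.size > archive.bytes.size()) return false;
    return fnv1a64(archive.bytes.data() + e.offset, e.size) == e.checksum;
}
```

Because each checksum covers a single block, a reader in this sketch can locate the block for one read, verify only that block, and decode it without scanning the rest of the archive.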

Where to continue

Read pipeline for concurrency and flow control, then format and random access for the archive contract.

Layer map

| Layer | Responsibility | Key anchors |
| --- | --- | --- |
| Ingest | Open FASTQ and compressed FASTQ inputs, normalize stream handling | include/fqc/io/fastq_parser.h, include/fqc/io/compressed_stream.h |
| Analysis | Collect global statistics, reorder intent, and memory discipline | include/fqc/algo/global_analyzer.h, include/fqc/common/memory_budget.h |
| Compression | Encode sequence, IDs, and quality values on block-local units | include/fqc/algo/block_compressor.h, include/fqc/algo/id_compressor.h, include/fqc/algo/quality_compressor.h |
| Format | Write blocks, checksums, reorder metadata, and direct-lookup structures | include/fqc/format/fqc_writer.h, include/fqc/format/fqc_header.h |
| Retrieval | Verify, range decode, and optionally restore original order | include/fqc/format/fqc_reader.h, include/fqc/pipeline/decompressor_node.h |
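
The Format and Retrieval layers both refer to reorder metadata and restoring original order. One way such metadata could work is a permutation that records, for each stored read, its original position. The sketch below assumes that representation; the function names and the use of a flat `std::vector<std::uint32_t>` are illustrative choices, not the FQC format's actual encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// A reorder map is valid only if it is a permutation of 0..n-1.
bool is_valid_reorder_map(const std::vector<std::uint32_t>& archive_to_original) {
    std::vector<bool> seen(archive_to_original.size(), false);
    for (std::uint32_t original : archive_to_original) {
        if (original >= archive_to_original.size() || seen[original]) return false;
        seen[original] = true;
    }
    return true;
}

// archive_to_original[i] is the original position of the read stored at
// archive position i. Restoring scatters each stored read back to that slot.
std::vector<std::string> restore_original_order(
    const std::vector<std::string>& stored,
    const std::vector<std::uint32_t>& archive_to_original) {
    assert(stored.size() == archive_to_original.size());
    if (!is_valid_reorder_map(archive_to_original))
        throw std::invalid_argument("reorder map is not a permutation");
    std::vector<std::string> restored(stored.size());
    for (std::size_t i = 0; i < stored.size(); ++i) {
        restored[archive_to_original[i]] = stored[i];
    }
    return restored;
}

// The inverse map answers the opposite question: where in the archive is
// the read that was originally at position k?
std::vector<std::uint32_t> invert_reorder_map(
    const std::vector<std::uint32_t>& archive_to_original) {
    if (!is_valid_reorder_map(archive_to_original))
        throw std::invalid_argument("reorder map is not a permutation");
    std::vector<std::uint32_t> original_to_archive(archive_to_original.size());
    for (std::size_t i = 0; i < archive_to_original.size(); ++i) {
        original_to_archive[archive_to_original[i]] = static_cast<std::uint32_t>(i);
    }
    return original_to_archive;
}
```

Under this assumption, the inverse map combined with a block index would let a reader find an original-order read by position without decoding the whole archive.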

Non-negotiable invariants

  • The block is the unit that makes throughput, checksum scope, and random access compatible.
  • The archive format is part of the product contract. It is not just an opaque byte bucket behind the CLI.
  • The command layer stays thin so compression and decompression behavior lives in reusable library code, not in CLI-only branches.
