
fq-compressor: Systems Whitepaper for FASTQ Compression

Audit the algorithmic thesis, archive contract, performance boundary, and reference shelf behind fq-compressor in one bilingual portal.

Abstract

Compression ratio matters here because it is tied to retrieval semantics, evidence provenance, and code boundaries.

fq-compressor is documented as a coupled system. Read ordering, block-local transforms, FQC indexing, benchmark methodology, and operational workflows are treated as one public contract.

What you can audit here

  • Algorithm framing for ABC, SCM, reversible reordering, and consensus-plus-delta encoding
  • System design for block-local compression, archive materialization, and O(1) random access
  • Performance language bounded by tracked benchmark artifacts and explicit methodology

Primary repository anchors

  • Architecture maps public concepts to include/fqc/, src/, and format responsibilities
  • Performance keeps benchmark claims tied to repository artifacts and proof limits
  • References connects the site to papers, comparator repositories, and archived research notes

Performance ledger

Public claims stay attached to method, artifacts, and retrieval cost.

This portal does not separate headline numbers from archive semantics. Every metric is paired with the subsystem, methodology, or repository artifact that makes it auditable.

01

3.97x

Archive density

Compression ratio is presented as a bounded result, not as a floating claim detached from dataset scope.

  • ERR091571 smoke-scale artifact
  • Public benchmark report in repo
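As a reading aid for the ratio above, here is a minimal sketch of the arithmetic, assuming the conventional definition (original bytes divided by archive bytes). The function names and byte counts are hypothetical illustrations, not figures taken from the benchmark report.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return original size divided by compressed size (assumed definition)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


def expected_archive_bytes(original_bytes: int, ratio: float) -> float:
    """Estimate the archive size implied by a given ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return original_bytes / ratio


# Hypothetical sizes chosen only to reproduce a 3.97x ratio.
print(round(compression_ratio(397_000_000, 100_000_000), 2))  # 3.97
```

A ratio only means something next to the dataset it was measured on, which is why the card pairs the number with a named artifact.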
02

11.9 MB/s

Write-path throughput

Compression speed is interpreted through the pipeline, chunking, and backpressure design, not as an isolated stopwatch number.

  • Pipeline topology
  • Block-local work scheduling
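To make the backpressure idea concrete, the following is a minimal sketch, not fq-compressor's actual pipeline. It assumes a single worker thread, uses zlib as a stand-in codec, and relies on a bounded queue so the producer stalls when too many chunks are pending. The names run_pipeline and max_in_flight are hypothetical.

```python
import queue
import threading
import zlib


def run_pipeline(chunks, max_in_flight=2):
    """Compress chunks on a worker thread; a bounded queue throttles the producer."""
    work = queue.Queue(maxsize=max_in_flight)  # put() blocks when the queue is full
    results = []

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more chunks
                break
            index, data = item
            results.append((index, zlib.compress(data)))

    thread = threading.Thread(target=worker)
    thread.start()
    for index, data in enumerate(chunks):
        work.put((index, data))  # waits here while max_in_flight chunks are pending
    work.put(None)
    thread.join()
    # Sort by chunk index so output order matches input order.
    return [blob for _, blob in sorted(results)]
```

Because the producer blocks instead of buffering without limit, measured throughput reflects the slowest stage rather than how fast input can be read.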
03

62.3 MB/s

Read-back speed

Decode speed stays in scope because random access is only useful if retrieval remains practical under real archive workflows.

  • Decompression path
  • Original-order restore boundary
04

O(1)

Random access

Indexed lookup is treated as a first-class contract. The format and block map are part of the public thesis, not an implementation footnote.

  • FQC block index
  • Range decode without full expansion
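The indexed-lookup contract can be sketched as follows. This is an illustration under stated assumptions, not the FQC format: it assumes fixed reads per block, an in-memory payload, and zlib as a stand-in codec. The helper names build_archive and fetch_read are hypothetical.

```python
import zlib


def build_archive(reads, reads_per_block):
    """Return (payload, index); index[i] is the (offset, length) of block i."""
    payload = bytearray()
    index = []
    for start in range(0, len(reads), reads_per_block):
        block = "\n".join(reads[start:start + reads_per_block]).encode()
        blob = zlib.compress(block)
        index.append((len(payload), len(blob)))
        payload += blob
    return bytes(payload), index


def fetch_read(payload, index, reads_per_block, read_number):
    """Decode only the block that holds read_number."""
    block_id, slot = divmod(read_number, reads_per_block)  # constant-time block lookup
    offset, length = index[block_id]
    block = zlib.decompress(payload[offset:offset + length])
    return block.decode().split("\n")[slot]


reads = [f"READ{i}" for i in range(10)]
payload, index = build_archive(reads, reads_per_block=4)
print(fetch_read(payload, index, 4, 9))  # READ9
```

Locating the block is a constant-time table lookup; the decode cost is then bounded by the block size rather than by the archive size.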

System blueprint

The archive is built as a chain of explicit contracts.

fq-compressor is easier to audit when each phase has a clear boundary: ingest, analysis, block transforms, archive materialization, and selective retrieval.


01

Ingest

Plain and compressed FASTQ streams enter through the parser and stream adapters.

io/fastq_parser + io/compressed_stream

02

Analyze

Global statistics establish reorder intent, chunk sizing, and memory discipline.

algo/global_analyzer + common/memory_budget

03

Compress

Block-local transforms split sequence, ID, and quality responsibilities across dedicated codecs.

algo/block_compressor + quality/id streams
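The split of responsibilities can be illustrated with a small sketch. It assumes four-line FASTQ records and uses zlib as a placeholder for each dedicated codec; the function names split_streams and compress_block are hypothetical and do not come from the repository.

```python
import zlib


def split_streams(fastq_text):
    """Split four-line FASTQ records into ID, sequence, and quality streams."""
    lines = fastq_text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        ids.append(header[1:])
        seqs.append(seq)
        quals.append(qual)
    return {"id": ids, "sequence": seqs, "quality": quals}


def compress_block(streams):
    """Compress each stream independently (zlib stands in for a real codec)."""
    return {
        name: zlib.compress("\n".join(values).encode())
        for name, values in streams.items()
    }
```

Keeping the streams separate lets each one be modeled on its own statistics, which is the point of assigning them to different codecs.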

04

Store

FQC writes blocks, checksums, reorder metadata, and the lookup structures needed later.

format/fqc_writer + format/index tables
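Why reorder metadata has to be stored can be shown with a minimal sketch of a reversible reordering. Lexicographic sorting is only a stand-in here, since the actual reordering criterion is not described on this page, and the names reorder and restore are hypothetical.

```python
def reorder(reads):
    """Sort reads and record, for each sorted position, the original index."""
    order = sorted(range(len(reads)), key=lambda i: reads[i])
    return [reads[i] for i in order], order


def restore(sorted_reads, order):
    """Invert the permutation to recover the original read order."""
    original = [None] * len(sorted_reads)
    for position, source_index in enumerate(order):
        original[source_index] = sorted_reads[position]
    return original


reads = ["TTGA", "ACGT", "GGCA", "ACGA"]
sorted_reads, order = reorder(reads)
print(restore(sorted_reads, order) == reads)  # True
```

The permutation is the extra cost of reordering: it must be written alongside the blocks, or original-order restore is impossible.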

05

Retrieve

Readers can verify, range decode, or restore original order without replaying the whole archive.

format/fqc_reader + pipeline/decompressor

Why the block matters

The block is the smallest unit that still carries compression leverage, checksum scope, and direct lookup.

Where integrity lives

Checksums and verify flows live at the archive boundary, which keeps retrieval semantics inspectable.
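A per-block verify flow might look like the sketch below. CRC32 over the compressed bytes is an assumption made for illustration; this page does not state which checksum FQC uses or what it covers. The names seal_block and verify_block are hypothetical.

```python
import zlib


def seal_block(data: bytes):
    """Compress a block and compute a checksum over the stored bytes."""
    blob = zlib.compress(data)
    return blob, zlib.crc32(blob)


def verify_block(blob: bytes, expected_crc: int) -> bool:
    """Check a stored block against its recorded checksum without decoding it."""
    return zlib.crc32(blob) == expected_crc


blob, crc = seal_block(b"ACGT" * 10)
print(verify_block(blob, crc))  # True
```

Checking the stored bytes means corruption can be detected before any decode work is spent on the block.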

Where to continue

Read pipeline for concurrency and flow control, then format and random access for the archive contract.

Algorithms

ABC plus SCM, framed as a system thesis

The whitepaper lane explains why fq-compressor splits read ordering, consensus-style sequence reduction, and quality modeling into distinct but cooperating stages.
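The consensus-style reduction can be sketched in a few lines. This is a simplified illustration that assumes equal-length reads and substitutions only; it is not the encoder used by fq-compressor, and the names encode_delta and decode_delta are hypothetical.

```python
def encode_delta(consensus, read):
    """Record only the positions where the read differs from the consensus."""
    if len(read) != len(consensus):
        raise ValueError("this sketch handles equal-length substitutions only")
    return [(i, b) for i, (c, b) in enumerate(zip(consensus, read)) if c != b]


def decode_delta(consensus, delta):
    """Rebuild the read by applying the recorded substitutions."""
    bases = list(consensus)
    for position, base in delta:
        bases[position] = base
    return "".join(bases)


consensus = "ACGTACGT"
read = "ACGTTCGA"
delta = encode_delta(consensus, read)
print(delta)  # [(4, 'T'), (7, 'A')]
print(decode_delta(consensus, delta) == read)  # True
```

The sketch also shows why read ordering and sequence reduction cooperate: the more similar neighboring reads are, the shorter their deltas against a shared consensus.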

Evidence

Performance claims stay narrower than aspiration

The benchmark lane is intentionally conservative. It shows what the repository can prove today, not every future claim the project may eventually support.

References

The portal cites literature and comparators explicitly

The reference shelf ties fq-compressor back to SPRING [R1], fqzcomp [R2], HARC [C2], NanoSpring [R3], and repository-local evidence anchors.

Reading tracks

Choose a route that matches your question, then stay in that lane.

  1. Staff-level reviewer

    Evaluate the project thesis in one sitting

    Start with algorithms, then verify every public claim against the evidence contract.

    Entry: Whitepaper -> Performance
    Outcome: You can judge whether the public story outruns the repository.
  2. Operators

    Move from install to verified archive handling

    Stay in operations when the immediate goal is to install, run, verify, or spot-check real outputs.

    Entry: Operations -> System Design
    Outcome: You can execute the tool without guessing hidden format rules.
  3. Contributors

    Map the code before changing anything

    Read the system blueprint with contribution guidance open beside it so architectural boundaries stay visible during edits.

    Entry: System Design -> Operations
    Outcome: You know which modules own parsing, compression, format, and command glue.
  4. Research readers

    Put the design choices back into external context

    Use the reference shelf for papers, comparator repositories, and closeout-mode evolution notes.

    Entry: References -> Algorithms
    Outcome: You can explain which upstream ideas fq-compressor keeps, adapts, or rejects.

Citation apparatus

The public story is backed by papers, repositories, and local evidence anchors.

Core literature

  • [R1]
    SPRING paper

    Closest paper-level frame for assembly-based compression and reversible reordering.

  • [R2]
    fqzcomp repository

    Quality-value coding reference for stream-specific trade-offs.

  • [R3]
    NanoSpring paper

    Long-read comparator that clarifies what fq-compressor is not primarily tuned for.


Comparator repositories

  • [C1]
    Spring

    Upstream reference for ordering plus consensus-and-delta reasoning.

  • [C2]
    HARC

    FASTQ-specialized comparator for architecture and scope choices.

  • [C3]
    fqzcomp

    Compact quality model used as a pressure test for stream design.


Repository evidence

  • [E1]
    benchmark/results/

    Tracked machine-readable and narrative benchmark artifacts.

  • [E2]
    docs/archive/

    Historical research and governance material kept for reference only.

  • [E3]
    vendor/spring-core/

    License-bounded extracted reference code used for study.
