
Performance

Performance evidence only matters if it stays attached to system boundaries. This page therefore treats benchmark numbers as a maintained argument: which workloads are represented, which assumptions are fixed, and which parts of the conclusion must stay conditional.

Evidence summary

The representative snapshot cited in the maintained docs still comes from the standard scenario documented in the baseline: 100K reads, 150 bp, AMD Ryzen 9 5900X, Release build. Under that framing, the read path is about 1696 MB/s, the write path about 1.76M reads/s, combined filtering about 1.67M reads/s, and full statistics about 302 MB/s.

The point of repeating those numbers is not to promise that every dataset or storage path will behave the same. The point is to show that FastQTools already lands in a throughput class that matters for FASTQ QC and that the project publishes enough method context to keep those numbers reviewable.
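The snapshot mixes two units: MB/s for the read and statistics paths, reads/s for the write and filter paths. The sketch below shows why the two cannot be compared without a record size. The 30-byte header length and the decimal megabyte (10**6 bytes) are assumptions made for illustration only; the source does not state either, so the converted figure is not a published result.

```python
def fastq_record_bytes(read_length_bp: int, header_bytes: int) -> int:
    """Uncompressed size of one four-line FASTQ record, in bytes.

    header_bytes counts the '@' and the read name, without the newline.
    """
    newline = 1
    header_line = header_bytes + newline
    sequence_line = read_length_bp + newline
    separator_line = 1 + newline           # the '+' line
    quality_line = read_length_bp + newline
    return header_line + sequence_line + separator_line + quality_line


def reads_per_s_to_mb_per_s(reads_per_s: float, record_bytes: int) -> float:
    """Convert a record rate to MB/s, taking 1 MB as 10**6 bytes (assumption)."""
    return reads_per_s * record_bytes / 1e6


# ASSUMPTION: 30-byte headers. Real header length depends on the dataset.
record = fastq_record_bytes(read_length_bp=150, header_bytes=30)
write_mb_s = reads_per_s_to_mb_per_s(1.76e6, record)
print(f"{record} bytes/record -> roughly {write_mb_s:.0f} MB/s under this assumption")
```

Changing `header_bytes` shifts the converted value, which is one more reason to quote each path in the unit it was measured in.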

[Figure: FastQTools execution model. Staged execution path with ingest, processing, sink, and benchmark interpretation surfaces.]

  • Ingest and framing: gzip decoding; bounded FastqBatch reuse.
  • Parallel processing and statistics path: predicates and mutators; summary metrics and per-position work.
  • Sink and reporting: ordered output; summary emission.
  • Annotations: oneTBB pipeline orchestration; view lifetime remains explicit.

Benchmark interpretation belongs to the whole path, not to one isolated stage.
Execution model: performance evidence should be read together with execution path, methodology, and maintenance boundaries.

Benchmark envelope

Treat the headline numbers as the envelope of the maintained benchmark story, not as universally portable constants. The envelope includes hardware class, build mode, workload shape, and the fact that the numbers are representative publication samples rather than release-time SLAs.

The evidence ladder

Treat the performance material as a ladder, not as a single benchmark table:

  1. Narrative layer: this page explains what question the numbers are meant to answer.
  2. Results layer: Benchmark Report shows the representative snapshot.
  3. Method layer: Benchmark Guide explains how results are reproduced.
  4. Policy layer: RFC-0003 and RFC-0006 define collection, storage, thresholds, and release-facing interpretation.

If you skip a layer, you usually end up overstating the conclusion. Architecture still matters here, which is why it helps to read this page alongside Architecture and Algorithms.

Interpretation rules

Use these rules before quoting a number outside this site:

| Rule | Consequence |
|------|-------------|
| Quote the workload with the number. | A throughput figure without read count, read length, build mode, and machine class is not reviewable. |
| Keep read, write, filter, and statistics paths separate. | A strong read-path number does not automatically prove the whole QC workflow. |
| Treat compression as part of the workload. | gzip cost can dominate or mask parser behavior depending on input and storage. |
| Repeat locally for adoption decisions. | The published snapshot is a starting point for evaluation, not a procurement guarantee. |
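One way to make the first two rules hard to break is to carry the workload context in the same object as the number. The following is a minimal sketch, not part of FastQTools: `BenchmarkClaim`, its fields, and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkClaim:
    """Hypothetical record for one quoted throughput figure."""
    path: str                            # "read", "write", "filter", or "statistics"
    value: float
    unit: str                            # "MB/s" or "reads/s"
    read_count: Optional[int] = None
    read_length_bp: Optional[int] = None
    build_mode: Optional[str] = None
    machine: Optional[str] = None

    def missing_context(self) -> List[str]:
        """Names of the context fields that have not been filled in."""
        required = ("read_count", "read_length_bp", "build_mode", "machine")
        return [name for name in required if getattr(self, name) is None]

    def quote(self) -> str:
        """Format the figure with its workload, or refuse if context is missing."""
        missing = self.missing_context()
        if missing:
            raise ValueError(f"not reviewable, missing: {', '.join(missing)}")
        return (f"{self.path} path: {self.value:g} {self.unit} "
                f"({self.read_count} reads, {self.read_length_bp} bp, "
                f"{self.build_mode} build, {self.machine})")


# Values taken from the representative snapshot on this page.
read_claim = BenchmarkClaim(path="read", value=1696, unit="MB/s",
                            read_count=100_000, read_length_bp=150,
                            build_mode="Release", machine="AMD Ryzen 9 5900X")
print(read_claim.quote())

# A bare number reports what it lacks instead of producing a quote.
bare = BenchmarkClaim(path="read", value=1696, unit="MB/s")
print(bare.missing_context())
```

Because each claim names its own path, a read-path figure cannot silently stand in for the filter or statistics path.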

Threats to validity

Published results are representative samples, not universal constants. The most important moving parts are:

  • Compression ratio and codec cost: gzip level and input compressibility directly affect CPU cost in read and write stages.
  • Storage I/O: NVMe, network storage, container volumes, and shared filesystems can turn a benchmark into more of a disk test than a parsing test.
  • Thread count and concurrency parameters: single-thread and multi-thread pipeline results are not directly comparable, and too many threads can add contention, scheduling noise, and NUMA effects.
  • Input distribution: read length, quality distribution, predicate combinations, and pass rate all change the hotspots in the path.
  • Machine topology: CPU microarchitecture, cache hierarchy, memory bandwidth, SMT, and container limits all affect the final curve.
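The compression item in the list above is easy to observe in isolation. The sketch below uses Python's standard `gzip` module on synthetic FASTQ-shaped bytes, so it says nothing about FastQTools' own decoder; `synthetic_fastq` and `gzip_profile` are hypothetical helpers, and random bases compress differently from real sequencing data.

```python
import gzip
import random
import time


def synthetic_fastq(n_reads: int, read_length_bp: int, seed: int = 0) -> bytes:
    """Build FASTQ-shaped bytes with random bases and qualities (not real data)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choices("ACGT", k=read_length_bp))
        qual = "".join(rng.choices("FFFF:,#", k=read_length_bp))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def gzip_profile(data: bytes, levels=(1, 6, 9)):
    """Return {level: (compress_seconds, decompress_seconds, ratio)}."""
    profile = {}
    for level in levels:
        t0 = time.perf_counter()
        packed = gzip.compress(data, compresslevel=level)
        t1 = time.perf_counter()
        unpacked = gzip.decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise RuntimeError("gzip round trip changed the data")
        profile[level] = (t1 - t0, t2 - t1, len(data) / len(packed))
    return profile


data = synthetic_fastq(n_reads=2000, read_length_bp=150)
for level, (c_s, d_s, ratio) in gzip_profile(data).items():
    print(f"level {level}: compress {c_s * 1e3:.1f} ms, "
          f"decompress {d_s * 1e3:.1f} ms, ratio {ratio:.2f}")
```

Timings vary by machine and run, so treat the printed values as a local observation rather than a comparison point for the published snapshot.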

So the right question is usually not “is 1696 MB/s the truth?” It is “does the project publish a result, method, and policy that make further evaluation worthwhile?”

What can change the conclusion

For migration work, procurement review, or SLA framing, reproduce the workload yourself and read the policy documents before repeating any number as if it were a guarantee.
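A local reproduction is more useful when it reports spread rather than a single run. The harness below is a generic sketch under stated assumptions: `measure_mb_per_s` is a hypothetical helper, the workload is an in-memory stand-in rather than a FastQTools command, and 1 MB is taken as 10**6 bytes.

```python
import statistics
import time
from typing import Callable, Dict


def measure_mb_per_s(run: Callable[[], object], n_bytes: int,
                     repeats: int = 5) -> Dict[str, float]:
    """Time `run` several times and summarize throughput in MB/s."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        # Guard against a zero reading on very fast workloads.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(n_bytes / elapsed / 1e6)
    return {
        "median": statistics.median(rates),
        "min": min(rates),
        "max": max(rates),
    }


# Stand-in workload: count newlines in an in-memory buffer.
payload = b"ACGT\n" * 200_000
result = measure_mb_per_s(lambda: payload.count(b"\n"), n_bytes=len(payload))
print(f"median {result['median']:.0f} MB/s "
      f"(min {result['min']:.0f}, max {result['max']:.0f})")
```

To evaluate a real workload, replace the lambda with a call that runs the tool on your own input and storage path, and record thread count and compression settings alongside the result.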

Reproduction trail

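Once the raw metrics are clear, use the maintained trail from the evidence ladder to validate and reproduce them: Benchmark Report for the representative snapshot, Benchmark Guide for the reproduction method, and RFC-0003 and RFC-0006 for collection, storage, and threshold policy.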

Cross-checks beyond the benchmark report

Use the research layer to avoid reading metrics in isolation:

  • Research bibliography collects the formal sources behind format, architecture, and benchmark language.
  • Related projects helps you compare FastQTools with FastQC, fastp, Cutadapt, and seqtk without flattening them into a winner table.
  • Evolution notes explains why benchmark policy and memory policy were elevated into maintained architecture decisions.

Return to the whitepaper storyline

If you have not yet built the system model, go back to Whitepaper and Architecture. If you want the maintained behavior behind the measured paths, continue to Algorithms. If you are ready to act on exact commands or APIs, move to Reference.

MIT License © LessUp