Performance

Performance evidence only matters if it stays attached to the system boundaries under which it was measured. This page therefore treats benchmark numbers as a maintained argument: which workloads are represented, which assumptions are fixed, and which parts of the conclusion must stay conditional.

Evidence summary

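The representative snapshot cited in the maintained docs still comes from the standard scenario documented in the baseline: 100K reads of 150 bp each, measured on an AMD Ryzen 9 5900X with a Release build. Under that framing, the read path runs at about 1696 MB/s, the write path at about 1.76M reads/s, combined filtering at about 1.67M reads/s, and full statistics at about 302 MB/s. The paths are reported in different units (bytes per second versus reads per second), so they cannot be ranked against each other directly.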
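Converting between MB/s and reads/s requires a record size, and that size is itself a property of the workload. The sketch below assumes uncompressed four-line FASTQ records with newline line endings, a bare `+` separator line, and decimal megabytes (10^6 bytes). The header lengths used are hypothetical; the baseline does not publish them, and the published figures may use a different MB convention.

```python
def fastq_record_bytes(read_len: int, header_len: int) -> int:
    """Size of one uncompressed FASTQ record.

    Assumes four newline-terminated lines, a bare '+' separator line,
    and a header of `header_len` bytes including the leading '@'.
    """
    header = header_len + 1   # '@...' plus newline
    sequence = read_len + 1   # bases plus newline
    separator = 2             # '+' plus newline
    quality = read_len + 1    # quality string plus newline
    return header + sequence + separator + quality


def reads_per_s_to_mb_per_s(reads_per_s: float, read_len: int, header_len: int) -> float:
    """Convert a reads/s figure to decimal MB/s under the record-size assumption."""
    return reads_per_s * fastq_record_bytes(read_len, header_len) / 1e6


# Hypothetical header lengths; only the record-size assumption changes between lines.
for header_len in (30, 60):
    size = fastq_record_bytes(150, header_len)
    rate = reads_per_s_to_mb_per_s(1.76e6, 150, header_len)
    print(f"header={header_len}B record={size}B write path ~ {rate:.1f} MB/s")
```

The reads/s figure stays fixed, yet its MB/s equivalent moves with header length alone. This is derived arithmetic, not a measurement.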

The point of repeating those numbers is not to promise that every dataset or storage path will behave the same. The point is to show that FastQTools already lands in a throughput class that matters for FASTQ QC and that the project publishes enough method context to keep those numbers reviewable.

Execution model: performance evidence should be read together with execution path, methodology, and maintenance boundaries.

Benchmark envelope

Treat the headline numbers as the envelope of the maintained benchmark story, not as universally portable constants. The envelope includes hardware class, build mode, workload shape, and the fact that the numbers are representative publication samples rather than release-time SLAs.
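One way to keep the envelope attached to a number is to store it as data and compare envelopes before comparing results. The `BenchmarkEnvelope` type and `envelope_differences` helper below are hypothetical illustrations, not part of FastQTools. The baseline values come from the evidence summary; the "local" envelope is invented for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchmarkEnvelope:
    """Hypothetical record of the conditions a published number belongs to."""
    machine_class: str
    build_mode: str
    read_count: int
    read_length_bp: int
    # True for representative publication samples, which are not release-time SLAs.
    representative_sample: bool = True


def envelope_differences(a: BenchmarkEnvelope, b: BenchmarkEnvelope) -> list:
    """Return the names of envelope fields that differ between two results."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


baseline = BenchmarkEnvelope("AMD Ryzen 9 5900X", "Release", 100_000, 150)
# Invented local run: same machine and workload shape, but a Debug build.
local = BenchmarkEnvelope("AMD Ryzen 9 5900X", "Debug", 100_000, 150)

diff = envelope_differences(baseline, local)
if diff:
    print("not directly comparable; differing fields:", diff)
```

Comparing envelopes first makes the reason two numbers disagree explicit, instead of leaving it to be inferred from the numbers themselves.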

The evidence ladder

Treat the performance material as a ladder, not as a single benchmark table:

Narrative layer — this page explains what question the numbers are meant to answer.
Results layer — Benchmark Report shows the representative snapshot.
Method layer — Benchmark Guide explains how results are reproduced.
Policy layer — RFC-0003 and RFC-0006 define collection, storage, thresholds, and release-facing interpretation.

If you skip a layer, you usually end up overstating the conclusion. Architecture still matters here, which is why it helps to read this page alongside Architecture and Algorithms.

Interpretation rules

Use these rules before quoting a number outside this site:

Rule	Consequence
Quote the workload with the number.	A throughput figure without read count, read length, build mode, and machine class is not reviewable.
Keep read, write, filter, and statistics paths separate.	A strong read-path number does not automatically prove the whole QC workflow.
Treat compression as part of the workload.	gzip cost can dominate or mask parser behavior depending on input and storage.
Repeat locally for adoption decisions.	The published snapshot is a starting point for evaluation, not a procurement guarantee.
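The first three rules can be checked mechanically before a number is quoted. The fourth, local repetition, is a process step rather than a property of the quote. The checker below is a hypothetical sketch: its field names and path labels are assumptions chosen to mirror the table, not a FastQTools schema.

```python
REQUIRED_CONTEXT = ("read_count", "read_length_bp", "build_mode", "machine_class")
# The four paths the published snapshot reports separately.
PATHS = {"read", "write", "filter", "statistics"}


def review_claim(claim: dict) -> list:
    """Return reasons a quoted throughput figure is not yet reviewable."""
    # Rule 1: the workload travels with the number.
    problems = [f"missing {key}" for key in REQUIRED_CONTEXT if claim.get(key) in (None, "")]
    # Rule 2: one figure describes one path, never the whole QC workflow.
    if claim.get("path") not in PATHS:
        problems.append("name exactly one path: read, write, filter, or statistics")
    # Rule 3: compression is part of the workload, so its presence must be stated.
    if "compressed_input" not in claim:
        problems.append("state whether the input was gzip-compressed")
    return problems


print(review_claim({"value": "1696 MB/s"}))
```

A bare figure such as `{"value": "1696 MB/s"}` is flagged on every field. A claim that names its path, workload, build mode, machine class, and compression status passes.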

Threats to validity

Published results are representative samples, not universal constants. The most important moving parts are:

Compression ratio and codec cost: gzip level and input compressibility directly affect CPU cost in read and write stages.
Storage I/O: NVMe, network storage, container volumes, and shared filesystems can turn a benchmark into more of a disk test than a parsing test.
Thread count and concurrency parameters: single-thread and multi-thread pipeline results are not directly comparable, and too many threads can add contention, scheduling noise, and NUMA effects.
Input distribution: read length, quality distribution, predicate combinations, and pass rate all change the hotspots in the path.
Machine topology: CPU microarchitecture, cache hierarchy, memory bandwidth, SMT, and container limits all affect the final curve.

So the right question is usually not “is 1696 MB/s the truth?” It is “does the project publish a result, method, and policy that make further evaluation worthwhile?”
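Codec cost and parsing cost can be separated even without the tool itself. The standard-library sketch below builds a synthetic FASTQ buffer and compresses it at several gzip levels. It then times compression, decompression, and a trivial record count as separate stages. Everything stays in memory, which also removes storage I/O from the measurement. The record counter is a stand-in, not a real FASTQ parser, and the synthetic reads (random bases, constant qualities) do not match real data, so the absolute times say nothing about FastQTools. Only the method transfers.

```python
import gzip
import random
import time


def make_fastq(n_reads: int, read_len: int, seed: int = 0) -> bytes:
    """Synthetic uncompressed FASTQ; real headers, bases and qualities differ."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choices("ACGT", k=read_len))
        records.append(f"@read{i}\n{seq}\n+\n{'I' * read_len}\n")
    return "".join(records).encode("ascii")


def count_records(data: bytes) -> int:
    """Stand-in for parsing: assumes unwrapped four-line records."""
    return data.count(b"\n") // 4


def split_timing(raw: bytes, level: int) -> dict:
    """Time write-side compression, read-side decompression and record handling separately."""
    t0 = time.perf_counter()
    compressed = gzip.compress(raw, compresslevel=level)
    t1 = time.perf_counter()
    plain = gzip.decompress(compressed)
    t2 = time.perf_counter()
    records = count_records(plain)
    t3 = time.perf_counter()
    return {
        "level": level,
        "compressed_bytes": len(compressed),
        "records": records,
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
        "parse_s": t3 - t2,
    }


raw = make_fastq(20_000, 150)  # scale toward 100K reads to mirror the baseline shape
for level in (1, 6, 9):
    r = split_timing(raw, level)
    print(f"gzip -{level}: {r['compressed_bytes']} B, "
          f"compress {r['compress_s'] * 1e3:.1f} ms, "
          f"decompress {r['decompress_s'] * 1e3:.1f} ms, "
          f"count {r['parse_s'] * 1e3:.1f} ms")
```

If a real parser replaced `count_records` and decompression still dominated, the benchmark would be measuring the codec as much as the parser.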

What can change the conclusion

For migration work, procurement review, or SLA framing, reproduce the workload yourself and read the policy documents before repeating any number as if it were a guarantee.
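Reproducing a workload locally usually means repeating it and reporting a median with its spread rather than a single run. The helper below is a generic sketch. It assumes the workload is a Python callable that returns the number of bytes it processed. It knows nothing about the FastQTools command line, so the actual commands should come from the Benchmark Guide.

```python
import statistics
import time
from typing import Callable


def measure(workload: Callable[[], int], runs: int = 5, warmup: int = 1) -> dict:
    """Repeat a workload and summarize wall-clock time and throughput.

    `workload` must return the number of bytes it processed. Warm-up runs
    are executed but excluded from the summary.
    """
    for _ in range(warmup):
        workload()
    times = []
    nbytes = 0
    for _ in range(runs):
        start = time.perf_counter()
        nbytes = workload()
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    return {
        "runs": runs,
        "median_s": median,
        "min_s": min(times),
        "max_s": max(times),
        # Decimal MB/s, using the byte count reported by the last run.
        "median_mb_per_s": nbytes / median / 1e6 if median > 0 else float("inf"),
    }


payload = b"ACGT" * 2_500_000  # 10 MB in-memory stand-in for a real input
print(measure(lambda: len(payload.lower()), runs=5))
```

Whether warm-up runs should be discarded depends on the question. If cold-cache or cold-storage behavior is part of the workload being evaluated, set `warmup=0` and measure it explicitly.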

Reproduction trail

Once the raw metrics are clear, use the maintained trail to validate and reproduce them:

Benchmark Report for the representative published snapshot;
Benchmark Guide for environment and command reproduction;
Research bibliography for terminology and standards references;
Evolution notes for the historical reasons benchmark policy became first-class documentation.

Cross-checks beyond the benchmark report

Use the research layer to avoid reading metrics in isolation:

Research bibliography collects the formal sources behind format, architecture, and benchmark language.
Related projects helps you compare FastQTools with FastQC, fastp, Cutadapt, and seqtk without flattening them into a winner table.
Evolution notes explains why benchmark policy and memory policy were elevated into maintained architecture decisions.

Return to the whitepaper storyline

If you have not yet built the system model, go back to Whitepaper and Architecture. If you want the maintained behavior behind the measured paths, continue to Algorithms. If you are ready to act on exact commands or APIs, move to Reference.
