Statistics API¶
Namespace: fq::statistic
StatisticCalculatorInterface¶
Abstract interface for statistics calculation tasks. Create instances through the factory function createStatisticCalculator.
Factory Creation¶
fq::statistic::StatisticOptions options;
options.inputFastqPath = "input.fastq.gz";
options.outputStatPath = "output.stat.txt";
options.threadCount = 4;
auto calculator = fq::statistic::createStatisticCalculator(options);
calculator->run();
Interface Definition¶
class StatisticCalculatorInterface {
public:
virtual ~StatisticCalculatorInterface() = default;
virtual void run() = 0;
};
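To show how the factory and the interface fit together, here is a minimal sketch. It assumes the factory returns a std::unique_ptr to the interface (the page only shows that the result is used with `->`). It also uses a hypothetical RecordingCalculator in place of the real implementation, which is not shown here. Everything lives in a `sketch` namespace so it cannot be mistaken for the library's own code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Mirrors the documented option fields; the real struct may differ.
struct StatisticOptions {
    std::string inputFastqPath;
    std::string outputStatPath;
    std::size_t threadCount = 1;  // placeholder default, not from the library
    std::size_t batchSize = 1;    // placeholder default, not from the library
};

class StatisticCalculatorInterface {
public:
    virtual ~StatisticCalculatorInterface() = default;
    virtual void run() = 0;
};

// Hypothetical concrete calculator: it only records that run() was called.
class RecordingCalculator final : public StatisticCalculatorInterface {
public:
    explicit RecordingCalculator(StatisticOptions options)
        : options_(std::move(options)) {}

    void run() override { ++runCount_; }

    int runCount() const { return runCount_; }
    const StatisticOptions& options() const { return options_; }

private:
    StatisticOptions options_;
    int runCount_ = 0;
};

// Assumed factory shape: the caller receives an owning pointer to the
// interface and never names the concrete type.
inline std::unique_ptr<StatisticCalculatorInterface>
createStatisticCalculator(const StatisticOptions& options) {
    return std::make_unique<RecordingCalculator>(options);
}

}  // namespace sketch
```

Returning the interface type keeps callers independent of the concrete calculator, so an implementation can be swapped without touching code that only calls run().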
StatisticOptions¶
Statistics task configuration.
| Field | Type | Description |
|---|---|---|
| inputFastqPath | std::string | Input FASTQ file path |
| outputStatPath | std::string | Output statistics file path |
| threadCount | size_t | Number of threads |
| batchSize | size_t | Batch processing size |
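As an illustration of the fields above, the sketch below declares a struct with the same four members plus a hypothetical validateOptions helper. The default values and the validation rules (non-empty paths, non-zero counts) are assumptions made for this example. The library may apply different defaults or no validation at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Same fields as the table above; the defaults are placeholders.
struct StatisticOptions {
    std::string inputFastqPath;
    std::string outputStatPath;
    std::size_t threadCount = 1;
    std::size_t batchSize = 1;
};

// Hypothetical helper: returns one message per problem found.
// An empty result means the options look usable under the assumed rules.
inline std::vector<std::string> validateOptions(const StatisticOptions& options) {
    std::vector<std::string> problems;
    if (options.inputFastqPath.empty()) {
        problems.push_back("inputFastqPath is empty");
    }
    if (options.outputStatPath.empty()) {
        problems.push_back("outputStatPath is empty");
    }
    if (options.threadCount == 0) {
        problems.push_back("threadCount must be at least 1");
    }
    if (options.batchSize == 0) {
        problems.push_back("batchSize must be at least 1");
    }
    return problems;
}

}  // namespace sketch
```

Collecting every problem, rather than stopping at the first, lets a caller report all configuration mistakes in one pass.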
StatisticInterface¶
Low-level abstract interface for computing statistics on a single batch. Implement it to plug in custom statistics logic.
class StatisticInterface {
public:
using Batch = fq::io::FastqBatch;
using Result = FqStatisticResult;
virtual ~StatisticInterface() = default;
virtual auto calculateStats(const Batch& batch) -> Result = 0;
};
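A custom statistic could be written along the following lines. Because fq::io::FastqBatch and the full FqStatisticResult are not defined on this page, the sketch substitutes simplified stand-ins (FastqRecord, FastqBatch, and a result with scalar fields only). LengthStatistic is a hypothetical class name. Treat this as an outline of the extension point rather than code that compiles against the real headers.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// Stand-ins for fq::io::FastqBatch; the real layout is an assumption.
struct FastqRecord {
    std::string sequence;
    std::string quality;
};
struct FastqBatch {
    std::vector<FastqRecord> records;
};

// Reduced stand-in for FqStatisticResult (scalar fields only).
struct FqStatisticResult {
    std::uint64_t readCount = 0;
    std::uint64_t totalBases = 0;
    std::uint32_t maxReadLength = 0;
};

class StatisticInterface {
public:
    using Batch = FastqBatch;
    using Result = FqStatisticResult;
    virtual ~StatisticInterface() = default;
    virtual auto calculateStats(const Batch& batch) -> Result = 0;
};

// Hypothetical extension: counts reads and bases and tracks the longest read.
class LengthStatistic final : public StatisticInterface {
public:
    auto calculateStats(const Batch& batch) -> Result override {
        Result result;
        for (const auto& record : batch.records) {
            const auto length =
                static_cast<std::uint32_t>(record.sequence.size());
            ++result.readCount;
            result.totalBases += length;
            result.maxReadLength = std::max(result.maxReadLength, length);
        }
        return result;
    }
};

}  // namespace sketch
```

The implementation keeps no state between calls, which is what allows one batch to be processed independently of any other.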
FqStatisticResult¶
Statistics result data structure.
| Field | Type | Description |
|---|---|---|
| readCount | uint64_t | Total read count |
| totalBases | uint64_t | Total base count |
| maxReadLength | uint32_t | Maximum read length |
| posQualityDist | vector<vector<uint64_t>> | Position quality distribution |
| posBaseDist | vector<vector<uint64_t>> | Position base distribution |
Supports operator+= to merge statistics results from multiple batches.
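One plausible way to implement such a merge is sketched below. The merge rules are assumptions, not the library's confirmed behavior: counters are summed, maxReadLength takes the larger value, and the two distribution tables are added cell by cell after growing the target to fit. The helper addInto is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

using Matrix = std::vector<std::vector<std::uint64_t>>;

// Adds `source` into `target` cell by cell, growing `target` when a later
// batch contains longer reads (more rows) or wider rows than seen so far.
inline void addInto(Matrix& target, const Matrix& source) {
    if (target.size() < source.size()) {
        target.resize(source.size());
    }
    for (std::size_t row = 0; row < source.size(); ++row) {
        if (target[row].size() < source[row].size()) {
            target[row].resize(source[row].size(), 0);
        }
        for (std::size_t col = 0; col < source[row].size(); ++col) {
            target[row][col] += source[row][col];
        }
    }
}

// Field names follow the table above; the merge semantics are assumed.
struct FqStatisticResult {
    std::uint64_t readCount = 0;
    std::uint64_t totalBases = 0;
    std::uint32_t maxReadLength = 0;
    Matrix posQualityDist;
    Matrix posBaseDist;

    FqStatisticResult& operator+=(const FqStatisticResult& other) {
        readCount += other.readCount;
        totalBases += other.totalBases;
        maxReadLength = std::max(maxReadLength, other.maxReadLength);
        addInto(posQualityDist, other.posQualityDist);
        addInto(posBaseDist, other.posBaseDist);
        return *this;
    }
};

}  // namespace sketch
```

Since addition and max are both associative and commutative, the merged totals do not depend on the order in which batches are combined.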
Parallel Processing Architecture¶
Statistical analysis runs as a TBB parallel pipeline with three stages:
Input file → FastqReader → [Source: serial read] → [Processing: parallel calc] → [Aggregation: serial merge] → Output file
- Source (serial_in_order): reads FastqBatch objects from the input one at a time, in order
- Processing (parallel): computes statistics for each batch independently, so several batches can be processed concurrently
- Aggregation (serial_in_order): merges the per-batch results into the final statistics
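The three stages can be imitated with the standard library alone. The sketch below deliberately does not use TBB, so that it stays dependency-free: std::async stands in for the parallel stage, and results are collected in submission order to mimic the in-order aggregation. Batch, Stats, computeStats, and runPipeline are hypothetical names. The maxInFlight parameter plays a role similar to the live-token limit that tbb::parallel_pipeline takes as its first argument.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

namespace sketch {

using Batch = std::vector<std::string>;  // stand-in: one sequence per read

struct Stats {
    std::uint64_t readCount = 0;
    std::uint64_t totalBases = 0;

    Stats& operator+=(const Stats& other) {
        readCount += other.readCount;
        totalBases += other.totalBases;
        return *this;
    }
};

// Processing stage: depends only on its own batch, so calls can overlap.
inline Stats computeStats(const Batch& batch) {
    Stats stats;
    for (const auto& sequence : batch) {
        ++stats.readCount;
        stats.totalBases += sequence.size();
    }
    return stats;
}

inline Stats runPipeline(const std::vector<Batch>& batches,
                         std::size_t maxInFlight) {
    Stats total;
    std::vector<std::future<Stats>> inFlight;

    // Aggregation stage: a single thread merges results in submission order.
    auto drain = [&total, &inFlight] {
        for (auto& pending : inFlight) {
            total += pending.get();
        }
        inFlight.clear();
    };

    // Source stage: batches are handed out one at a time, in order.
    for (const auto& batch : batches) {
        inFlight.push_back(std::async(std::launch::async, [&batch] {
            return computeStats(batch);
        }));
        if (inFlight.size() >= maxInFlight) {
            drain();  // bound the number of batches being processed at once
        }
    }
    drain();
    return total;
}

}  // namespace sketch
```

Unlike a real pipeline, this version waits for a whole group of batches before starting the next group. A TBB pipeline overlaps all three stages continuously, but the serial, parallel, serial split of responsibilities is the same.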
Output Format¶
Statistics results can be written as JSON or plain text. The output contains:
- Total reads, valid reads
- Sequence length distribution (min/max/average)
- Base composition (A/T/C/G/N ratio)
- GC content
- Position quality distribution (Q20/Q30 percentage)
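Some of these summary values can be derived from the per-position tables. The sketch below rests on two layout assumptions that this page does not confirm. First, posQualityDist[position][q] is taken to hold the number of bases with Phred score q. Second, the columns of posBaseDist are taken to be ordered A, C, G, T, N. The helper names are hypothetical, and counting N in the GC denominator is a choice made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

using Matrix = std::vector<std::vector<std::uint64_t>>;

// Percentage of bases with quality >= threshold (20 for Q20, 30 for Q30).
// Assumes the column index of posQualityDist is the Phred score.
inline double percentAtLeast(const Matrix& posQualityDist,
                             std::size_t threshold) {
    std::uint64_t passing = 0;
    std::uint64_t total = 0;
    for (const auto& row : posQualityDist) {
        for (std::size_t q = 0; q < row.size(); ++q) {
            total += row[q];
            if (q >= threshold) {
                passing += row[q];
            }
        }
    }
    if (total == 0) {
        return 0.0;  // avoid dividing by zero on empty input
    }
    return 100.0 * static_cast<double>(passing) / static_cast<double>(total);
}

// GC percentage over all bases, N included in the denominator.
// Assumes columns are ordered A, C, G, T, N, so C is 1 and G is 2.
inline double gcPercent(const Matrix& posBaseDist) {
    std::uint64_t gc = 0;
    std::uint64_t total = 0;
    for (const auto& row : posBaseDist) {
        for (std::size_t base = 0; base < row.size(); ++base) {
            total += row[base];
            if (base == 1 || base == 2) {
                gc += row[base];
            }
        }
    }
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(gc) / static_cast<double>(total);
}

}  // namespace sketch
```

Under the same assumptions, the average read length is totalBases divided by readCount.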