Pipeline Foundations

MICOS-2024 should be read as a staged metagenomic evidence pipeline. The repository is not only a collection of tools; it is an attempt to standardize how raw read data becomes interpretable biological output.

Why this section exists

Many documentation sites jump straight into installation or commands. That helps operators, but it does not help a reviewer understand the logic of the system. This section frames the conceptual model first:

what goes in,
what transforms it,
what comes out,
what each output can and cannot support analytically.

Pipeline stages at a glance

Stage	Core tools	Main objective	Primary outputs
Quality control	FastQC, KneadData	Remove technical noise and host contamination	Clean reads, QC summaries
Taxonomic profiling	Kraken2, kraken-biom, Krona	Convert reads into taxonomic evidence	Reports, BIOM table, interactive taxonomy view
Diversity analysis	QIIME2, Phyloseq-facing outputs	Estimate ecological differences across samples	Alpha and beta diversity artifacts
Functional readout	HUMAnN-oriented workflow and helper scripts	Infer pathway or gene-family signals	Functional abundance tables
Summarization	HTML report generation	Package outputs into reviewable deliverables	Report-ready result directories
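
The table above can also be read as an ordered data structure. The following is a minimal sketch only: the `Stage` class, the `PIPELINE` list, and the `listed_before` helper are hypothetical names that do not come from the MICOS-2024 codebase. Their content simply mirrors the table, and the sketch assumes stages are considered in table order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One row of the stage table (hypothetical structure, for illustration)."""
    name: str
    tools: tuple
    objective: str
    outputs: tuple


# Content copied from the table; the ordering follows the table rows.
PIPELINE = [
    Stage("Quality control", ("FastQC", "KneadData"),
          "Remove technical noise and host contamination",
          ("Clean reads", "QC summaries")),
    Stage("Taxonomic profiling", ("Kraken2", "kraken-biom", "Krona"),
          "Convert reads into taxonomic evidence",
          ("Reports", "BIOM table", "Interactive taxonomy view")),
    Stage("Diversity analysis", ("QIIME2", "Phyloseq-facing outputs"),
          "Estimate ecological differences across samples",
          ("Alpha and beta diversity artifacts",)),
    Stage("Functional readout", ("HUMAnN-oriented workflow and helper scripts",),
          "Infer pathway or gene-family signals",
          ("Functional abundance tables",)),
    Stage("Summarization", ("HTML report generation",),
          "Package outputs into reviewable deliverables",
          ("Report-ready result directories",)),
]


def listed_before(stage_name):
    """Return the names of the stages listed earlier in the table."""
    names = [stage.name for stage in PIPELINE]
    return names[: names.index(stage_name)]
```

For example, `listed_before("Diversity analysis")` returns the quality control and taxonomic profiling stages, which is the reading order a reviewer would follow when tracing where a diversity result came from.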

Stage 1, quality control

The first stage is about defensibility. Reads that still contain adapters, low-quality segments, or host contamination bias every downstream result. MICOS-2024 addresses this concern through:

CLI-level quality control execution,
WDL steps for FastQC and KneadData,
configuration templates for quality thresholds and host databases.

The purpose is not only cleaner input, but cleaner interpretation later.
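
To make the idea of a quality threshold concrete, here is a minimal sketch of a mean-quality check on a single read. It is not how FastQC or KneadData are implemented, and it is not taken from the MICOS-2024 configuration templates. The function names and the default threshold of 20 are assumptions chosen for illustration, and the sketch assumes Phred+33 encoded FASTQ quality strings.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    if not quality_line:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def passes_quality(quality_line, min_mean_quality=20.0):
    """True if the read's mean quality meets the illustrative threshold."""
    return mean_phred(quality_line) >= min_mean_quality
```

In Phred+33, the character `I` encodes Q40 and `#` encodes Q2, so `passes_quality("IIII")` is true and `passes_quality("####")` is false. Real trimming tools typically work on sliding windows rather than a whole-read mean, which is one reason the threshold values in a configuration template deserve review.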

Stage 2, taxonomic profiling

This is the strongest visible branch in the current codebase. Kraken2 reports, BIOM conversion, and Krona visualization form a coherent sub-pipeline with clear deliverables.

Interpretation note

Taxonomic profiling gives ranked evidence, not absolute biological truth. Confidence thresholds, database choice, and contamination handling still shape the result.
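
The phrase "ranked evidence" can be illustrated by reading a Kraken2 report directly. The sketch below assumes the standard six-column, tab-separated report layout (percentage, clade reads, directly assigned reads, rank code, NCBI taxonomy ID, indented name). The function names are hypothetical, and `TOY_REPORT` is made-up data for illustration, not output from MICOS-2024.

```python
def parse_kraken2_report(text):
    """Parse a standard six-column Kraken2 report into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")
        rows.append({
            "percent": float(pct),
            "clade_reads": int(clade),
            "direct_reads": int(direct),
            "rank": rank.strip(),
            "taxid": int(taxid),
            "name": name.strip(),  # drop the indentation that encodes depth
        })
    return rows


def top_taxa(rows, rank="S", n=3):
    """Return the n taxa at the given rank code with the most clade reads."""
    selected = [row for row in rows if row["rank"] == rank]
    return sorted(selected, key=lambda row: row["clade_reads"], reverse=True)[:n]


# Toy report (invented numbers, intermediate ranks omitted for brevity).
TOY_REPORT = "\n".join([
    " 20.00\t200\t200\tU\t0\tunclassified",
    " 80.00\t800\t20\tR\t1\troot",
    " 78.00\t780\t30\tD\t2\t  Bacteria",
    " 50.00\t500\t500\tS\t562\t    Escherichia coli",
    " 25.00\t250\t250\tS\t1280\t    Staphylococcus aureus",
])
```

Calling `top_taxa(parse_kraken2_report(TOY_REPORT))` ranks the species-level rows by clade read count. Note that a fifth of the toy reads are unclassified: a ranking like this says which taxa have the most support under a given database and confidence setting, not which organisms are definitely present.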

Stage 3, diversity analysis

Diversity analysis is where abundance evidence becomes ecological interpretation. QIIME2-oriented outputs allow readers to ask:

how rich each sample is,
how evenly abundance is distributed,
how far communities are from one another,
whether cohort or metadata groupings explain that distance.
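
The first three questions correspond to standard metrics: observed richness, Pielou evenness (built on the Shannon index), and Bray-Curtis dissimilarity. The sketch below implements them in plain Python to show what is being measured. It is not the QIIME2 API, and the function names are assumptions for illustration. The fourth question needs a group-level statistical test, which is out of scope here.

```python
import math


def richness(counts):
    """Number of taxa with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(richness)."""
    observed = richness(counts)
    if observed < 2:
        return 0.0  # evenness is undefined for fewer than two taxa
    return shannon(counts) / math.log(observed)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator
```

Two samples with no shared taxa, such as `[1, 0]` and `[0, 1]`, have a Bray-Curtis dissimilarity of 1, and identical samples have 0. Because these values are computed from counts, any bias left by quality control or classification carries straight through into them.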

The important architectural point is that this stage depends on clean upstream outputs and trustworthy metadata joins.
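
One way to check that a metadata join is trustworthy is to compare sample identifiers before any diversity statistic is computed. This is a minimal sketch under the assumption that both the abundance table and the metadata sheet expose a list of sample IDs. The function name and the returned keys are hypothetical and not part of MICOS-2024.

```python
def check_metadata_join(abundance_sample_ids, metadata_sample_ids):
    """Report which sample IDs match and which appear on only one side."""
    abundance = set(abundance_sample_ids)
    metadata = set(metadata_sample_ids)
    return {
        "matched": sorted(abundance & metadata),
        "missing_metadata": sorted(abundance - metadata),
        "missing_abundance": sorted(metadata - abundance),
    }
```

A non-empty `missing_metadata` or `missing_abundance` list is a signal to stop and reconcile identifiers, since silently dropped samples change every group comparison that follows.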

Stage 4, functional readout

The repository carries both a stable CLI functional annotation command and broader script-level functional ambitions. From a reviewer perspective, that means MICOS-2024 spans two layers:

a narrower stable interface,
a wider experimentation and expansion surface.

The docs keep those layers distinct to avoid overstating maturity.

Stage 5, summarization

A platform becomes usable when its outputs become navigable. Summarization is therefore not cosmetic. It is the layer where technical execution turns into a deliverable another scientist or reviewer can inspect.

Reading strategy

If you are new to the repository:

read this page,
continue to Data Products and Interpretation,
then switch to System Overview in the Architecture section.

That order mirrors how the repository itself should be understood.
