
Open-source metagenomics whitepaper

MICOS-2024

This documentation is written as a technical brief rather than a checklist of commands: a guided map of how the repository turns raw sequencing input into reproducible, reviewable microbiome analysis outputs.

The site is optimized for demanding readers: interviewers, maintainers, and senior open-source engineers evaluating whether the project is coherent beyond its README.

  • Python package + Click CLI
  • WDL workflow assets
  • Container-aware execution
  • Bilingual technical whitepaper

What this site emphasizes

Repository truth, execution boundaries, architecture intent, and research lineage.

What is deliberately surfaced

The current stable CLI core, the broader workflow assets, and the gap between ambition and implemented runtime surface.

How to read it

Start in Academy, then move into Architecture, then drop into Guides or Research depending on whether you are operating or auditing.

Primary runtime

CLI-first orchestration

The stable surface is a Click CLI backed by Python modules, with shell wrappers kept as compatibility layers.

Workflow posture

WDL + containers

The repository carries step-level WDL assets, Singularity definitions, and a Docker Compose example for reproducible environments.

Reader outcome

Interview-grade clarity

This site is organized to let a reviewer understand scientific scope, software boundaries, and operational trade-offs quickly.

Pipeline narrative

The platform is easier to trust when the execution chain is visible.

MICOS-2024 is best understood as an analysis story with four checkpoints: input hygiene, taxonomic evidence, ecological interpretation, and report-facing outputs.
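
The four checkpoints can be written down as a small data structure, which makes the ordering and the hand-off between stages explicit. This is only a sketch: the stage names, tools, and outputs come from the overview on this page, while the `Stage` class, the `PIPELINE` constant, and the `describe` helper are hypothetical names that do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One checkpoint in the analysis story."""
    order: int
    name: str
    tools: tuple
    output: str


# Stage names, tools, and outputs follow the overview on this page;
# the Stage class and PIPELINE constant are hypothetical, not repository code.
PIPELINE = (
    Stage(1, "Quality control", ("FastQC", "KneadData"), "clean reads"),
    Stage(2, "Taxonomy", ("Kraken2", "kraken-biom", "Krona"), "reports, BIOM, Krona"),
    Stage(3, "Diversity", ("QIIME2",), "ordination and tables"),
    Stage(4, "Report", (), "functional outputs and summary views"),
)


def describe(pipeline):
    """Return one line per stage, in execution order."""
    lines = []
    for stage in sorted(pipeline, key=lambda s: s.order):
        tools = ", ".join(stage.tools) or "no tool named here"
        lines.append(f"{stage.order:02d} {stage.name}: {tools} -> {stage.output}")
    return lines
```

Calling `describe(PIPELINE)` yields one summary line per stage, which is roughly the information a dry-run listing would need to show.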

Figure: MICOS pipeline overview, from raw FASTQ to interpretable microbiome outputs.

  • Stage 01, QC: FastQC, KneadData, trimming, host depletion. Output: clean reads.
  • Stage 02, Taxonomy: Kraken2, kraken-biom, Krona summaries. Output: reports, biom, krona.
  • Stage 03, Diversity: QIIME2 metrics, alpha and beta views. Output: ordination and tables.
  • Stage 04, Report: functional outputs and final summary views.

Reader signal

The pipeline is presented as a sequence of evidence transformations, not just a list of tools.

Engineering signal

The repository carries both stable CLI entrypoints and broader workflow assets. This site distinguishes them instead of flattening everything into one surface.

Operational signal

Several advanced analyses live under scripts/ as specialist tools. They matter, but they are not described as part of the same stability contract as the CLI core.

Stage 01

Quality control and host depletion

FastQC and KneadData frame the entry gate, producing cleaner reads before downstream interpretation begins.

  • Raw FASTQ intake
  • Filtering and trimming
  • Host read removal
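
To make the filtering idea concrete, the following sketch reads FASTQ records and keeps only those whose mean Phred score clears a threshold. It illustrates the general concept of quality filtering and is not how FastQC or KneadData work internally; the function names, the Phred+33 encoding assumption, and the threshold of 20 are illustrative choices.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_high_quality(handle, min_mean=20.0):
    """Return records whose mean Phred score is at least min_mean."""
    return [rec for rec in read_fastq(handle) if mean_phred(rec[2]) >= min_mean]


example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"   # 'I' encodes Phred 40
    "@read2\nACGT\n+\n!!!!\n"   # '!' encodes Phred 0
)
kept = keep_high_quality(example)
```

In this toy input only `@read1` survives, since its bases all carry Phred 40 while `@read2` sits at Phred 0.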

Stage 02

Taxonomic profiling

Kraken2, kraken-biom, and Krona turn cleaned reads into ranked taxonomic evidence and navigable summaries.

  • Kraken2 reports
  • BIOM conversion
  • Interactive Krona views
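
A dry-run view of this stage can be sketched by building the command lines without executing them. The `kraken2` and `kraken-biom` options used here are standard flags of those tools, but the helper names, file paths, and thread count are hypothetical, and nothing here is taken from the repository's own wrappers; check the flags against the installed tool versions before relying on them.

```python
import shlex


def kraken2_command(db, read1, read2, report, output, threads=4):
    """Build the argv for a paired-end Kraken2 run without executing it."""
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--paired",
        "--report", report,
        "--output", output,
        read1, read2,
    ]


def kraken_biom_command(reports, biom_path):
    """Build the argv that converts Kraken-style reports into one BIOM table."""
    return ["kraken-biom", *reports, "--fmt", "json", "-o", biom_path]


# Hypothetical paths, used only to show the dry-run plan.
plan = [
    kraken2_command("db/kraken2", "s1_R1.fastq", "s1_R2.fastq",
                    "s1.report", "s1.kraken"),
    kraken_biom_command(["s1.report"], "table.biom"),
]
for argv in plan:
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a shell string avoids quoting problems if the plan is later handed to `subprocess.run`.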

Stage 03

Diversity interpretation

QIIME2 and associated metadata joins convert abundance tables into ecological signals and cohort comparisons.

  • Alpha diversity
  • Beta diversity
  • Ordination-ready outputs
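
For readers less familiar with the two diversity views, here is a minimal worked sketch of one alpha metric (the Shannon index) and one beta metric (Bray-Curtis dissimilarity) on plain count vectors. It is an illustration of the formulas, not QIIME2's implementation; it uses the natural logarithm, so values can differ by a constant factor from tools that use another log base.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over non-zero taxa (natural log)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator
```

Two equally abundant taxa give a Shannon index of ln 2, two samples with no taxa in common give a Bray-Curtis value of 1, and identical samples give 0.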

Stage 04

Functional readout and reporting

Functional profiling and summarization consolidate pathways, annotations, and report-facing deliverables.

  • Functional tables
  • Auxiliary scripts
  • HTML-oriented summaries

System anatomy

Repository layers, not just pages

A reviewer should be able to map the docs to the codebase: entry commands, Python modules, workflow definitions, configuration templates, containers, and validation surfaces.

Figure: Runtime topology. The repository blends Python orchestration, shell wrappers, WDL stages, and container assets.

  • Entry layer, micos CLI: Click-based commands, validation and dry-run.
  • Orchestration, Python modules: full pipeline, quality control, taxonomy, diversity, reporting.
  • Workflow assets, steps/ + containers/: WDL stages, Singularity defs, Docker Compose services.
  • Config surface, config/*.template: project, database, and sample metadata templates.
  • Power user surface, scripts/: thin wrappers plus experimental analyses outside the CLI core.

The project operates across a CLI layer, Python orchestration modules, workflow assets, and power-user scripts.

Stable core

micos/cli.py exposes full-run, validate-config, and module-level commands for quality control, taxonomy, diversity, functional annotation, and summarization.
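
As a rough idea of what a configuration check behind a command like validate-config might do, the sketch below verifies that required sections and values are present in an already-loaded configuration dictionary. The section names, key names, and the `validate_config` function are assumptions made for illustration; the actual schema is defined by the templates under config/ and the code in micos/cli.py.

```python
REQUIRED_KEYS = {
    # Hypothetical section and key names; the real templates live in config/.
    "project": ("name", "output_dir"),
    "databases": ("kraken2",),
    "samples": ("metadata",),
}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if not values.get(key):
                problems.append(f"missing value: {section}.{key}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a CLI report every gap in a single pass, which suits a validation command that runs before any expensive step.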

Workflow assets

steps/, deploy/, and containers/ extend the platform into reproducible environments and step-level orchestration patterns.

Research posture

The project builds on established microbiome tooling and attempts to wrap it into a more coherent end-to-end suite. That lineage is explicit in the Research section.

Execution chain

A quick mental model

The runtime stack spans more than one abstraction level. This diagram makes the split explicit so contributors and reviewers can reason about where each concern lives.

Research grounding

The project inherits credibility from the ecosystem it assembles.

MICOS-2024 is not a blank-slate invention. It is an integration effort sitting on top of well-cited microbiome tooling. That is a strength, and this site treats it as one.

  1. Wood DE, Lu J, Langmead B

    Improved metagenomic analysis with Kraken 2

    Genome Biology · 2019

  2. Bolyen E, Rideout JR, Dillon MR, et al.

    Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

    Nature Biotechnology · 2019

  3. McMurdie PJ, Holmes S

    phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data

    PLoS ONE · 2013


MICOS-2024 whitepaper for reproducible metagenomics engineering.