
Frequently Asked Questions (FAQ)

Quick answers to common questions about MICOS-2024.


General Questions

What is MICOS-2024?

MICOS-2024 (Metagenomic Intelligence and Comprehensive Omics Suite) is an integrated platform for end-to-end metagenomic analysis. It combines quality control, taxonomic profiling, functional annotation, and diversity analysis into a unified workflow.

Who should use MICOS-2024?

  • Biologists with minimal bioinformatics experience
  • Bioinformaticians who need standardized, reproducible workflows
  • Clinical researchers analyzing microbiome samples
  • Ecologists studying environmental microbiomes

How does MICOS-2024 compare to other tools?

FeatureMICOS-2024Qiime2MG-RASTmothur
End-to-end workflowPartialPartial
WDL workflows
Docker support
CLI interfaceWeb
Pipeline flexibilityHighMediumLowMedium

Installation Questions

Should I use Docker or Conda?

Use Docker for:

  • Maximum reproducibility
  • Easy deployment on servers
  • Production environments

Use Conda for:

  • Development work
  • Custom modifications
  • Systems without Docker

How much disk space do I need?

| Component | Minimum | Recommended |
|---|---|---|
| Software installation | 5 GB | 10 GB |
| Kraken2 database | 70 GB (Standard) | 150 GB (PlusPFP) |
| HUMAnN database | 30 GB | 50 GB |
| Analysis output | 100 GB | 500 GB |
| Total | ~200 GB | ~700 GB |
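You can check whether a drive can hold these totals with a few lines of Python. This is a minimal sketch and is not part of MICOS-2024. The thresholds are the ~200 GB and ~700 GB totals from the table above, and `disk_status` and `check_path` are hypothetical helper names. Run it against the filesystem where the databases and results will live.

```python
import shutil

# Totals from the disk space table (decimal GB, 1 GB = 10^9 bytes).
MIN_TOTAL_GB = 200
RECOMMENDED_TOTAL_GB = 700
BYTES_PER_GB = 1000 ** 3


def disk_status(free_bytes: int) -> str:
    """Classify an amount of free space against the table's totals."""
    free_gb = free_bytes / BYTES_PER_GB
    if free_gb >= RECOMMENDED_TOTAL_GB:
        return "recommended"
    if free_gb >= MIN_TOTAL_GB:
        return "minimum"
    return "insufficient"


def check_path(path: str = ".") -> str:
    """Report free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    status = disk_status(usage.free)
    print(f"{path}: {usage.free / BYTES_PER_GB:.0f} GB free ({status})")
    return status


if __name__ == "__main__":
    check_path(".")
```

Note that `df -h` reports binary units (powers of 1024), so its figures will look about 7% smaller than the decimal gigabytes printed here.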

Can I install without administrator privileges?

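Partly. Conda needs no sudo at all, but Docker needs an administrator once:

  • Docker: an administrator must install Docker and add your user to the docker group; after that, you can run containers without sudo
  • Conda: installs entirely in your user directory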

Do I need a GPU?

No. MICOS-2024 is CPU-based. GPU acceleration is not currently supported.


Analysis Questions

How long does analysis take?

Approximate times per sample (10M paired-end reads):

| Step | Time | Notes |
|---|---|---|
| Quality Control | 10-30 min | Depends on host genome size |
| Taxonomic Profiling | 5-15 min | With the Kraken2 Standard database |
| Functional Annotation | 1-4 hours | Depends on database |
| Diversity Analysis | 10-30 min | Per calculation |
| Total | 2-6 hours | Samples can be processed in parallel |

Speed tips:

  • Use an SSD for temporary files
  • Increase the --threads value
  • Use MiniKraken for testing

What sequencing depth do I need?

| Study Type | Minimum | Recommended |
|---|---|---|
| Pilot/testing | 1M reads/sample | 2M reads/sample |
| General profiling | 5M reads/sample | 10M reads/sample |
| Rare species detection | 20M reads/sample | 50M reads/sample |
| Functional analysis | 10M reads/sample | 30M reads/sample |
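You can compare your data against this table by counting the reads in each FASTQ file. The sketch below assumes standard 4-line FASTQ records. The thresholds come from the "Minimum" column, and the study-type keys and function names are hypothetical. For paired-end data, counting one mate file gives the number of read pairs.

```python
import gzip

# Minimum reads per sample by study type (from the table above).
MIN_READS = {
    "pilot": 1_000_000,
    "profiling": 5_000_000,
    "rare_species": 20_000_000,
    "functional": 10_000_000,
}


def count_reads(fastq_path: str) -> int:
    """Count reads in a plain or gzipped FASTQ file (4 lines per record)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def meets_depth(n_reads: int, study_type: str) -> bool:
    """True if the read count reaches the table's minimum for this study type."""
    return n_reads >= MIN_READS[study_type]
```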

Can I analyze single-end data?

Yes, MICOS-2024 supports both paired-end and single-end data. However:

  • Paired-end data is recommended for better taxonomic resolution
  • Single-end data is sufficient for functional profiling
  • Specify the format in the configuration, or let auto-detection handle it
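This page does not describe how MICOS-2024's own auto-detection works. As an illustration of the general idea, the sketch below groups FASTQ file names into samples by their mate suffix. The naming convention (`_R1`/`_R2` or `_1`/`_2` before `.fastq` or `.fq`, optionally gzipped) and the helper names are assumptions.

```python
import re
from collections import defaultdict

# Assumed convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (or _1/_2, .fq).
MATE_PATTERN = re.compile(r"^(?P<sample>.+?)_R?(?P<mate>[12])\.f(ast)?q(\.gz)?$")
FASTQ_SUFFIX = re.compile(r"\.f(ast)?q(\.gz)?$")


def group_mates(filenames):
    """Map each sample name to a dict of mate number -> file name."""
    samples = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            samples[match["sample"]][match["mate"]] = name
        else:
            # No mate suffix: treat the file as a single-end sample.
            samples[FASTQ_SUFFIX.sub("", name)]["1"] = name
    return dict(samples)


def layout(mates):
    """'paired' if both mates are present, otherwise 'single'."""
    return "paired" if "1" in mates and "2" in mates else "single"
```

Names such as Illumina's `_R1_001.fastq.gz` do not match this pattern and would be treated as single-end. A real detector would need to cover whatever naming your sequencing provider uses.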

What about amplicon data (16S)?

While MICOS-2024 is optimized for shotgun metagenomics, it can process 16S data:

# Configuration for amplicon data
quality_control:
  kneaddata:
    bypass_trf: true    # Skip tandem repeat filter (not needed)
 
taxonomic_profiling:
  kraken2:
    confidence: 0.05    # Lower for conserved regions

For dedicated 16S analysis, consider DADA2 or QIIME2’s q2-dada2 plugin.


Database Questions

Which Kraken2 database should I use?

| Database | Size | Best For |
|---|---|---|
| MiniKraken2 (8 GB) | Small | Testing, limited RAM |
| Standard (70 GB) | Medium | General purpose |
| PlusPF (100 GB) | Large | Including fungi/protozoa |
| PlusPFP (150 GB) | X-Large | Maximum coverage |
| Custom | Variable | Specific organisms |
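By default, Kraken2 loads the whole database into memory, so available RAM is usually the deciding factor. The rough helper below picks a database under three assumptions: the sizes are the approximate figures from the table above, a 10% headroom margin is applied, and the function name is illustrative only.

```python
# Approximate database sizes in GB, from the table above.
KRAKEN2_DB_SIZES_GB = {
    "MiniKraken2": 8,
    "Standard": 70,
    "PlusPF": 100,
    "PlusPFP": 150,
}


def largest_db_that_fits(ram_gb, headroom=1.1):
    """Return the largest listed database whose size * headroom fits in RAM, or None.

    The 10% headroom default is an arbitrary safety margin, not a Kraken2 rule.
    """
    fitting = [name for name, size in KRAKEN2_DB_SIZES_GB.items()
               if size * headroom <= ram_gb]
    return max(fitting, key=KRAKEN2_DB_SIZES_GB.get) if fitting else None
```

If RAM is the only obstacle, Kraken2's `--memory-mapping` option reads the database from disk instead of loading it, at a substantial cost in speed.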

Can I use custom reference databases?

Yes:

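# Build custom Kraken2 database
kraken2-build --download-taxonomy --db custom_db
# Add genomes one FASTA file at a time; headers need an NCBI accession
# or a "kraken:taxid|<taxid>" tag so sequences map to the taxonomy
for genome in my_genomes/*.fa; do
  kraken2-build --add-to-library "$genome" --db custom_db
done
kraken2-build --build --threads 16 --db custom_db

# Then point MICOS at the new database in the configuration file
kraken2-db: "/path/to/custom_db"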
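If your genomes are not NCBI assemblies, Kraken2 cannot look up their taxonomy by accession. The Kraken2 manual lets you put an explicit `kraken:taxid|<taxid>` tag in the sequence ID instead. The minimal sketch below adds this tag to every header in one genome file. It assumes the whole file belongs to a single taxon, and the helper names are hypothetical.

```python
def tag_fasta_headers(lines, taxid):
    """Yield FASTA lines with 'kraken:taxid|<taxid>' appended to each sequence ID."""
    for line in lines:
        if line.startswith(">"):
            seq_id, _, description = line[1:].rstrip("\n").partition(" ")
            header = f">{seq_id}|kraken:taxid|{taxid}"
            if description:
                header += f" {description}"
            yield header + "\n"
        else:
            yield line  # sequence lines pass through unchanged


def tag_fasta_file(src, dst, taxid):
    """Copy FASTA `src` to `dst`, tagging every record with the same taxid."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(tag_fasta_headers(fin, taxid))
```

For example, `tag_fasta_file("isolate1.fa", "tagged_genomes/isolate1.fa", 562)` tags an E. coli isolate (NCBI taxid 562). Write tagged files to a separate directory so the `my_genomes/*.fa` glob does not pick up both copies.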

Do I need to build databases myself?

No. Pre-built databases are available:

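  • Kraken2: pre-built indexes (Standard, PlusPF, PlusPFP and others) from the AWS-hosted index collection
  • HUMAnN: humann_databases --download chocophlan full <install_dir> and humann_databases --download uniref uniref90_diamond <install_dir>
  • KneadData: kneaddata_database --download human_genome bowtie2 <install_dir>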

Results Questions

Why are most of my reads unclassified?

Possible reasons:

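| Reason | Check | Solution |
|---|---|---|
| Database too small | Are you using MiniKraken instead of Standard? | Download a larger database |
| Confidence threshold too high | Kraken2 confidence in the configuration | Lower it to 0.05-0.1 |
| Novel organisms | Literature search for your sample type | Build a custom database |
| Poor data quality | Run FastQC | Adjust QC parameters |
| Environmental sample | Sample type | Use the PlusPF database |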
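Before working through the table, it helps to measure how many reads were left unclassified. That figure can be read from a Kraken2 report, the file written by `kraken2 --report`. The sketch below assumes the standard six-column report: percentage, clade reads, direct reads, rank code, taxid, name. It does not handle the extended format produced with `--report-minimizer-data`, and the function name is hypothetical.

```python
def unclassified_fraction(report_lines):
    """Fraction of reads left unclassified, from a standard 6-column Kraken2 report."""
    unclassified = classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads, rank, taxid = int(fields[1]), fields[3].strip(), fields[4].strip()
        if rank == "U":
            unclassified = clade_reads
        elif taxid == "1":
            # Root's clade count covers every classified read.
            classified = clade_reads
    total = unclassified + classified
    return unclassified / total if total else 0.0
```

Typical use: `with open("sample.kreport") as fh: print(f"{unclassified_fraction(fh):.1%} unclassified")`, where the file name is a placeholder.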

Which diversity metric should I use?

Alpha Diversity:

  • Shannon: General diversity (richness + evenness)
  • Chao1: Richness estimation
  • Observed: Simple species count

Beta Diversity:

  • Bray-Curtis: Standard for abundance data
  • UniFrac: When phylogeny matters
  • Aitchison: For compositional data (zeros require a pseudocount before the log-ratio transform)
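The alpha and beta metrics above reduce to short formulas. The pure-Python sketch below is for illustration only and makes several assumptions:

- Inputs are integer read counts per taxon. Chao1 depends on singletons and doubletons, so it is meaningless on relative abundances.
- Shannon uses the natural log.
- Chao1 uses the bias-corrected formula.
- UniFrac is omitted because it needs a phylogenetic tree.

For production work, established libraries such as scikit-bio or vegan are the safer choice.

```python
import math
from collections import Counter


def observed(counts):
    """Observed richness: number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def aitchison(a, b, pseudocount=1.0):
    """Euclidean distance between centered log-ratio (CLR) transformed vectors.

    The pseudocount is added first because log(0) is undefined.
    """
    def clr(v):
        logs = [math.log(x + pseudocount) for x in v]
        mean = sum(logs) / len(logs)
        return [x - mean for x in logs]
    return math.dist(clr(a), clr(b))
```

Some tools compute Shannon with log base 2, which gives larger values than the natural log. Check which base your values use before comparing them with published ranges.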

How do I interpret statistical results?

| Test | p < 0.05 means | Visualization |
|---|---|---|
| PERMANOVA | Community composition differs between groups | PCoA plot |
| Kruskal-Wallis | Alpha diversity differs between groups | Box plot |
| LEfSe | Specific taxa differ between groups | Bar plot |
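The Kruskal-Wallis test ranks all alpha-diversity values together. It then asks whether the groups' rank sums differ more than chance would allow. The sketch below computes the H statistic without tie correction and is for illustration only. For real analyses, use an established implementation such as `scipy.stats.kruskal`, which applies the tie correction and returns a p-value from a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```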

What’s a “good” diversity value?

Human gut typical ranges:

| Metric | Low | Normal | High |
|---|---|---|---|
| Shannon | <2.5 | 2.5-4.0 | >4.0 |
| Species richness | <50 | 50-150 | >150 |

Note: These ranges are context-dependent; compare values only within the same sample type.


Troubleshooting Questions

Analysis failed - where do I start?

  1. Check logs: tail -n 50 logs/analysis.log
  2. Verify inputs: confirm that input files exist and are valid (for gzipped FASTQ, gzip -t checks integrity)
  3. Check resources: free -h (memory), df -h (disk space)
  4. Test with a subset: run on 1-2 samples first
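For step 4, you can cut a small test set from existing FASTQ files. The minimal sketch below keeps the first N reads; `head_fastq` is a hypothetical helper name. Running it on both R1 and R2 with the same N keeps the mates in order. It takes the first reads rather than a random sample. That is fine for checking that the pipeline runs, but not for estimating results. `seqtk sample` is a common tool for random subsampling.

```python
import gzip
from itertools import islice


def _open(path, mode):
    """Open plain or gzipped text files based on the extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def head_fastq(src, dst, n_reads):
    """Write the first `n_reads` FASTQ records (4 lines each) of `src` to `dst`."""
    with _open(src, "rt") as fin, _open(dst, "wt") as fout:
        fout.writelines(islice(fin, 4 * n_reads))
```

For example: `head_fastq("data/raw/S1_R1.fastq.gz", "data/test/S1_R1.fastq.gz", 100_000)`, with placeholder paths.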

Can I resume a failed analysis?

Yes, if outputs from completed steps exist:

# MICOS will skip completed steps
python -m micos.cli full-run \
  --input-dir data/raw \
  --results-dir results \
  --threads 16
  # Steps already done will be skipped

For manual control, run individual modules:

python -m micos.cli run taxonomic-profiling ...
python -m micos.cli run functional-annotation ...
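The resume behaviour described above follows a common pattern: before each step runs, check whether its output already exists. Below is a minimal illustration of that pattern in Python. It is not MICOS-2024's actual implementation, and the step and file names are hypothetical.

```python
from pathlib import Path


def run_pipeline(steps, results_dir):
    """Run (name, output_file, action) steps in order, skipping finished ones.

    A step counts as finished when its output file exists and is non-empty.
    Returns the names of the steps that actually ran.
    """
    ran = []
    for name, output, action in steps:
        out_path = Path(results_dir) / output
        if out_path.is_file() and out_path.stat().st_size > 0:
            print(f"skip {name}: {out_path} exists")
            continue
        action(out_path)
        ran.append(name)
    return ran
```

One caveat of this simple check is that a half-written file from a crashed step looks finished. Workflow engines typically avoid this by writing to a temporary name and renaming it only on success.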

How do I update MICOS-2024?

# Update code
git pull origin main
 
# Docker - rebuild image
docker compose -f deploy/docker-compose.example.yml build
 
# Conda - update environment
conda activate micos-2024
pip install -e . --upgrade

Performance Questions

Can I run MICOS on a laptop?

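For small datasets (test data, <5 samples) using the MiniKraken2 database (Kraken2 loads the whole database into RAM, so the Standard database will not fit):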

  • Minimum: 16GB RAM, 4 cores
  • Recommended: 32GB RAM, 8 cores

For production analysis, use servers or cloud instances.

How do I optimize for speed?

| Action | Improvement | Trade-off |
|---|---|---|
| Use SSD for temporary files | 2-3x faster | Requires SSD storage |
| Increase threads | Near-linear up to ~32 threads | More RAM needed |
| Use MiniKraken | 5-10x faster | Lower sensitivity |
| Disable gap filling | 2x faster | Less pathway information |

Can I run MICOS on a cluster?

Yes! Several approaches:

  1. WDL/Cromwell: Use provided workflows
  2. Snakemake: Convert to Snakemake workflow
  3. Manual parallelization: Split samples, merge results
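For option 3, splitting and merging is simple bookkeeping. The sketch below shows both halves. It assumes each batch produces a taxon-by-sample count table held as nested dictionaries. This is a simplification: real outputs would be BIOM or TSV files, and a taxon missing from a sample is simply absent rather than zero. Per-sample steps (QC, profiling, annotation) parallelize this way, while diversity and statistics need the merged table.

```python
def split_into_batches(samples, n_batches):
    """Distribute samples round-robin into at most n_batches non-empty lists."""
    batches = [[] for _ in range(n_batches)]
    for i, sample in enumerate(samples):
        batches[i % n_batches].append(sample)
    return [b for b in batches if b]


def merge_tables(tables):
    """Merge per-batch {taxon: {sample: count}} tables into one table."""
    merged = {}
    for table in tables:
        for taxon, per_sample in table.items():
            merged.setdefault(taxon, {}).update(per_sample)
    return merged
```

Each batch could then be submitted as a separate cluster job (for example, one Slurm job per batch), with `merge_tables` run once all jobs finish.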

Integration Questions

Can I use MICOS output with other tools?

Yes! Output formats are standard:

| Output | Format | Compatible With |
|---|---|---|
| Feature table | BIOM | QIIME2, phyloseq, vegan |
| Taxonomy | TSV | R, Python, Excel |
| Pathways | TSV | Gephi, Cytoscape, R |

How do I import to R?

library(phyloseq)

# Import the BIOM feature table
physeq <- import_biom("feature-table.biom")

# Import metadata (tab-separated, sample IDs in the first column)
metadata <- read.delim("metadata.tsv", row.names = 1)
sample_data(physeq) <- sample_data(metadata)

How do I import to Python?

import pandas as pd
import biom

# Load BIOM table (returns a biom.Table; it is not a context manager)
table = biom.load_table("feature-table.biom")
df = table.to_dataframe(dense=True)  # rows = features, columns = samples

# Load taxonomy
taxonomy = pd.read_csv("taxonomy.tsv", sep="\t", index_col=0)

Citation and License

How do I cite MICOS-2024?

@software{micos2024,
  title = {MICOS-2024: Metagenomic Intelligence and Comprehensive Omics Suite},
  author = {MICOS-2024 Team},
  year = {2024},
  url = {https://github.com/BGI-MICOS/MICOS-2024},
  version = {1.1.0}
}

Also cite the individual tools you used (Kraken2, HUMAnN, QIIME2, etc.).

What license is MICOS-2024 under?

MIT License - free for academic and commercial use.


Still Have Questions?