Learning Path

Overview

This learning path is a four-level progressive curriculum for readers of diverse backgrounds, running from understanding the knowledge base structure to mastering frontier algorithm reproduction and community contribution. Each level lists explicit prerequisites, expected deliverables, assessment criteria, and 3–5 required classic papers. Choose your entry level based on your current technical stack and research objectives.

Level 1: Navigation Literacy

Goal

Build an intuitive picture of the bioinformatics algorithm landscape within 2 hours, and learn to use the knowledge base's category taxonomy, tag network, and search functions to locate any algorithm.

Core Content

Category Taxonomy: Understand the logic behind the 16 top-level categories (sequence alignment, assembly, variant calling, protein structure prediction, etc.) and their subcategory hierarchies.
Tag System: Master the naming conventions and cross-category association capabilities of 392 semantic tags; learn to discover alternative algorithms through tag intersection search.
Rapid Retrieval: Skillfully use table sorting and filtering on the algorithm index page; understand the meaning of ComplexityBadge and difficulty ratings.
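Tag intersection search, as described in the list above, can be pictured as a set intersection over an inverted index. The following is a minimal sketch, not the knowledge base's actual search code: the sample algorithms, the tag names, and the helpers `build_tag_index` and `intersect_tags` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: algorithm name -> set of tags.
ALGORITHM_TAGS = {
    "Smith-Waterman": {"dynamic-programming", "local-alignment", "pairwise"},
    "Needleman-Wunsch": {"dynamic-programming", "global-alignment", "pairwise"},
    "BLAST": {"heuristic", "local-alignment", "seed-and-extend"},
}


def build_tag_index(algorithm_tags):
    """Invert the mapping: tag -> set of algorithm names carrying that tag."""
    index = defaultdict(set)
    for name, tags in algorithm_tags.items():
        for tag in tags:
            index[tag].add(name)
    return index


def intersect_tags(index, *tags):
    """Return the sorted algorithm names that carry every one of the given tags."""
    if not tags:
        return []
    # .get() avoids creating empty entries for unknown tags.
    candidate_sets = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*candidate_sets))


TAG_INDEX = build_tag_index(ALGORITHM_TAGS)
print(intersect_tags(TAG_INDEX, "local-alignment", "pairwise"))  # ['Smith-Waterman']
```

Dropping one tag from the query widens the result set, which is one way to surface alternative algorithms that share most, but not all, of the properties you care about.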

Prerequisites

Basic molecular biology concepts (DNA, RNA, protein sequences)
Basic algorithmic complexity notation (Big-O)
Markdown basic syntax

Expected Deliverables

Independently locate the category membership, time complexity, and primary purpose of any 3 unfamiliar algorithms
Describe the algorithmic association between at least 2 categories

Assessment Criteria

Assessment Item	Pass Criteria
Category Location	Given an algorithm name, find its category page within 30 seconds
Tag Retrieval	Given 2 tags, correctly list intersection algorithms
Complexity Recognition	Correctly explain the typical meaning of O(mn) and O(n log n) in bioinformatics
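As an illustration of the Complexity Recognition item, O(mn) most often appears when a dynamic-programming table is filled for two sequences of lengths m and n, while O(n log n) usually signals a sort-based step over n items. The sketch below computes a Needleman-Wunsch global alignment score and counts the cells it fills; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score with a linear gap penalty.

    Keeps only two rows in memory, but still visits all len(a) * len(b) cells,
    which is where the O(mn) running time comes from.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: all-gap prefixes of b
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: all-gap prefix of a
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
            cells += 1
        prev = curr
    return prev[-1], cells


score, cells = global_alignment_score("GATTACA", "GATTACA")
print(score, cells)  # 7 49
```

Doubling the length of both inputs quadruples the cell count, whereas sorting twice as many items costs only slightly more than twice the work.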

Level 2: Algorithm Evaluation

Goal

Be able to evaluate algorithms along multiple dimensions (purpose, complexity, difficulty, implementation language, ecosystem maturity) and make selection decisions.

Core Content

Purpose Evaluation: Distinguish the core application scenarios of algorithms (e.g., local vs global alignment, de novo vs reference-guided assembly).
Complexity Analysis: Deeply understand the engineering implications of time and space complexity on real-world big data (GB–TB scale genomic data).
Difficulty Grading: Understand the conceptual depth and implementation threshold implied by the three-level difficulty rating (beginner / intermediate / advanced).
Language Assessment: Match project requirements based on language characteristics such as C/C++ (high performance), Python (rapid prototyping), and Rust (memory safety).
Cross-Search: Use the tag network for comparative analysis of similar solutions (e.g., Smith-Waterman vs Needleman-Wunsch vs BLAST).
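To make the engineering implications of space complexity concrete, here is a back-of-the-envelope sketch comparing a full O(mn) dynamic-programming matrix with a two-row O(n) variant. The 4-byte cell size, the decimal unit convention, and the helper names are assumptions for illustration only.

```python
def full_matrix_bytes(m, n, bytes_per_cell=4):
    """Memory for a full (m+1) x (n+1) dynamic-programming matrix."""
    return (m + 1) * (n + 1) * bytes_per_cell


def two_row_bytes(n, bytes_per_cell=4):
    """Memory when only the previous and current rows are kept (score only)."""
    return 2 * (n + 1) * bytes_per_cell


def human(num_bytes):
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Two sequences of one million bases each.
m = n = 1_000_000
print(human(full_matrix_bytes(m, n)))  # 4.0 TB
print(human(two_row_bytes(n)))         # 8.0 MB
```

The same algorithmic idea is infeasible in one formulation and trivial in the other, which is why a selection report should state space complexity alongside time complexity.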

Prerequisites

Basic algorithm design paradigms: dynamic programming, greedy algorithms, graph algorithms
Basic Linux command-line operations and common bioinformatics file formats (FASTA, FASTQ, SAM/BAM, VCF)

Expected Deliverables

For a specific bioinformatics task (e.g., "single-cell RNA-seq clustering"), produce a comparative report covering at least 3 candidate algorithms
The report must include a complexity comparison table, implementation language analysis, and toolchain recommendations

Assessment Criteria

Assessment Item	Pass Criteria
Complexity Explanation	Correctly explain the time/space complexity of at least 5 algorithms and assess their feasibility on 100 GB datasets
Selection Report	Produce a structured comparison report covering complexity, language, license, and community activity
Tag Cross-Search	Use tag combinations to discover at least 1 non-obvious alternative algorithm

Level 3: Architecture and Engineering

Goal

Gain a deep understanding of this knowledge base's data sources, generator, VitePress publishing pipeline, and CLI workflow, and be able to independently extend the knowledge base structure while maintaining data consistency.

Core Content

Data Source Layer: Master the schema definitions, field constraints, and version evolution strategies of categories.yaml and algorithms/*.yaml.
Generator Layer: Understand the functional division in generate_docs.py (whitepaper generation, algorithm page generation, index page generation) and template rendering logic.
VitePress Pipeline: Become familiar with VitePress static site generation mechanisms, theme configuration, navigation structure, and Markdown extension syntax.
CLI Workflow: Become proficient in daily maintenance using subcommands such as validate, stats, search, info, compare, export, and vitepress.
CI/CD Integration: Understand the complete automation flow of validation → generation → build → deployment in GitHub Actions.
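The validation step that opens the pipeline above can be pictured as a function that checks one parsed entry against field constraints. This is only a sketch under stated assumptions: the required field names, the category identifiers, and `validate_entry` itself are hypothetical and do not reflect the real schema or the behavior of the `validate` subcommand. The entry is written as a plain dict, standing in for the result of parsing one algorithms/*.yaml file.

```python
# Allowed values taken from the three-level difficulty rating.
ALLOWED_DIFFICULTY = {"beginner", "intermediate", "advanced"}
# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = ("name", "category", "time_complexity", "difficulty")


def validate_entry(entry, known_categories):
    """Return a list of human-readable problems; an empty list means the entry passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            errors.append(f"missing field: {field}")
    category = entry.get("category")
    if category and category not in known_categories:
        errors.append(f"unknown category: {category}")
    difficulty = entry.get("difficulty")
    if difficulty and difficulty not in ALLOWED_DIFFICULTY:
        errors.append(f"invalid difficulty: {difficulty}")
    return errors


# A parsed entry as it might look after loading a YAML file (hypothetical values).
entry = {
    "name": "Smith-Waterman",
    "category": "sequence-alignment",
    "time_complexity": "O(mn)",
    "difficulty": "intermediate",
}
print(validate_entry(entry, known_categories={"sequence-alignment", "assembly"}))  # []
```

Collecting all problems in a list, rather than stopping at the first one, lets a contributor fix an entry in a single pass before page generation runs.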

Prerequisites

Python 3.10+ programming and type hints
YAML syntax and data modeling basics
Front-end build toolchain basics (Node.js, npm, VitePress concepts)
Git workflow and GitHub Actions basics

Expected Deliverables

Successfully submit a new algorithm entry PR to this knowledge base, including complete YAML data, passing all validations, and automatically generating the corresponding VitePress pages
Write a technical document fragment on "How to add a new category to the knowledge base"

Assessment Criteria

Assessment Item	Pass Criteria
YAML Authoring	Independently write algorithm YAML compliant with schema; `validate` reports zero errors
Generation Pipeline	Explain the complete data flow from YAML to VitePress page (at least 6 nodes)
CLI Proficiency	Complete search / info / compare combined queries without consulting documentation

Level 4: Expert Research

Goal

Work at the frontier of the field: understand the core innovations of the latest algorithms (2022–2025), and be able to reproduce papers, perform benchmarking, and contribute to the community.

Core Content

Frontier Tracking: Continuously track the latest advances in AlphaFold series, ESM series, single-cell foundation models, graph genomics, and other frontier directions.
Paper Reproduction: Locate original papers through DOI links in the knowledge base, understand algorithm pseudocode and key formulas, and complete a minimal runnable reproduction in an open-source framework.
Performance Benchmarking: Design fair comparative experiments (unified dataset, unified hardware environment, unified evaluation metrics) and produce publishable benchmark reports.
Community Contribution: Improve existing algorithm entries by submitting PRs (supplement missing fields, correct complexity, update implementation links), or write original technical whitepaper supplement pages.
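For the performance benchmarking item above, a unified measurement harness helps keep comparisons fair, since every candidate is timed and profiled the same way. The sketch below uses only the Python standard library; the `benchmark` helper and its return format are assumptions, and `tracemalloc` tracks only memory allocated through Python, so native extensions or GPU memory would need other tools.

```python
import time
import tracemalloc


def benchmark(func, *args, repeats=3):
    """Run func(*args) several times; report best wall time and peak traced memory."""
    timings = []
    peak_bytes = 0
    result = None
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, run_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        timings.append(elapsed)
        peak_bytes = max(peak_bytes, run_peak)
    return {"best_seconds": min(timings), "peak_bytes": peak_bytes, "result": result}


# Same input for every candidate keeps the comparison on a unified dataset.
data = list(range(100_000, 0, -1))
for name, candidate in [("sorted", sorted), ("max", max)]:
    report = benchmark(candidate, data)
    print(name, f"{report['best_seconds']:.4f}s", report["peak_bytes"], "bytes")
```

Tracing memory slows execution, so for a publishable report it may be preferable to measure time and memory in separate runs.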

Prerequisites

In-depth research experience in at least 1 bioinformatics subfield (e.g., protein structure prediction or single-cell analysis)
Top-tier conference paper reading and reproduction experience (ISMB, RECOMB, NeurIPS, ICML, etc.)
High-performance computing (HPC) or GPU acceleration programming basics (CUDA / PyTorch)

Expected Deliverables

Complete the code reproduction of at least 1 frontier algorithm paper, and submit an improvement PR under the corresponding entry in this knowledge base
Produce 1 community-facing benchmark comparison report that is adopted or cited by project maintainers

Assessment Criteria

Assessment Item	Pass Criteria
Paper Reproduction	Reproduce core metrics on standard datasets with <5% error
Benchmark Design	Experimental design covers at least 3 similar algorithms, including time/memory/accuracy dimensions
Community Contribution	Submitted PR is merged, and includes test cases or documentation improvements
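One way to read the "<5% error" criterion is as relative error against the value reported in the paper. That interpretation is an assumption, as are the helper names and the example numbers below.

```python
def relative_error(reproduced, reported):
    """Relative deviation of a reproduced metric from the reported one."""
    if reported == 0:
        raise ValueError("relative error is undefined for a reported value of 0")
    return abs(reproduced - reported) / abs(reported)


def within_tolerance(reproduced, reported, tolerance=0.05):
    """True when the reproduced metric deviates by less than the tolerance."""
    return relative_error(reproduced, reported) < tolerance


# Hypothetical numbers: a paper reports 0.95 accuracy, the reproduction reaches 0.92.
print(within_tolerance(0.92, 0.95))  # True
print(within_tolerance(0.80, 0.95))  # False
```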

Summary and Advanced Recommendations

Level	Target Audience	Estimated Study Time	Key Deliverable
Level 1	Beginners / Cross-domain Developers	2–4 hours	Landscape awareness + independent retrieval
Level 2	Mid-level Developers / Graduate Students	1–2 weeks	Selection report + complexity analysis
Level 3	Senior Developers / Maintainers	2–4 weeks	Data maintenance capability + CI/CD understanding
Level 4	Researchers / Algorithm Engineers	Continuous	Paper reproduction + community contribution

Regardless of your current level, we recommend starting from the Algorithm Index page of this knowledge base and building intuition through actual retrieval and comparison. The learning path is not a rigid linear sequence but a reference map: jump between levels as your needs dictate.
