References and Related Projects

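Citation Format

All references in this knowledge base's algorithm entries follow GB/T 7714-2015 (Information and documentation: Rules for bibliographic references and citations to information resources). The bibliographic elements are: primary creators, title, document type identifier, publication information, and access path. The document type identifier follows the title in square brackets, for example [M] for a monograph and [J] for a journal article.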

Format Examples

JONES N C, PEVZNER P A. An Introduction to Bioinformatics Algorithms[M]. Cambridge: MIT Press, 2004.

ALTSCHUL S F, MADDEN T L, SCHAFFER A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402. DOI:10.1093/nar/25.17.3389.
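The author rule visible in these examples (list at most three primary creators, then "et al." for English-language works) and the layout of a [J] entry can be captured in a small formatter. This is a minimal sketch under stated assumptions: `JournalArticle`, `format_authors`, and `format_journal` are hypothetical names, author names are assumed to be already normalized to upper-case "SURNAME INITIALS" form, and only journal articles are handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalArticle:
    """A journal article; author names are pre-normalized to 'SURNAME INITIALS'."""
    authors: List[str]
    title: str
    journal: str
    year: int
    volume: int
    issue: Optional[int]  # None when the journal has no issue number
    pages: str
    doi: Optional[str] = None


def format_authors(authors: List[str], max_listed: int = 3) -> str:
    """Keep the first `max_listed` creators and append 'et al' if there are more.

    The period that separates the creators from the title is added by the caller.
    """
    if len(authors) <= max_listed:
        return ", ".join(authors)
    return ", ".join(authors[:max_listed]) + ", et al"


def format_journal(a: JournalArticle) -> str:
    """Render a [J] (journal article) entry in the layout used in this list."""
    issue = f"({a.issue})" if a.issue is not None else ""
    entry = (f"{format_authors(a.authors)}. {a.title}[J]. "
             f"{a.journal}, {a.year}, {a.volume}{issue}: {a.pages}.")
    if a.doi:
        entry += f" DOI:{a.doi}."
    return entry


blast2 = JournalArticle(
    authors=["ALTSCHUL S F", "MADDEN T L", "SCHAFFER A A", "ZHANG J",
             "ZHANG Z", "MILLER W", "LIPMAN D J"],
    title="Gapped BLAST and PSI-BLAST: a new generation of protein database search programs",
    journal="Nucleic Acids Research", year=1997, volume=25, issue=17,
    pages="3389-3402", doi="10.1093/nar/25.17.3389",
)
print(format_journal(blast2))
```

Run on the seven-author Gapped BLAST paper, it reproduces the second example above. Journals without issue numbers (such as Nature Communications) are handled by leaving `issue` as `None`.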

Classic Papers by Field

Sequence Alignment

NEEDLEMAN S B, WUNSCH C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48(3): 443-453. DOI:10.1016/0022-2836(70)90057-4.
SMITH T F, WATERMAN M S. Identification of common molecular subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195-197. DOI:10.1016/0022-2836(81)90087-5.
ALTSCHUL S F, GISH W, MILLER W, et al. Basic local alignment search tool[J]. Journal of Molecular Biology, 1990, 215(3): 403-410. DOI:10.1016/S0022-2836(05)80360-2.
LI H, DURBIN R. Fast and accurate short read alignment with Burrows-Wheeler transform[J]. Bioinformatics, 2009, 25(14): 1754-1760. DOI:10.1093/bioinformatics/btp324.
LI H. Minimap2: pairwise alignment for nucleotide sequences[J]. Bioinformatics, 2018, 34(18): 3094-3100. DOI:10.1093/bioinformatics/bty191.

Sequence Assembly

PEVZNER P A, TANG H, WATERMAN M S. An Eulerian path approach to DNA fragment assembly[J]. Proceedings of the National Academy of Sciences, 2001, 98(17): 9748-9753. DOI:10.1073/pnas.171285098.
ZERBINO D R, BIRNEY E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Research, 2008, 18(5): 821-829. DOI:10.1101/gr.074492.107.
BANKEVICH A, NURK S, ANTIPOV D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing[J]. Journal of Computational Biology, 2012, 19(5): 455-477. DOI:10.1089/cmb.2012.0021.
KOREN S, WALENZ B P, BERLIN K, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation[J]. Genome Research, 2017, 27(5): 722-736. DOI:10.1101/gr.215087.116.
KOLMOGOROV M, YUAN J, LIN Y, et al. Assembly of long, error-prone reads using repeat graphs[J]. Nature Biotechnology, 2019, 37(5): 540-546. DOI:10.1038/s41587-019-0072-8.

Variant Calling

MCKENNA A, HANNA M, BANKS E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data[J]. Genome Research, 2010, 20(9): 1297-1303. DOI:10.1101/gr.107524.110.
DEPRISTO M A, BANKS E, POPLIN R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data[J]. Nature Genetics, 2011, 43(5): 491-498. DOI:10.1038/ng.806.
POPLIN R, CHANG P C, ALEXANDER D, et al. A universal SNP and small-indel variant caller using deep neural networks[J]. Nature Biotechnology, 2018, 36(10): 983-987. DOI:10.1038/nbt.4235.
KIM S, SCHEFFLER K, HALPERN A L, et al. Strelka2: fast and accurate calling of germline and somatic variants[J]. Nature Methods, 2018, 15(8): 591-594. DOI:10.1038/s41592-018-0051-x.
CIBULSKIS K, LAWRENCE M S, CARTER S L, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples[J]. Nature Biotechnology, 2013, 31(3): 213-219. DOI:10.1038/nbt.2514.

Protein Structure Prediction

JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. DOI:10.1038/s41586-021-03819-2.
BAEK M, DIMAIO F, ANISHCHENKO I, et al. Accurate prediction of protein structures and interactions using a three-track neural network[J]. Science, 2021, 373(6557): 871-876. DOI:10.1126/science.abj8754.
LIN Z, AKIN H, RAO R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130. DOI:10.1126/science.ade2574.
WU R, DING F, WANG R, et al. High-resolution de novo structure prediction from primary sequence[J]. Nature Methods, 2024, 21(4): 682-690. DOI:10.1038/s41592-024-02272-z.
SENIOR A W, EVANS R, JUMPER J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706-710. DOI:10.1038/s41586-019-1923-7.

Single-Cell Analysis

SATIJA R, FARRELL J A, GENNERT D, et al. Spatial reconstruction of single-cell gene expression data[J]. Nature Biotechnology, 2015, 33(5): 495-502. DOI:10.1038/nbt.3192.
WOLF F A, ANGERER P, THEIS F J. SCANPY: large-scale single-cell gene expression data analysis[J]. Genome Biology, 2018, 19(1): 15. DOI:10.1186/s13059-017-1382-0.
TRAPNELL C, CACCHIARELLI D, GRIMSBY J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells[J]. Nature Biotechnology, 2014, 32(4): 381-386. DOI:10.1038/nbt.2859.
LOPEZ R, REGIER J, COLE M B, et al. Deep generative modeling for single-cell transcriptomics[J]. Nature Methods, 2018, 15(12): 1053-1058. DOI:10.1038/s41592-018-0229-2.
ZHENG G X Y, TERRY J M, BELGRADER P, et al. Massively parallel digital transcriptional profiling of single cells[J]. Nature Communications, 2017, 8: 14049. DOI:10.1038/ncomms14049.

Metagenomics

WOOD D E, SALZBERG S L. Kraken: ultrafast metagenomic sequence classification using exact alignments[J]. Genome Biology, 2014, 15(3): R46. DOI:10.1186/gb-2014-15-3-r46.
QIN J, LI R, RAES J, et al. A human gut microbial gene catalogue established by metagenomic sequencing[J]. Nature, 2010, 464(7285): 59-65. DOI:10.1038/nature08821.
TRUONG D T, FRANZOSA E A, TICKLE T L, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling[J]. Nature Methods, 2015, 12(10): 902-903. DOI:10.1038/nmeth.3589.
ABUBUCKER S, SEGATA N, GOLL J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome[J]. PLoS Computational Biology, 2012, 8(6): e1002358. DOI:10.1371/journal.pcbi.1002358.
SUNG J, ZHENG L, DUVVURI V, et al. Metabolic modeling with objective quantification of the human gut microbiome in inflammatory bowel disease[J]. Nature Microbiology, 2022, 7(7): 1126-1136. DOI:10.1038/s41564-022-01147-6.

Essential Reading

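The following works, a mix of reviews and landmark primary papers, serve as "map-level" references for their fields; we recommend reading them first when entering each subfield: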

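Sequence alignment and sequence search: ALTSCHUL S F, et al. Basic local alignment search tool[J]. J. Mol. Biol., 1990. (The founding BLAST paper; essential for understanding heuristic search.)
Protein structure prediction: JUMPER J, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021. (AlphaFold, a watershed for structural biology.)
Single-cell technologies: PAPALEXI E, SATIJA R. High-dimensional genomic data analysis: methods and challenges[J]. Nature Methods, 2022. (A methodological review of high-dimensional single-cell data analysis.)
Metagenomics: QUINCE C, et al. Shotgun metagenomics, from sampling to analysis[J]. Nature Biotechnology, 2017. (End-to-end methodology, from wet-lab sampling to computational analysis.)
Graph genomics: PATEN B, et al. Genome graphs and the evolution of genome inference[J]. Genome Research, 2017. (A systematic review of graph genomics.)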

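Related Open-Source Projects

| Project | Core function | Stars | Main language | License | Difference from this project |
|---|---|---|---|---|---|
| Awesome-Bioinformatics | Curated list of algorithms and tools | 2.8k+ | Markdown | CC0 | Plain list; no structured metadata or generation pipeline |
| bioinformatics-workflows | Analysis workflow templates | N/A | Snakemake / Nextflow | Mixed | Focuses on workflows rather than on the algorithms themselves |
| biostars-handbook | Tutorials and guides | N/A | N/A | Commercial | Hands-on manual for beginners, not an architecture-level knowledge base |
| OBF / Biopython | Tool libraries and community | N/A | Python | Biopython License / BSD 3-Clause | Provides algorithm implementations, not an index of algorithm metadata |
| This project | Structured algorithm knowledge base + white paper | N/A | Python | MIT | Emphasizes data-driven design, a generation pipeline, quality validation, and bilingual support |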

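Engineering Lessons

Building and maintaining this knowledge base taught us three engineering principles that we believe apply broadly to large technical knowledge systems: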

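1. A Single Source of Truth for Data

Once a knowledge base grows beyond roughly 100 entries, hand-written documentation scattered across multiple places inevitably drifts out of sync. Consolidating the data into structured YAML and generating every presentation layer from that one source is, in our experience, the only sustainable way to keep everything consistent.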
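To illustrate the principle, here is a minimal sketch in which two different views, a per-domain list and an essential-reading list, are both rendered from one record list. The record list stands in for a parsed YAML file (for example, the output of PyYAML's `yaml.safe_load`); the field names and the functions `render_by_domain` and `render_must_read` are hypothetical, not this project's actual schema or code.

```python
from typing import Dict, List

# Stand-in for the parsed YAML source file. The schema
# (id, domain, title, year, must_read) is a hypothetical illustration.
RECORDS: List[Dict] = [
    {"id": "needleman1970", "domain": "Sequence Alignment", "year": 1970,
     "title": "A general method applicable to the search for similarities "
              "in the amino acid sequence of two proteins", "must_read": False},
    {"id": "altschul1990", "domain": "Sequence Alignment", "year": 1990,
     "title": "Basic local alignment search tool", "must_read": True},
    {"id": "zerbino2008", "domain": "Sequence Assembly", "year": 2008,
     "title": "Velvet: algorithms for de novo short read assembly "
              "using de Bruijn graphs", "must_read": False},
]


def render_by_domain(records: List[Dict]) -> str:
    """View 1: the per-domain reading list, domains in first-seen order."""
    groups: Dict[str, List[Dict]] = {}
    for r in records:
        groups.setdefault(r["domain"], []).append(r)
    lines: List[str] = []
    for domain, items in groups.items():
        lines.append(domain)
        lines.extend(f"  {r['title']} ({r['year']})" for r in items)
    return "\n".join(lines)


def render_must_read(records: List[Dict]) -> str:
    """View 2: the essential-reading list, filtered from the same records."""
    return "\n".join(f"{r['domain']}: {r['title']} ({r['year']})"
                     for r in records if r["must_read"])


print(render_by_domain(RECORDS))
print(render_must_read(RECORDS))
```

Because neither view keeps its own copy of a title or year, correcting a record once updates both outputs, which is the consistency property this principle is after.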

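2. Generated Documentation

Hand-editing Markdown becomes sharply less efficient once there are more than about 50 entries, and formatting drift is unavoidable. Generating the documentation from code lets people spend their effort on the data content rather than on layout; in our experience this cuts maintenance cost by roughly an order of magnitude.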
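A generator only prevents formatting drift if its output is deterministic and committed files are checked against it. The sketch below assumes hypothetical record fields `id`, `domain`, `authors`, `title`, and `year`; `render_markdown` sorts domains and entries so that identical data always yields identical Markdown, and `check_drift` (also a hypothetical name) uses the standard library's `difflib.unified_diff` to expose any hand edit to a generated file.

```python
import difflib
from typing import Dict, List


def render_markdown(records: List[Dict]) -> str:
    """Deterministic generator: the same records always produce identical
    Markdown, regardless of the order in which they are stored."""
    out: List[str] = []
    for domain in sorted({r["domain"] for r in records}):
        out.append(f"## {domain}")
        out.append("")
        entries = sorted((r for r in records if r["domain"] == domain),
                         key=lambda r: (r["year"], r["id"]))
        for e in entries:
            out.append(f"- {e['authors']}. {e['title']}. {e['year']}.")
        out.append("")
    return "\n".join(out)


def check_drift(committed: str, records: List[Dict]) -> List[str]:
    """Diff the committed file against fresh output; an empty list means no drift."""
    expected = render_markdown(records)
    return list(difflib.unified_diff(committed.splitlines(), expected.splitlines(),
                                     "committed", "generated", lineterm=""))


records = [
    {"id": "li2018", "domain": "Sequence Alignment", "authors": "LI H", "year": 2018,
     "title": "Minimap2: pairwise alignment for nucleotide sequences"},
    {"id": "li2009", "domain": "Sequence Alignment", "authors": "LI H, DURBIN R", "year": 2009,
     "title": "Fast and accurate short read alignment with Burrows-Wheeler transform"},
]
generated = render_markdown(records)
print(generated)
hand_edited = generated.replace("- LI H. Minimap2", "* LI H. Minimap2")  # simulated manual edit
print("\n".join(check_drift(hand_edited, records)))
```

In a build, a non-empty diff means someone edited a generated file by hand; the remedy is to change the data and regenerate rather than patch the output.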

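3. Validate Before Deploying

In CI/CD, any data change that fails validation must block the build. The order "validate, then generate, then deploy" must never be reversed; otherwise dead links, formatting errors, and inconsistent data reach production.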
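The validate-first order can be enforced by a validation step whose exit code gates the rest of the pipeline. This is a minimal sketch assuming a hypothetical set of required fields and a loose DOI shape check; `validate` and `gate` are illustrative names, and dead-link checking (which needs network access) is left out.

```python
import re
from typing import Dict, List

# Hypothetical schema: fields every record must carry.
REQUIRED = ("id", "domain", "authors", "title", "year")
# Loose DOI shape check: "10." + registrant digits + "/" + non-blank suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def validate(records: List[Dict]) -> List[str]:
    """Collect every problem rather than stopping at the first one."""
    errors: List[str] = []
    seen = set()
    for i, r in enumerate(records):
        rid = r.get("id") or f"<record {i}>"
        for field in REQUIRED:
            if not r.get(field):
                errors.append(f"{rid}: missing required field '{field}'")
        if rid in seen:
            errors.append(f"{rid}: duplicate id")
        seen.add(rid)
        doi = r.get("doi")
        if doi is not None and not DOI_RE.match(doi):
            errors.append(f"{rid}: malformed DOI {doi!r}")
    return errors


def gate(records: List[Dict]) -> int:
    """Print errors and return a process exit code; a non-zero code fails the
    CI job, so the generate and deploy stages never run on bad data."""
    errors = validate(records)
    for e in errors:
        print(f"ERROR {e}")
    return 1 if errors else 0


good = [{"id": "smith1981", "domain": "Sequence Alignment",
         "authors": "SMITH T F, WATERMAN M S", "year": 1981,
         "title": "Identification of common molecular subsequences",
         "doi": "10.1016/0022-2836(81)90087-5"}]
bad = good + [{"id": "smith1981", "domain": "Sequence Alignment", "authors": "",
               "year": 1981, "title": "Duplicate entry", "doi": "doi:10.1016/xyz"}]
print(gate(good))  # 0
print(gate(bad))   # three ERROR lines, then 1
```

A CI job would call something like `sys.exit(gate(records))` as its first step, so that a non-zero code stops the job before any generation or deployment stage runs.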
