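References and Related Projects
Citation Format
All references in this knowledge base's algorithm entries follow GB/T 7714-2015, "Information and documentation: Rules for bibliographic references and citations to information resources". Each entry records the principal responsible party (author), the title, the document type identifier (for example, [M] for a monograph and [J] for a journal article), the publication details, and the access path.
Format Examples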
JONES N C, PEVZNER P A. An Introduction to Bioinformatics Algorithms[M]. Cambridge: MIT Press, 2004.
ALTSCHUL S F, MADDEN T L, SCHAFFER A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402. DOI:10.1093/nar/25.17.3389.
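As a rough illustration of how the journal-article pattern above is assembled from its elements, here is a minimal sketch. The `JournalArticle` class and both function names are hypothetical and not part of this knowledge base; the sketch assumes author names are already in "SURNAME Initials" form and covers only the `[J]` case, listing at most three authors before "et al".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JournalArticle:
    """Hypothetical record holding the elements of a [J] entry."""
    authors: List[str]  # already normalized, e.g. "LI H"
    title: str
    journal: str
    year: int
    volume: int
    issue: int
    pages: str
    doi: str = ""


def format_authors(authors: List[str], limit: int = 3) -> str:
    """Join up to `limit` authors; add 'et al' when more were given."""
    text = ", ".join(authors[:limit])
    if len(authors) > limit:
        text += ", et al"
    return text


def format_journal_article(ref: JournalArticle) -> str:
    """Render one journal article in the style of the examples above."""
    entry = (
        f"{format_authors(ref.authors)}. {ref.title}[J]. "
        f"{ref.journal}, {ref.year}, {ref.volume}({ref.issue}): {ref.pages}."
    )
    if ref.doi:
        entry += f" DOI:{ref.doi}."
    return entry


if __name__ == "__main__":
    bwa = JournalArticle(
        authors=["LI H", "DURBIN R"],
        title="Fast and accurate short read alignment with Burrows-Wheeler transform",
        journal="Bioinformatics",
        year=2009,
        volume=25,
        issue=14,
        pages="1754-1760",
        doi="10.1093/bioinformatics/btp324",
    )
    print(format_journal_article(bwa))
```

Keeping the elements in separate fields and rendering the string last means the same record could feed other output styles without retyping the reference.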
Classic Papers by Field
Sequence Alignment
- NEEDLEMAN S B, WUNSCH C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48(3): 443-453. DOI:10.1016/0022-2836(70)90057-4.
- SMITH T F, WATERMAN M S. Identification of common molecular subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195-197. DOI:10.1016/0022-2836(81)90087-5.
- ALTSCHUL S F, GISH W, MILLER W, et al. Basic local alignment search tool[J]. Journal of Molecular Biology, 1990, 215(3): 403-410. DOI:10.1016/S0022-2836(05)80360-2.
- LI H, DURBIN R. Fast and accurate short read alignment with Burrows-Wheeler transform[J]. Bioinformatics, 2009, 25(14): 1754-1760. DOI:10.1093/bioinformatics/btp324.
- LI H. Minimap2: pairwise alignment for nucleotide sequences[J]. Bioinformatics, 2018, 34(18): 3094-3100. DOI:10.1093/bioinformatics/bty191.
Sequence Assembly
- PEVZNER P A, TANG H, WATERMAN M S. An Eulerian path approach to DNA fragment assembly[J]. Proceedings of the National Academy of Sciences, 2001, 98(17): 9748-9753. DOI:10.1073/pnas.171285098.
- ZERBINO D R, BIRNEY E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Research, 2008, 18(5): 821-829. DOI:10.1101/gr.074492.107.
- BANKEVICH A, NURK S, ANTIPOV D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing[J]. Journal of Computational Biology, 2012, 19(5): 455-477. DOI:10.1089/cmb.2012.0021.
- KOREN S, WALENZ B P, BERLIN K, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation[J]. Genome Research, 2017, 27(5): 722-736. DOI:10.1101/gr.215087.116.
- KOLMOGOROV M, YUAN J, LIN Y, et al. Assembly of long, error-prone reads using repeat graphs[J]. Nature Biotechnology, 2019, 37(5): 540-546. DOI:10.1038/s41587-019-0072-8.
Variant Calling
- MCKENNA A, HANNA M, BANKS E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data[J]. Genome Research, 2010, 20(9): 1297-1303. DOI:10.1101/gr.107524.110.
- DEPRISTO M A, BANKS E, POPLIN R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data[J]. Nature Genetics, 2011, 43(5): 491-498. DOI:10.1038/ng.806.
- POPLIN R, CHANG P C, ALEXANDER D, et al. A universal SNP and small-indel variant caller using deep neural networks[J]. Nature Biotechnology, 2018, 36(10): 983-987. DOI:10.1038/nbt.4235.
- KIM S, SCHEFFLER K, HALPERN A L, et al. Strelka2: fast and accurate calling of germline and somatic variants[J]. Nature Methods, 2018, 15(8): 591-594. DOI:10.1038/s41592-018-0051-x.
- CIBULSKIS K, LAWRENCE M S, CARTER S L, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples[J]. Nature Biotechnology, 2013, 31(3): 213-219. DOI:10.1038/nbt.2514.
Protein Structure Prediction
- JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. DOI:10.1038/s41586-021-03819-2.
- BAEK M, DIMAIO F, ANISHCHENKO I, et al. Accurate prediction of protein structures and interactions using a three-track neural network[J]. Science, 2021, 373(6557): 871-876. DOI:10.1126/science.abj8754.
- LIN Z, AKIN H, RAO R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130. DOI:10.1126/science.ade2574.
- WU R, DING F, WANG R, et al. High-resolution de novo structure prediction from primary sequence[J]. Nature Methods, 2024, 21(4): 682-690. DOI:10.1038/s41592-024-02272-z.
- SENIOR A W, EVANS R, JUMPER J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706-710. DOI:10.1038/s41586-019-1923-7.
Single-Cell Analysis
- SATIJA R, FARRELL J A, GENNERT D, et al. Spatial reconstruction of single-cell gene expression data[J]. Nature Biotechnology, 2015, 33(5): 495-502. DOI:10.1038/nbt.3192.
- WOLF F A, ANGERER P, THEIS F J. SCANPY: large-scale single-cell gene expression data analysis[J]. Genome Biology, 2018, 19(1): 15. DOI:10.1186/s13059-017-1382-0.
- TRAPNELL C, CACCHIARELLI D, GRIMSBY J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells[J]. Nature Biotechnology, 2014, 32(4): 381-386. DOI:10.1038/nbt.2859.
- LOPEZ R, REGIER J, COLE M B, et al. Deep generative modeling for single-cell transcriptomics[J]. Nature Methods, 2018, 15(12): 1053-1058. DOI:10.1038/s41592-018-0229-2.
- ZHENG G X Y, TERRY J M, BELGRADER P, et al. Massively parallel digital transcriptional profiling of single cells[J]. Nature Communications, 2017, 8: 14049. DOI:10.1038/ncomms14049.
Metagenomics
- WOOD D E, SALZBERG S L. Kraken: ultrafast metagenomic sequence classification using exact alignments[J]. Genome Biology, 2014, 15(3): R46. DOI:10.1186/gb-2014-15-3-r46.
- QIN J, LI R, RAES J, et al. A human gut microbial gene catalogue established by metagenomic sequencing[J]. Nature, 2010, 464(7285): 59-65. DOI:10.1038/nature08821.
- TRUONG D T, FRANZOSA E A, TICKLE T L, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling[J]. Nature Methods, 2015, 12(10): 902-903. DOI:10.1038/nmeth.3589.
- ABUBUCKER S, SEGATA N, GOLL J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome[J]. PLoS Computational Biology, 2012, 8(6): e1002358. DOI:10.1371/journal.pcbi.1002358.
- SUNG J, ZHENG L, DUVVURI V, et al. Metabolic modeling with objective quantification of the human gut microbiome in inflammatory bowel disease[J]. Nature Microbiology, 2022, 7(7): 1126-1136. DOI:10.1038/s41564-022-01147-6.
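Essential Reading
The following reviews and landmark papers give a map-level view of each subfield and are recommended as the first reading when entering it:
- Sequence alignment and sequence search: ALTSCHUL S F, et al. Basic local alignment search tool[J]. Journal of Molecular Biology, 1990. (The foundational BLAST paper; essential for understanding heuristic search.)
- Protein structure prediction: JUMPER J, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021. (AlphaFold; a watershed for structural biology.)
- Single-cell technologies: PAPALEXI E, SATIJA R. High-dimensional genomic data analysis: methods and challenges[J]. Nature Methods, 2022. (A methodological review of high-dimensional single-cell data analysis.)
- Metagenomics: QUINCE C, et al. Shotgun metagenomics, from sampling to analysis[J]. Nature Biotechnology, 2017. (A complete methodology, from wet-lab sampling to computational analysis.)
- Graph genomics: PATEN B, et al. Genome graphs and the evolution of genome inference[J]. Genome Research, 2017. (A systematic review of graph genomics.)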
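Related Open-Source Projects
The table below compares this knowledge base with similar open-source projects in terms of positioning, functional scope, and engineering practice:
| Project | Core function | Stars | Primary language | License | How it differs from this project |
|---|---|---|---|---|---|
| Awesome-Bioinformatics | List of algorithms and tools | 2.8k+ | Markdown | CC0 | Plain list; no structured metadata or generation pipeline |
| bioinformatics-workflows | Analysis workflow templates | N/A | Snakemake / Nextflow | Mixed | Focuses on workflows rather than on the algorithms themselves |
| biostars-handbook | Tutorials and guides | N/A | N/A | Commercial | A hands-on manual for beginners, not an architecture-level knowledge base |
| OBF / Biopython | Tool libraries and community | N/A | Python | Biopython License / BSD-3-Clause | Provides algorithm implementations, not an index of algorithm metadata |
| This project | Structured algorithm knowledge base + white paper | N/A | Python | MIT | Emphasizes data-driven content, a generation pipeline, quality validation, and bilingual support |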
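Engineering Lessons
While building and maintaining this knowledge base, we distilled three engineering principles that we believe apply broadly to large technical knowledge systems:
1. A single source of truth for data
Once the number of entries exceeds 100, hand-written documentation scattered across multiple places inevitably becomes inconsistent. Consolidating the data into structured YAML and generating every presentation layer from that one source is, in our experience, the only sustainable way to stay consistent.
2. Generation-driven documentation
Hand-editing Markdown becomes sharply less efficient once there are about 50 entries, and format drift is unavoidable. Generating the documentation with code lets human creativity go into the data content rather than the layout, and can cut maintenance cost by an order of magnitude.
3. Validation before deployment
In CI/CD, any data change that fails validation must block the build. The order "validate, then generate, then deploy" must not be reversed; otherwise dead links, formatting errors, and inconsistent data will pollute the production environment.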
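The three principles can be sketched together as one small validate-then-generate step. This is an illustration under stated assumptions, not the project's actual pipeline: `ENTRIES` stands in for records already parsed from the YAML source, and the field names, the DOI pattern, and the functions `validate`, `render_markdown`, and `build` are all hypothetical.

```python
import re
import sys

# Hypothetical records, standing in for entries parsed from the YAML source.
ENTRIES = [
    {"id": "bwa", "name": "BWA", "category": "Sequence Alignment",
     "doi": "10.1093/bioinformatics/btp324"},
    {"id": "minimap2", "name": "Minimap2", "category": "Sequence Alignment",
     "doi": "10.1093/bioinformatics/bty191"},
]

REQUIRED_FIELDS = ("id", "name", "category", "doi")
# Assumed, deliberately loose DOI shape: "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def validate(entries):
    """Return a list of problems; an empty list means the data is clean."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        entry_id = entry.get("id")
        label = entry_id or f"entry #{index}"
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{label}: missing field '{field}'")
        doi = entry.get("doi")
        if doi and not DOI_PATTERN.match(doi):
            problems.append(f"{label}: malformed DOI {doi!r}")
        if entry_id:
            if entry_id in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(entry_id)
    return problems


def render_markdown(entries):
    """Generate one presentation layer (a Markdown table) from the data."""
    lines = ["| Name | Category | DOI |", "|---|---|---|"]
    for entry in sorted(entries, key=lambda e: e["id"]):
        lines.append(f"| {entry['name']} | {entry['category']} | {entry['doi']} |")
    return "\n".join(lines)


def build(entries):
    """Validate first; generate only when validation passes."""
    problems = validate(entries)
    if problems:
        raise ValueError("validation failed:\n" + "\n".join(problems))
    return render_markdown(entries)


if __name__ == "__main__":
    try:
        print(build(ENTRIES))
    except ValueError as err:
        print(err, file=sys.stderr)
        sys.exit(1)  # a non-zero exit code is what blocks the CI build
```

Because `build` refuses to render anything when `validate` reports a problem, a malformed entry can never reach the generated output, and the deployment step only ever sees files produced from validated data.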