References and Related Projects

Citation Format

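References in every algorithm entry of this knowledge base follow the standard format of GB/T 7714-2015 (《信息与文献 参考文献著录规则》, "Information and documentation: Rules for bibliographic references and citations to information resources"). The recorded elements are: primary responsible party (authors), title, document type identifier, publication details, and access path.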

Format Examples

JONES N C, PEVZNER P A. An Introduction to Bioinformatics Algorithms[M]. Cambridge: MIT Press, 2004.

ALTSCHUL S F, MADDEN T L, SCHAFFER A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs[J]. Nucleic Acids Research, 1997, 25(17): 3389-3402. DOI:10.1093/nar/25.17.3389.
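
As a rough illustration of how these elements combine for a journal article, the sketch below assembles a citation string in the style of the examples above. `JournalArticle` and `format_journal_article` are hypothetical names introduced here, not part of this knowledge base. The sketch handles only the `[J]` document type, and it assumes author names are already normalized to the "SURNAME INITIALS" form. It lists at most three authors before "et al.", as GB/T 7714-2015 prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalArticle:
    """Hypothetical record holding the elements of a [J] reference."""
    authors: List[str]        # assumed pre-normalized, e.g. "LI H"
    title: str
    journal: str
    year: int
    volume: str
    issue: Optional[str]      # None when the journal has no issue number
    pages: str                # page range or article number
    doi: Optional[str] = None


def format_journal_article(article: JournalArticle) -> str:
    """Build a GB/T 7714-2015 style string for a journal article."""
    # List at most three authors, then abbreviate the rest as "et al."
    names = ", ".join(article.authors[:3])
    if len(article.authors) > 3:
        names += ", et al"
    issue = f"({article.issue})" if article.issue else ""
    text = (
        f"{names}. {article.title}[J]. {article.journal}, "
        f"{article.year}, {article.volume}{issue}: {article.pages}."
    )
    if article.doi:
        text += f" DOI:{article.doi}."
    return text


if __name__ == "__main__":
    bwa = JournalArticle(
        authors=["LI H", "DURBIN R"],
        title="Fast and accurate short read alignment with Burrows-Wheeler transform",
        journal="Bioinformatics",
        year=2009,
        volume="25",
        issue="14",
        pages="1754-1760",
        doi="10.1093/bioinformatics/btp324",
    )
    print(format_journal_article(bwa))
```

Keeping the elements in a structured record and producing the string with one function means a punctuation rule is changed in a single place rather than in every entry.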


Classic Papers by Field

Sequence Alignment

  1. NEEDLEMAN S B, WUNSCH C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48(3): 443-453. DOI:10.1016/0022-2836(70)90057-4.
  2. SMITH T F, WATERMAN M S. Identification of common molecular subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195-197. DOI:10.1016/0022-2836(81)90087-5.
  3. ALTSCHUL S F, GISH W, MILLER W, et al. Basic local alignment search tool[J]. Journal of Molecular Biology, 1990, 215(3): 403-410. DOI:10.1016/S0022-2836(05)80360-2.
  4. LI H, DURBIN R. Fast and accurate short read alignment with Burrows-Wheeler transform[J]. Bioinformatics, 2009, 25(14): 1754-1760. DOI:10.1093/bioinformatics/btp324.
  5. LI H. Minimap2: pairwise alignment for nucleotide sequences[J]. Bioinformatics, 2018, 34(18): 3094-3100. DOI:10.1093/bioinformatics/bty191.

Sequence Assembly

  1. PEVZNER P A, TANG H, WATERMAN M S. An Eulerian path approach to DNA fragment assembly[J]. Proceedings of the National Academy of Sciences, 2001, 98(17): 9748-9753. DOI:10.1073/pnas.171285098.
  2. ZERBINO D R, BIRNEY E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Research, 2008, 18(5): 821-829. DOI:10.1101/gr.074492.107.
  3. BANKEVICH A, NURK S, ANTIPOV D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing[J]. Journal of Computational Biology, 2012, 19(5): 455-477. DOI:10.1089/cmb.2012.0021.
  4. KOREN S, WALENZ B P, BERLIN K, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation[J]. Genome Research, 2017, 27(5): 722-736. DOI:10.1101/gr.215087.116.
  5. KOLMOGOROV M, YUAN J, LIN Y, et al. Assembly of long, error-prone reads using repeat graphs[J]. Nature Biotechnology, 2019, 37(5): 540-546. DOI:10.1038/s41587-019-0072-8.

Variant Calling

  1. MCKENNA A, HANNA M, BANKS E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data[J]. Genome Research, 2010, 20(9): 1297-1303. DOI:10.1101/gr.107524.110.
  2. DEPRISTO M A, BANKS E, POPLIN R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data[J]. Nature Genetics, 2011, 43(5): 491-498. DOI:10.1038/ng.806.
  3. POPLIN R, CHANG P C, ALEXANDER D, et al. A universal SNP and small-indel variant caller using deep neural networks[J]. Nature Biotechnology, 2018, 36(10): 983-987. DOI:10.1038/nbt.4235.
  4. KIM S, SCHEFFLER K, HALPERN A L, et al. Strelka2: fast and accurate calling of germline and somatic variants[J]. Nature Methods, 2018, 15(8): 591-594. DOI:10.1038/s41592-018-0051-x.
  5. CIBULSKIS K, LAWRENCE M S, CARTER S L, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples[J]. Nature Biotechnology, 2013, 31(3): 213-219. DOI:10.1038/nbt.2514.

Protein Structure Prediction

  1. JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. DOI:10.1038/s41586-021-03819-2.
  2. BAEK M, DIMAIO F, ANISHCHENKO I, et al. Accurate prediction of protein structures and interactions using a three-track neural network[J]. Science, 2021, 373(6557): 871-876. DOI:10.1126/science.abj8754.
  3. LIN Z, AKIN H, RAO R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130. DOI:10.1126/science.ade2574.
  4. WU R, DING F, WANG R, et al. High-resolution de novo structure prediction from primary sequence[J]. Nature Methods, 2024, 21(4): 682-690. DOI:10.1038/s41592-024-02272-z.
  5. SENIOR A W, EVANS R, JUMPER J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706-710. DOI:10.1038/s41586-019-1923-7.

Single-Cell Analysis

  1. SATIJA R, FARRELL J A, GENNERT D, et al. Spatial reconstruction of single-cell gene expression data[J]. Nature Biotechnology, 2015, 33(5): 495-502. DOI:10.1038/nbt.3192.
  2. WOLF F A, ANGERER P, THEIS F J. SCANPY: large-scale single-cell gene expression data analysis[J]. Genome Biology, 2018, 19(1): 15. DOI:10.1186/s13059-017-1382-0.
  3. TRAPNELL C, CACCHIARELLI D, GRIMSBY J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells[J]. Nature Biotechnology, 2014, 32(4): 381-386. DOI:10.1038/nbt.2859.
  4. LOPEZ R, REGIER J, COLE M B, et al. Deep generative modeling for single-cell transcriptomics[J]. Nature Methods, 2018, 15(12): 1053-1058. DOI:10.1038/s41592-018-0229-2.
  5. ZHENG G X Y, TERRY J M, BELGRADER P, et al. Massively parallel digital transcriptional profiling of single cells[J]. Nature Communications, 2017, 8: 14049. DOI:10.1038/ncomms14049.

Metagenomics

  1. WOOD D E, SALZBERG S L. Kraken: ultrafast metagenomic sequence classification using exact alignments[J]. Genome Biology, 2014, 15(3): R46. DOI:10.1186/gb-2014-15-3-r46.
  2. QIN J, LI R, RAES J, et al. A human gut microbial gene catalogue established by metagenomic sequencing[J]. Nature, 2010, 464(7285): 59-65. DOI:10.1038/nature08821.
  3. TRUONG D T, FRANZOSA E A, TICKLE T L, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling[J]. Nature Methods, 2015, 12(10): 902-903. DOI:10.1038/nmeth.3589.
  4. ABUBUCKER S, SEGATA N, GOLL J, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome[J]. PLoS Computational Biology, 2012, 8(6): e1002358. DOI:10.1371/journal.pcbi.1002358.
  5. SUNG J, ZHENG L, DUVVURI V, et al. Metabolic modeling with objective quantification of the human gut microbiome in inflammatory bowel disease[J]. Nature Microbiology, 2022, 7(7): 1126-1136. DOI:10.1038/s41564-022-01147-6.

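Must-Read Reviews

The following reviews and landmark papers are the "map-level" literature of their fields. We recommend reading them first when entering each subfield: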

  1. Sequence alignment and sequence search: ALTSCHUL S F, et al. Basic local alignment search tool[J]. J. Mol. Biol., 1990. (The founding BLAST paper; essential for understanding heuristic search.)
  2. Protein structure prediction: JUMPER J, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021. (AlphaFold; a watershed for structural biology.)
  3. Single-cell technology: PAPALEXI E, SATIJA R. High-dimensional genomic data analysis: methods and challenges[J]. Nature Methods, 2022. (A methodological review of high-dimensional single-cell data analysis.)
  4. Metagenomics: QUINCE C, et al. Shotgun metagenomics, from sampling to analysis[J]. Nature Biotechnology, 2017. (A complete methodology, from wet lab to dry lab.)
  5. Graph genomics: PATEN B, et al. Genome graphs and the evolution of genome inference[J]. Genome Research, 2017. (A systematic review of graph genomics.)

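A Look at Related Open-Source Projects

The table below compares this knowledge base with similar open-source projects in terms of positioning, functional scope, and engineering practice: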

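| Project | Core function | Stars | Main language | License | Difference from this project |
| --- | --- | --- | --- | --- | --- |
| Awesome-Bioinformatics | List of algorithms and tools | 2.8k+ | Markdown | CC0 | Plain list; no structured metadata or generation pipeline |
| bioinformatics-workflows | Analysis workflow templates | N/A | Snakemake / Nextflow | Mixed | Focuses on workflows rather than the algorithms themselves |
| biostars-handbook | Tutorials and guides | N/A | | Commercial | Hands-on manual for beginners, not an architecture-level knowledge base |
| OBF / BioPython | Tool library and community | N/A | Python | MIT/BSD | Provides algorithm implementations, not an index of algorithm metadata |
| This project | Structured algorithm knowledge base + white paper | | Python | MIT | Emphasizes a data-driven design, a generation pipeline, quality validation, and bilingual support |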

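Engineering Lessons

While building and maintaining this knowledge base, we distilled three engineering principles that apply broadly to large technical knowledge systems: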

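1. A single source of truth for data

Once the number of knowledge entries exceeds 100, hand-written documentation scattered across multiple places inevitably becomes inconsistent. Consolidating the data into structured YAML, and generating every presentation layer from that one source, is the only sustainable way to stay consistent.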
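
A minimal sketch of this idea follows, assuming a hypothetical record schema (`id`, `name`, `category`) that is not taken from this project. To stay dependency-free the records are written inline; in practice they would be parsed from the YAML source, for example with PyYAML's `yaml.safe_load`. Two different views are derived from the same list, so they cannot drift apart.

```python
import json
from typing import Dict, List

# Hypothetical records standing in for the parsed YAML source of truth.
ALGORITHMS: List[Dict[str, str]] = [
    {"id": "needleman-wunsch", "name": "Needleman-Wunsch", "category": "Sequence Alignment"},
    {"id": "smith-waterman", "name": "Smith-Waterman", "category": "Sequence Alignment"},
    {"id": "velvet", "name": "Velvet", "category": "Sequence Assembly"},
]


def render_category_index(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """View 1: map each category to its sorted algorithm names."""
    index: Dict[str, List[str]] = {}
    for rec in records:
        index.setdefault(rec["category"], []).append(rec["name"])
    return {category: sorted(names) for category, names in index.items()}


def render_search_json(records: List[Dict[str, str]]) -> str:
    """View 2: a JSON payload that a site search box could load."""
    payload = [{"id": rec["id"], "title": rec["name"]} for rec in records]
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(render_category_index(ALGORITHMS))
    print(render_search_json(ALGORITHMS))
```

Adding or renaming an entry means touching only `ALGORITHMS`; both views pick up the change on the next run.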

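2. Generation-driven documentation

The efficiency of hand-editing Markdown drops sharply once the entry count reaches 50, and format drift becomes unavoidable. Generating the documentation with code focuses human creativity on the data content rather than the layout, and can cut maintenance cost by an order of magnitude.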
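
The sketch below shows one way such generation could look: a single function turns a structured entry into a Markdown page, so the layout is defined once in code. The function name `render_entry` and the fields `name`, `summary`, and `references` are assumptions made for illustration, not this project's actual generator or schema.

```python
from typing import Any, Dict


def render_entry(entry: Dict[str, Any]) -> str:
    """Render one knowledge-base entry as a Markdown page (hypothetical layout)."""
    lines = [
        f"# {entry['name']}",
        "",
        entry["summary"],
        "",
        "## References",
        "",
    ]
    # Number the references in code, so numbering never drifts by hand.
    for number, reference in enumerate(entry["references"], start=1):
        lines.append(f"{number}. {reference}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "name": "Smith-Waterman",
        "summary": "Local pairwise alignment by dynamic programming.",
        "references": [
            "SMITH T F, WATERMAN M S. Identification of common molecular "
            "subsequences[J]. Journal of Molecular Biology, 1981, 147(1): 195-197.",
        ],
    }
    print(render_entry(example))
```

A layout change, such as renaming the "References" heading, is then a one-line edit to the generator instead of an edit to every page.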

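3. Validation before deployment

In CI/CD, any data change that fails validation must block the build. The order "validate, then generate, then deploy" must not be reversed; otherwise dead links, formatting errors, and inconsistent data will pollute the production environment.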
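
The ordering can be sketched as a small gate function. Everything named here (`REQUIRED_FIELDS`, `validate`, `run_pipeline`, and the record fields) is hypothetical rather than this project's actual tooling. The checks cover only missing fields, duplicate ids, and DOI syntax; dead-link checking is left out because it needs network access.

```python
import re
from typing import Callable, Dict, List

REQUIRED_FIELDS = ("id", "name", "doi")          # hypothetical schema
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")   # syntax check only, no lookup


def validate(records: List[Dict[str, str]]) -> List[str]:
    """Return human-readable errors; an empty list means the data is valid."""
    errors: List[str] = []
    seen_ids = set()
    for position, rec in enumerate(records):
        rec_id = rec.get("id")
        label = rec_id or f"<record {position}>"
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                errors.append(f"{label}: missing field '{field}'")
        doi = rec.get("doi")
        if doi and not DOI_PATTERN.match(doi):
            errors.append(f"{label}: malformed DOI '{doi}'")
        if rec_id:
            if rec_id in seen_ids:
                errors.append(f"{label}: duplicate id")
            seen_ids.add(rec_id)
    return errors


def run_pipeline(
    records: List[Dict[str, str]],
    generate: Callable[[List[Dict[str, str]]], str],
    deploy: Callable[[str], None],
) -> int:
    """Validate first, then generate, then deploy; return a process exit code."""
    errors = validate(records)
    if errors:
        for error in errors:
            print(f"ERROR {error}")
        return 1  # a non-zero exit code makes the CI job fail before any deploy
    deploy(generate(records))
    return 0
```

In a CI job, a script would end with something like `sys.exit(run_pipeline(records, generate, deploy))`, so a failed validation stops the build before generation or deployment runs.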

Released under the MIT License.