
ProtBERT

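A protein language model built on the BERT architecture and pretrained on UniRef100. It learns context-dependent representations of amino acid sequences and can be used for tasks such as protein family classification, subcellular localization prediction, and post-translational modification site prediction.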
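To make the idea of per-residue representations concrete, here is a minimal sketch of how a sequence might be prepared and embedded. It assumes the publicly released `Rostlab/prot_bert` checkpoint on the Hugging Face Hub and the input convention from the ProtTrans release (uppercase residues separated by spaces, with the rare residues U, Z, O, and B mapped to X). None of this is stated on this page. The helper names `prepare_sequence` and `embed` are hypothetical.

```python
import re


def prepare_sequence(seq: str) -> str:
    """Format a raw amino acid string for a ProtBERT-style tokenizer (assumed convention)."""
    seq = re.sub(r"\s+", "", seq).upper()  # drop whitespace, normalize case
    seq = re.sub(r"[UZOB]", "X", seq)      # map rare residues to X
    return " ".join(seq)                   # one space-separated token per residue


def embed(seq: str):
    """Return per-residue embeddings; needs `transformers`, `torch`, and a model download."""
    import torch
    from transformers import BertModel, BertTokenizer

    # Checkpoint name is an assumption, not taken from this page.
    tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
    model = BertModel.from_pretrained("Rostlab/prot_bert").eval()

    inputs = tokenizer(prepare_sequence(seq), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, n + 2, hidden_size)
    return hidden[0, 1:-1]  # strip the [CLS] and [SEP] positions
```

Calling `embed("MKTAYIAKQR")` would return one vector per residue. Averaging those vectors is one common way to obtain a fixed-size representation for downstream classifiers.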

Property: Value
Purpose: BERT-based protein sequence representation learning
Time Complexity: O(n^2 * d)
Space Complexity: O(n^2)
Year: 2020
Difficulty: Intermediate
Languages: Python
Category: Protein Language Model

Complexity Analysis

  • Time Complexity: O(n^2 * d), where n is the sequence length and d is the hidden dimension
  • Space Complexity: O(n^2), for the self-attention score matrix
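The quadratic terms can be illustrated with a back-of-the-envelope estimate. The sketch below counts only the two large matrix products in one self-attention layer and the size of a single n-by-n score matrix. The function name `attention_cost`, the value `d=1024`, and the 4-byte float size are illustrative assumptions, not measurements.

```python
def attention_cost(n: int, d: int, bytes_per_value: int = 4) -> dict:
    """Rough per-layer self-attention cost for sequence length n and hidden size d."""
    return {
        # Q·K^T and the attention-weighted sum over V each take about n*n*d multiply-adds
        "multiply_adds": 2 * n * n * d,
        # one n-by-n score matrix (per head), stored at bytes_per_value bytes per entry
        "score_matrix_bytes": n * n * bytes_per_value,
    }


# Doubling the sequence length roughly quadruples both figures.
for n in (512, 1024, 2048):
    cost = attention_cost(n, d=1024)
    print(n, cost["multiply_adds"], cost["score_matrix_bytes"])
```

Real memory use also depends on the number of heads and layers, batch size, and whether the implementation materializes the full score matrix.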

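Performance Insight: Time and memory both grow quadratically with sequence length n, because self-attention compares every residue with every other residue. For very long sequences, consider truncating the input, splitting it into windows, or using a memory-efficient attention implementation.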
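One simple way to bound memory on very long sequences is to split them into overlapping windows and embed each window separately. The sketch below is a generic workaround, not something this page or the ProtBERT authors prescribe. The function name `sliding_windows` and the default `window` and `overlap` values are assumptions.

```python
def sliding_windows(seq: str, window: int = 512, overlap: int = 64):
    """Yield (start, chunk) pairs that cover seq with overlapping windows."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while True:
        yield start, seq[start:start + window]
        if start + window >= len(seq):  # this window reached the end of the sequence
            break
        start += step


# Each chunk costs at most O(window^2) attention memory instead of O(n^2).
for start, chunk in sliding_windows("ABCDEFGHIJ", window=4, overlap=1):
    print(start, chunk)
```

Residues near a window edge see less context than those in the middle. The overlap lets you keep, for each position, the embedding from the window where it sits farthest from an edge.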

Note: Complexity analysis is based on theoretical models. Actual runtime is affected by input scale, hardware, and implementation optimizations. Benchmark for your specific workload.

Literature & Implementation

ESM-2 · ProtTrans · UniRep

Tags

language-model bert sequence-embedding pretrained

Released under the MIT License.