ProtTrans

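A collection of protein language models built on several Transformer architectures (T5, BERT, XLNet, and others), pretrained on large-scale databases such as UniRef and BFD. The models produce high-quality protein sequence representations and support transfer learning for a variety of downstream tasks.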
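As a rough illustration of how such representations are typically obtained, the sketch below uses the Hugging Face `transformers` library. The checkpoint name `Rostlab/prot_t5_xl_uniref50`, the helper names `prepare_sequence` and `embed`, and the mean-pooling step are assumptions made for this example, not something this page specifies. The preprocessing (mapping the rare residues U, Z, O, B to X and separating residues with spaces) follows the convention commonly used with the ProtTrans T5 tokenizers.

```python
import re


def prepare_sequence(seq: str) -> str:
    """Format a raw amino-acid string for a ProtTrans-style tokenizer.

    Rare or ambiguous residues (U, Z, O, B) are mapped to X, and residues
    are separated by spaces so that each one becomes its own token.
    """
    seq = re.sub(r"[UZOB]", "X", seq)
    return " ".join(seq)


def embed(sequences, model_name="Rostlab/prot_t5_xl_uniref50"):
    """Return one mean-pooled embedding per protein (hypothetical helper).

    model_name is an assumed checkpoint; substitute the one you actually use.
    Heavy imports are kept local so prepare_sequence works without torch.
    """
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        padding="longest",
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        ).last_hidden_state  # shape: (batch, tokens, hidden_dim)

    # Average over real residues only: the first len(seq) positions,
    # which skips the trailing </s> token and any padding.
    pooled = [hidden[i, : len(seq)].mean(dim=0) for i, seq in enumerate(sequences)]
    return torch.stack(pooled)


if __name__ == "__main__":
    print(prepare_sequence("PRTEINO"))
```

Calling `embed(["PRTEINO", "SEQWENCE"])` downloads the checkpoint on first use, which is several gigabytes for the XL models. The pooled vectors can then serve as fixed input features for a downstream classifier or regressor, which is the simplest form of the transfer learning mentioned above.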

Property | Value
Purpose | Pretrained representations from multi-architecture protein language models
Time Complexity | O(n^2 * d)
Space Complexity | O(n^2)
Year | 2021
Difficulty | Intermediate
Languages | Python
Category | Protein Language Model

Complexity Analysis

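  • Time Complexity: O(n^2 * d), where n is the sequence length in residues and d is the hidden dimension (the cost of one self-attention layer)
  • Space Complexity: O(n^2), dominated by the n × n attention score matrix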

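Performance Insight: Time complexity is polynomial and quadratic in sequence length, because self-attention compares every residue with every other residue. Memory use also grows quadratically, so very long sequences can exhaust GPU memory; consider half-precision inference, smaller batch sizes, or memory-efficient attention implementations in that case.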

Note: Complexity analysis is based on theoretical models. Actual runtime is affected by input scale, hardware, and implementation optimizations. Benchmark for your specific workload.
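To make the quadratic memory term concrete, here is a small back-of-the-envelope sketch. The function name `attention_score_bytes` and its default values (32 attention heads, 2 bytes per value for half precision) are illustrative assumptions; check the configuration of the checkpoint you actually use. It counts only the attention score matrix of a single layer and ignores weights and other activations.

```python
def attention_score_bytes(seq_len, num_heads=32, bytes_per_value=2, batch_size=1):
    """Approximate size of one layer's attention score tensor.

    The tensor has shape (batch, heads, seq_len, seq_len), so its size
    grows with the square of the sequence length.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    for n in (500, 1000, 2000, 4000):
        mib = attention_score_bytes(n) / 2**20
        print(f"n={n:>5}: {mib:10.1f} MiB per layer")
```

Doubling the sequence length quadruples this figure, which is why long proteins are usually embedded one at a time or in very small batches.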

Literature & Implementation

ESM-2 · Ankh · ProtGPT2

Tags

language-model transfer-learning representation-learning t5

Released under the MIT License.