
MANGO

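A reference-free sequence compression method based on context modeling. It compresses genomic data efficiently by learning the local statistical features of the sequence. Because it needs no reference genome, it still achieves strong compression ratios on novel species and in other settings where no reference genome is available.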
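The core idea of context modeling is to predict each base from the bases that precede it. One way to see this is a single fixed-order context model. The sketch below is a generic finite-context model, not MANGO's published implementation. The class name `FiniteContextModel`, the order `k`, and the smoothing parameter `alpha` are illustrative assumptions. Instead of running a full arithmetic coder, it estimates the compressed size as the sum of -log2 p over all bases.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map a nucleotide to a 2-bit symbol; returns -1 for N and other
// non-ACGT characters, which this sketch simply skips.
int baseToSymbol(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Hypothetical order-k finite-context model: for each context (the
// previous k bases, packed 2 bits per base) it keeps adaptive counts of
// the base that followed it.
class FiniteContextModel {
public:
    explicit FiniteContextModel(int k, double alpha = 1.0)
        : alpha_(alpha), mask_(maskFor(k)), counts_((mask_ + 1) * 4, 0) {}

    // Probability of `symbol` after context `ctx`, with additive smoothing
    // so that an unseen symbol never gets probability zero.
    double probability(uint64_t ctx, int symbol) const {
        const uint32_t* c = &counts_[ctx * 4];
        double total = double(c[0]) + c[1] + c[2] + c[3];
        return (c[symbol] + alpha_) / (total + 4.0 * alpha_);
    }

    // Ideal code length in bits: the sum of -log2 p(base | context),
    // updating the counts after each base. An arithmetic coder driven by
    // the same probabilities can produce output close to this total.
    double codeLengthBits(const std::string& seq) {
        uint64_t ctx = 0;  // until k bases are seen, the context is padded with 'A'
        double bits = 0.0;
        for (char ch : seq) {
            int s = baseToSymbol(ch);
            if (s < 0) continue;
            bits -= std::log2(probability(ctx, s));
            ++counts_[ctx * 4 + s];                    // learn from this base
            ctx = ((ctx << 2) | uint64_t(s)) & mask_;  // shift in the new base
        }
        return bits;
    }

private:
    static uint64_t maskFor(int k) {
        assert(k >= 0 && k <= 12);  // the table holds 4^(k+1) counters
        return (uint64_t{1} << (2 * k)) - 1;
    }

    double alpha_;
    uint64_t mask_;
    std::vector<uint32_t> counts_;
};
```

Consider an input made of 1,000 copies of `ACGTTGCA`. An order-4 model quickly learns that each 4-base context there is always followed by the same base, so the estimated cost falls far below the 2 bits per base of plain 2-bit packing. On random sequence the cost stays near 2 bits per base. A higher `k` captures longer repeats but needs more memory and more data to learn. For that reason, practical DNA compressors often mix several model orders.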

| Property | Value |
|---|---|
| Purpose | Reference-free compression of genomic sequences |
| Time Complexity | O(n) |
| Space Complexity | O(n) |
| Year | 2018 |
| Difficulty | Advanced |
| Languages | C++ |
| Category | Data Compression |

Complexity Analysis

  • Time Complexity: O(n)
  • Space Complexity: O(n)

Performance Insight: The algorithm runs in linear time (O(n)), so runtime grows in proportion to input size. In principle this extends to terabyte-scale data and suits streaming pipelines. The linear space bound can often be reduced by a constant factor with sliding-window techniques.
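Streaming works because each base is processed once and the per-base model state is small. The code below is a minimal sketch that assumes a fixed-order context model, not MANGO's actual design. It consumes the input in fixed-size chunks and carries only two things between chunks: the count table and the rolling context of the last k bases. Working memory therefore depends on k and the chunk size, not on n. `StreamingContextModel` and `streamCodeLength` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical streaming state for an order-k context model. Between
// chunks, only the count table (4^k contexts x 4 symbols) and the rolling
// context persist, so each chunk can be discarded after it is consumed.
class StreamingContextModel {
public:
    explicit StreamingContextModel(int k)
        : mask_(maskFor(k)), counts_((mask_ + 1) * 4, 0) {}

    // Consume one chunk and return its estimated code length in bits.
    double consume(std::string_view chunk) {
        double bits = 0.0;
        for (char ch : chunk) {
            int s = symbol(ch);
            if (s < 0) continue;  // this sketch ignores non-ACGT symbols
            uint32_t* c = &counts_[ctx_ * 4];
            double total = double(c[0]) + c[1] + c[2] + c[3];
            bits -= std::log2((c[s] + 1.0) / (total + 4.0));  // Laplace estimate
            ++c[s];
            ctx_ = ((ctx_ << 2) | uint64_t(s)) & mask_;  // slide the k-base window
        }
        return bits;
    }

private:
    static uint64_t maskFor(int k) {
        assert(k >= 0 && k <= 12);  // the table holds 4^(k+1) counters
        return (uint64_t{1} << (2 * k)) - 1;
    }
    static int symbol(char c) {
        switch (c) {
            case 'A': case 'a': return 0;
            case 'C': case 'c': return 1;
            case 'G': case 'g': return 2;
            case 'T': case 't': return 3;
            default:            return -1;
        }
    }

    uint64_t mask_;
    std::vector<uint32_t> counts_;
    uint64_t ctx_ = 0;  // the context survives chunk boundaries
};

// Feed `data` through the model in chunks of `chunkSize` bytes, the way
// a pipeline reading from a file or socket would.
double streamCodeLength(const std::string& data, std::size_t chunkSize, int k) {
    StreamingContextModel model(k);
    double bits = 0.0;
    for (std::size_t pos = 0; pos < data.size(); pos += chunkSize)
        bits += model.consume(std::string_view(data).substr(pos, chunkSize));
    return bits;
}
```

The rolling context crosses chunk boundaries. As a result, any chunk size gives the same estimated code length as a single pass over the whole input, up to floating-point rounding.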

Note: Complexity analysis is based on theoretical models. Actual runtime is affected by input scale, hardware, and implementation optimizations. Benchmark for your specific workload.

Genozip · CRAM · gzip

Tags

reference-free genome-compression context-modeling

Released under the MIT License.