
Evolution Notes

Overview

Since its inception in early 2024, this knowledge base has evolved through three distinct phases, each with its own core objectives, key actions, and deliverables. Understanding the decisions made in each phase helps anticipate where technical debt will accumulate and in which directions the project is likely to expand.


Phase 1: List-Oriented Curation (2024 Q1–Q2)

Goal

Solve the "breadth of coverage" problem by establishing a multi-category algorithm directory covering major bioinformatics subfields. Core KPIs: >100 algorithm entries, >=10 top-level categories.

Key Actions

  1. Designed the initial YAML schema (v1), containing six required fields: id, name, description, purpose, time_complexity, category
  2. Established a hierarchy of 16 top-level categories and 30+ subcategories, defined in categories.yaml
  3. Manually curated the first 100+ algorithm entries, focusing on classic algorithms (Smith-Waterman, Needleman-Wunsch, BLAST, etc.)
  4. Built a minimum viable VitePress site supporting category browsing and algorithm detail pages
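
The six required v1 fields lend themselves to a simple presence check. The sketch below is illustrative only: it assumes each entry has already been parsed from YAML into a Python dict, and the helper name `missing_v1_fields` is hypothetical rather than taken from the project's validate.py.

```python
from typing import Any, Dict, List

# The six required fields of the v1 schema, as listed above.
REQUIRED_V1_FIELDS = ("id", "name", "description", "purpose", "time_complexity", "category")


def missing_v1_fields(entry: Dict[str, Any]) -> List[str]:
    """Return the required v1 fields that are absent or blank in one entry.

    Hypothetical helper: assumes the YAML entry was already loaded into a dict.
    """
    missing = []
    for name in REQUIRED_V1_FIELDS:
        value = entry.get(name)
        # Treat None and whitespace-only strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


entry = {
    "id": "smith-waterman",
    "name": "Smith-Waterman",
    "description": "Local sequence alignment via dynamic programming",
    "purpose": "Find the best-scoring local alignment of two sequences",
    "time_complexity": "O(mn)",
    "category": "",
}
print(missing_v1_fields(entry))  # ['category']
```

A check like this is enough for a small hand-curated set; it says nothing about field types or allowed values, which is the gap a schema-based validator fills.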

Deliverables

  • 195+ algorithm entries (exceeded target)
  • 16 top-level categories × 30+ subcategory hierarchy
  • Basic VitePress site (Chinese + English mirror)

Phase 2: Engineering Governance (2024 Q2–Q4)

Goal

Solve the "consistency and maintainability" problem by upgrading from "human-maintained Markdown" to a "data-driven generation system." Core KPIs: zero false positives in data validation, generator test coverage >85%, fully automated CI/CD.

Key Actions

  1. Introduced a dual validation mechanism: field rules in validate.py plus JSON Schema validation
  2. Refactored generate_docs.py to generate algorithm, category, and index pages programmatically, replacing handwritten Markdown
  3. Established a CLI command suite (validate, stats, search, info, compare, export, vitepress)
  4. Integrated a code-quality toolchain (ruff, mypy, pytest) and raised test coverage to 89%
  5. Configured a GitHub Actions workflow that fully automates the push → validate → generate → build → deploy pipeline
  6. Extended the YAML schema to v3, adding space_complexity, year, tags, difficulty, language, references, and other fields
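
A subcommand suite like this is commonly built with `argparse` subparsers, binding each subcommand to a handler via `set_defaults`. The skeleton below is hypothetical, not the project's actual CLI: the program name `kb` and the handler bodies are placeholders, and only validate, search, and info are wired up (stats, compare, export, and vitepress would be registered the same way).

```python
import argparse
from typing import Optional, Sequence


def cmd_validate(args: argparse.Namespace) -> int:
    # Placeholder: a real handler would run the field rules and schema checks.
    print("validating entries")
    return 0


def cmd_search(args: argparse.Namespace) -> int:
    # Placeholder: shows how a positional argument reaches its handler.
    print(f"searching for {args.query!r}")
    return 0


def cmd_info(args: argparse.Namespace) -> int:
    print(f"showing details for {args.algorithm_id}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kb")  # hypothetical program name
    sub = parser.add_subparsers(dest="command", required=True)

    p = sub.add_parser("validate", help="check all YAML entries")
    p.set_defaults(func=cmd_validate)

    p = sub.add_parser("search", help="search algorithm entries")
    p.add_argument("query")
    p.set_defaults(func=cmd_search)

    p = sub.add_parser("info", help="show one algorithm entry")
    p.add_argument("algorithm_id")
    p.set_defaults(func=cmd_info)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)  # dispatch to the handler bound by set_defaults
```

Calling `main(["search", "alignment"])` prints the query and returns 0; passing `argv` explicitly keeps the entry point easy to exercise from pytest.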

Deliverables

  • Data-driven VitePress documentation generator
  • Complete toolchain with 8 CLI subcommands
  • Python test suite with 89% coverage
  • Fully automated CI/CD release pipeline
  • Algorithm template file (templates/algorithm_template.yaml)

Phase 3: Whitepaper Positioning (2025 Q1–Present)

Goal

Solve the "professional persuasiveness" problem by elevating the knowledge base from an "algorithm list" to a "technical whitepaper and architecture academy." Core KPIs: average whitepaper page length >200 lines, academic citation coverage >85%, full Mermaid architecture diagram coverage.

Key Actions

  1. Rewrote all whitepaper generator functions (_generate_*) to output in-depth academic content (project overview, learning path, system architecture, data pipeline, quality assurance, references, evolution notes, CLI workflow)
  2. Unified academic citation formats: GB/T 7714 for Chinese, IEEE for English
  3. Introduced Mermaid architecture diagrams (data flow, CI/CD, learning path, system architecture) to enhance visual expressiveness
  4. Redesigned the homepage (Hero, Features, statistics dashboard, whitepaper entry points, research directions, latest additions)
  5. Enhanced algorithm pages with a dedicated complexity-analysis section and cleaner link and tag presentation
  6. Established a specification-driven development (SDD) process based on OpenSpec; openspec/specs/ serves as the single source of requirements
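
Rendering one citation record in both styles can be sketched as below. This is a simplified illustration for journal articles only: the `JournalRef` record and both formatter functions are hypothetical, and they omit many rules of the full GB/T 7714 and IEEE guidelines (DOIs, month fields, journal abbreviations, non-Latin author names). The sample record is the original Smith-Waterman paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JournalRef:
    authors: List[Tuple[str, str]]  # (given initials, family name), e.g. ("T. F.", "Smith")
    title: str
    journal: str
    year: int
    volume: int
    issue: int
    pages: str  # e.g. "195-197"


def format_ieee(ref: JournalRef) -> str:
    # IEEE: initials before the family name; "and" before the last author.
    names = [f"{given} {family}" for given, family in ref.authors]
    if len(names) > 2:
        author_str = ", ".join(names[:-1]) + ", and " + names[-1]
    else:
        author_str = " and ".join(names)
    return (f'{author_str}, "{ref.title}," {ref.journal}, '
            f"vol. {ref.volume}, no. {ref.issue}, pp. {ref.pages}, {ref.year}.")


def format_gbt7714(ref: JournalRef) -> str:
    # GB/T 7714: upper-case family name, initials without periods,
    # at most three authors followed by "et al", and [J] marking a journal article.
    names = [f"{family.upper()} {given.replace('.', '').strip()}" for given, family in ref.authors]
    if len(names) > 3:
        names = names[:3] + ["et al"]
    return (f"{', '.join(names)}. {ref.title}[J]. {ref.journal}, "
            f"{ref.year}, {ref.volume}({ref.issue}): {ref.pages}.")


ref = JournalRef(
    authors=[("T. F.", "Smith"), ("M. S.", "Waterman")],
    title="Identification of common molecular subsequences",
    journal="Journal of Molecular Biology",
    year=1981, volume=147, issue=1, pages="195-197",
)
print(format_ieee(ref))
print(format_gbt7714(ref))
```

Keeping one structured record and deriving both strings from it means a corrected page range or volume only has to be fixed once.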

Deliverables

  • 14 in-depth whitepaper documents (28 pages in Chinese + English)
  • Unified academic citation system (GB/T 7714 / IEEE)
  • Architecture Decision Records (ADR)
  • OpenSpec specification directory and proposal workflow

Technical Debt Register

Debt Item | Impact Level | Description | Mitigation Plan
--- | --- | --- | ---
Insufficient bilingual coverage | Medium | Only ~60% of entries provide English descriptions | Fill gaps gradually through community contributions and automated translation APIs
Low completeness of optional fields | Medium | Coverage of space_complexity, related_tools, and references is below 70% | Have validate emit non-blocking warnings to guide contributors
Generator not templatized | Low | Markdown is assembled with Python f-string concatenation, which becomes hard to maintain as complexity grows | Evaluate adopting the Jinja2 template engine
No runtime API | Low | All queries must be resolved at generation time, so dynamic retrieval is not supported | Add a REST API layer (long-term plan)
External links not continuously monitored | Low | paper_url / implementation_url links may break over time | Run link_checker.py more frequently in CI
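
The non-blocking warnings proposed for optional fields could take the following form: missing required fields become errors that fail validation, while missing optional fields only produce warnings, and a coverage helper reports how complete each field is. This is a hypothetical sketch; `ValidationReport`, `validate_entries`, `field_coverage`, and the abbreviated required-field list are assumptions, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

REQUIRED = ("id", "name", "description")  # abbreviated for the example
OPTIONAL = ("space_complexity", "related_tools", "references")


@dataclass
class ValidationReport:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Only errors block; warnings are advisory.
        return not self.errors


def validate_entries(entries: List[Dict[str, Any]]) -> ValidationReport:
    report = ValidationReport()
    for entry in entries:
        eid = entry.get("id", "<unknown>")
        for name in REQUIRED:
            if not entry.get(name):
                report.errors.append(f"{eid}: missing required field '{name}'")
        for name in OPTIONAL:
            if not entry.get(name):
                report.warnings.append(f"{eid}: optional field '{name}' is empty")
    return report


def field_coverage(entries: List[Dict[str, Any]], name: str) -> float:
    """Fraction of entries with a non-empty value for `name` (0.0 when there are no entries)."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if e.get(name)) / len(entries)
```

The same `field_coverage` helper could also express the short-term acceptance check on `description_en` (coverage of at least 0.75).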

Future Roadmap

Short-term (1–3 months)

Task | Priority | Acceptance Criteria
--- | --- | ---
Raise bilingual coverage to 75% | P0 | stats shows description_en coverage >=75%
Enhance algorithm page visualization | P1 | Add extended complexity-analysis descriptions to the top 20 algorithm pages
Optimize VitePress search | P1 | Support local search filtered by complexity, year, and difficulty
Dead link auto-fix suggestions | P2 | When link_checker fails in CI, it outputs suggested replacement links

Medium-term (3–6 months)

Task | Priority | Acceptance Criteria
--- | --- | ---
Introduce algorithm benchmark data fields | P0 | YAML schema v4 supports accuracy, runtime, and memory fields
Plugin system MVP | P1 | Support registration and execution of third-party data-enrichment plugins
Category page visualization enhancement | P1 | Category pages gain algorithm distribution bar charts and era trend line charts
Interactive complexity comparison tool | P2 | Selecting multiple algorithms generates a complexity comparison table

Long-term (6–12 months)

Task | Priority | Acceptance Criteria
--- | --- | ---
REST API read-only service | P1 | Provide read-only /api/v1/algorithms endpoints with latency <200ms
Multimodal content support | P2 | Support embedding algorithm flowcharts, pseudocode, and video tutorials
Community contribution platform | P2 | Algorithm proposal and review workflow based on GitHub Issues
Knowledge graph construction | P3 | Build interactive knowledge graphs from categories, tags, and citations

Design Pattern Records

The following three design patterns have repeatedly proven effective in the engineering of this knowledge base:

Repository Pattern

DataStore serves as the unified repository for algorithm and category data, encapsulating all data loading, index building, and query logic. If the underlying storage later migrates from YAML files to SQLite, PostgreSQL, or a graph database, business-layer code should require no modifications, provided it depends only on the DataStore interface.
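
A minimal sketch of the idea follows. The class and method names (`AlgorithmRepository`, `get`, `by_category`) and the in-memory backend are assumptions for illustration; the actual DataStore interface may differ. Business code depends only on the abstract interface, so a SQLite-backed subclass could replace `InMemoryRepository` without touching callers.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Algorithm = Dict[str, Any]


class AlgorithmRepository(ABC):
    """Abstract repository: callers depend on this, never on a storage backend."""

    @abstractmethod
    def get(self, algorithm_id: str) -> Optional[Algorithm]: ...

    @abstractmethod
    def by_category(self, category: str) -> List[Algorithm]: ...


class InMemoryRepository(AlgorithmRepository):
    """Backend over already-loaded records (e.g., parsed from YAML files)."""

    def __init__(self, records: List[Algorithm]) -> None:
        # Build indexes once at load time so queries are dictionary lookups.
        self._by_id = {r["id"]: r for r in records}
        self._by_category: Dict[str, List[Algorithm]] = {}
        for r in records:
            self._by_category.setdefault(r["category"], []).append(r)

    def get(self, algorithm_id: str) -> Optional[Algorithm]:
        return self._by_id.get(algorithm_id)

    def by_category(self, category: str) -> List[Algorithm]:
        # Return a copy so callers cannot mutate the internal index.
        return list(self._by_category.get(category, []))


def count_in_category(repo: AlgorithmRepository, category: str) -> int:
    # Business-layer function: works with any AlgorithmRepository implementation.
    return len(repo.by_category(category))
```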

Template Method Pattern

The Chinese and English generators in generate_docs.py share the same traversal skeleton (traverse all algorithms to generate detail pages, then traverse all categories to generate category pages) but defer language-specific content to subclass or function implementations. This significantly reduces the marginal cost of adding a new language (e.g., Japanese or German).
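
The shared skeleton can be sketched with an abstract base class whose `generate` method fixes the traversal order while subclasses supply the language-specific text. The class names, hook names, and output paths below are hypothetical, and since the project may use functions rather than subclasses for the hooks, this shows only the subclass variant.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Record = Dict[str, Any]


class DocGenerator(ABC):
    """Template method: generate() fixes the traversal; hooks fill in language-specific text."""

    def generate(self, algorithms: List[Record], categories: List[Record]) -> Dict[str, str]:
        pages: Dict[str, str] = {}
        for algo in algorithms:  # step 1: one detail page per algorithm
            pages[f"algorithms/{algo['id']}.md"] = self.algorithm_page(algo)
        for cat in categories:  # step 2: one page per category
            pages[f"categories/{cat['id']}.md"] = self.category_page(cat)
        return pages

    @abstractmethod
    def algorithm_page(self, algo: Record) -> str: ...

    @abstractmethod
    def category_page(self, cat: Record) -> str: ...


class EnglishGenerator(DocGenerator):
    def algorithm_page(self, algo: Record) -> str:
        return f"# {algo['name']}\n\nTime complexity: {algo['time_complexity']}\n"

    def category_page(self, cat: Record) -> str:
        return f"# {cat['name']}\n"


class ChineseGenerator(DocGenerator):
    def algorithm_page(self, algo: Record) -> str:
        return f"# {algo['name']}\n\n时间复杂度：{algo['time_complexity']}\n"

    def category_page(self, cat: Record) -> str:
        return f"# {cat['name']}\n"
```

Under this structure, a Japanese version would be one more subclass implementing two methods; the traversal, and any bug fix to it, is inherited by every language.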

Pipeline Pattern

The data flow (load → validate → generate → build → deploy) runs as a sequential pipeline: each stage's output is the next stage's input, and a failure at any stage stops the run immediately (fail-fast) instead of passing bad data downstream. This maps naturally onto the stage-by-stage structure of CI/CD workflows.
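
A minimal fail-fast pipeline can be sketched as a list of named stage functions, where each return value feeds the next stage and the first exception aborts the run with the failing stage's name attached. The stage functions and the `PipelineError` type below are hypothetical stand-ins for load, validate, and generate; build and deploy would be appended the same way.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


class PipelineError(RuntimeError):
    """Raised when a stage fails; carries the stage name for diagnostics."""

    def __init__(self, stage: str, cause: Exception) -> None:
        super().__init__(f"stage '{stage}' failed: {cause}")
        self.stage = stage


def run_pipeline(stages: List[Stage], initial: Any = None) -> Any:
    data = initial
    for name, func in stages:
        try:
            data = func(data)  # output of this stage becomes input of the next
        except Exception as exc:
            # Fail fast: later stages never run once one stage raises.
            raise PipelineError(name, exc) from exc
    return data


def load(_: Any) -> list:
    return [{"id": "sw", "name": "Smith-Waterman"}, {"id": "nw", "name": "Needleman-Wunsch"}]


def validate(entries: list) -> list:
    bad = [e.get("id") for e in entries if not e.get("name")]
    if bad:
        raise ValueError(f"entries missing name: {bad}")
    return entries


def generate(entries: list) -> dict:
    return {f"{e['id']}.md": f"# {e['name']}\n" for e in entries}
```

`run_pipeline([("load", load), ("validate", validate), ("generate", generate)])` returns two generated pages; if validation fails, `generate` never runs, mirroring how a failed CI job stops the deploy.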

Released under the MIT License.