Performance Optimization Decision Tree

This guide helps you navigate the optimization process with a structured decision framework.

Overview

Performance optimization is not random—it follows a systematic approach. This decision tree helps you identify the right optimization technique for your specific situation.

The Golden Rule

Always profile before optimizing!
Premature optimization is the root of all evil (or at least most of it) in programming. — Donald Knuth

Decision Flowchart

Optimization Decision FlowMermaid

Quick Diagnosis Checklist

CPU-Bound Symptoms

Indicator	Command	Threshold
High CPU utilization	`perf stat`	> 80% CPU busy
Low cache miss rate	`perf stat -e cache-misses`	< 1% LLC miss
Hot loops in profile	`perf report`	Single loop dominates

Go to: SIMD Optimization

Memory-Bound Symptoms

Indicator	Command	Threshold
High cache misses	`perf stat -e cache-misses`	> 5% LLC miss
Memory bandwidth saturated	`perf stat -e bus-cycles`	Near peak
Long memory latency	`perf mem`	High latency

Go to: Memory Optimization

Concurrency Issues

Indicator	Command	Detection
Poor thread scaling	`OMP_NUM_THREADS` varying	Speedup < 0.5 times threads
Lock contention	`perf record -e lock:*`	High lock wait time
Data races	`cmake --preset=tsan`	TSAN reports races

Go to: Concurrency Optimization

Optimization Techniques by Category

SIMD Optimization

SIMD Optimization FlowMermaid

Key Techniques:

Auto-vectorization - Let compiler do the work
Alignment - alignas(64) for SIMD-friendly data
Restrict pointers - float* __restrict ptr
Manual intrinsics - SSE/AVX/AVX-512 when compiler fails

Reference: SIMD Module

Memory Optimization

Memory Optimization FlowMermaid

Key Techniques:

AOS to SOA - Structure of Arrays for sequential access
False sharing fix - alignas(64) padding
Prefetching - __builtin_prefetch for predictable patterns
Cache blocking - Process data in cache-sized chunks

Reference: Memory Module

Concurrency Optimization

Concurrency Optimization FlowMermaid

Key Techniques:

Atomic operations - std::atomic with correct memory ordering
Lock-free structures - SPSC queues, lock-free stacks
False sharing prevention - Cache line padding
OpenMP parallelization - #pragma omp parallel for

Reference: Concurrency Module

Decision Quick Reference

Your Situation	First Try	Alternative
Loop runs many iterations	Auto-vectorization	Manual SIMD
Sequential data access, high cache miss	AOS to SOA	Prefetching
Multi-threaded, poor scaling	Check false sharing	Lock-free
Random memory access	Blocking/tiling	B-tree structure
High branch misprediction	Branchless code	Sort data
Small critical section	Lock-free	Fine-grained locks

Profiling Workflow

Profiling WorkflowMermaid

Common Pitfalls

Premature Optimization

cpp

// BAD: Optimizing without profiling
void process() {
    // "This loop looks slow, let me SIMD it"
    for (int i = 0; i < 10; ++i) { ... }  // Only 10 iterations!
}

Fix: Profile first. Small loops may not matter.

Wrong Optimization for the Problem

cpp

// BAD: Using SIMD for memory-bound code
void process(float* data, size_t n) {
    // Memory bandwidth saturated, SIMD won't help!
    for (size_t i = 0; i < n; ++i) {
        data[i] = data[i] * 2.0f;  // Memory-bound, not CPU-bound
    }
}

Fix: Check if CPU or memory bound first.

Ignoring Algorithmic Complexity

cpp

// BAD: Optimizing O(n^2) algorithm
for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {  // O(n^2)
        // SIMD won't save you here
    }
}

Fix: Use better algorithm first (O(n log n) or O(n)).

Next Steps

Profile your code using the Profiling Guide
Identify the bottleneck using the decision tree above
Apply the appropriate technique from the relevant module
Measure the improvement with benchmarks
Document your findings for future reference

Learning Path - Structured curriculum
Profiling Guide - How to profile
Best Practices - Industry patterns
FAQ - Common questions

Performance Optimization Decision Tree ​

Overview ​

The Golden Rule ​

Decision Flowchart ​

Quick Diagnosis Checklist ​

CPU-Bound Symptoms ​

Memory-Bound Symptoms ​

Concurrency Issues ​

Optimization Techniques by Category ​

SIMD Optimization ​

Memory Optimization ​

Concurrency Optimization ​

Decision Quick Reference ​

Profiling Workflow ​

Common Pitfalls ​

Premature Optimization ​

Wrong Optimization for the Problem ​

Ignoring Algorithmic Complexity ​

Next Steps ​

Related Resources ​

Performance Optimization Decision Tree

Overview

The Golden Rule

Decision Flowchart

Quick Diagnosis Checklist

CPU-Bound Symptoms

Memory-Bound Symptoms

Concurrency Issues

Optimization Techniques by Category

SIMD Optimization

Memory Optimization

Concurrency Optimization

Decision Quick Reference

Profiling Workflow

Common Pitfalls

Premature Optimization

Wrong Optimization for the Problem

Ignoring Algorithmic Complexity

Next Steps

Related Resources