Performance Guide

Optimization strategies and profiling for N-Body Simulation.

Performance Metrics

Benchmarks (RTX 3080)

Particles	Direct N²	Barnes-Hut	Spatial Hash
10K	60+ FPS	120+ FPS	120+ FPS
100K	~10 FPS	60+ FPS	90+ FPS
1M	<1 FPS	25+ FPS	60+ FPS
5M	N/A	~5 FPS	15+ FPS

Memory Footprint

Configuration	Memory Usage
Base (position, velocity, mass)	~52 bytes/particle
Barnes-Hut tree	+~32 bytes/particle
Spatial hash grid	+~16 bytes/particle
1M particles total	~52 MB (base only) to ~84 MB (with Barnes-Hut tree)
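
The per-particle figures in the table can be turned into a quick capacity estimate. The sketch below is only arithmetic on those approximate numbers. The `Accel` enum and `estimate_bytes` function are hypothetical names for illustration, not part of the simulator's API.

```cpp
#include <cstddef>

// Approximate per-particle costs, copied from the table above.
const std::size_t kBaseBytes        = 52;  // position, velocity, mass
const std::size_t kBarnesHutBytes   = 32;  // extra for the Barnes-Hut tree
const std::size_t kSpatialHashBytes = 16;  // extra for the spatial hash grid

// Hypothetical enum for this sketch only.
enum class Accel { None, BarnesHut, SpatialHash };

// Estimated memory in bytes for n particles with the given acceleration structure.
std::size_t estimate_bytes(std::size_t n, Accel accel) {
    std::size_t per_particle = kBaseBytes;
    if (accel == Accel::BarnesHut)   per_particle += kBarnesHutBytes;
    if (accel == Accel::SpatialHash) per_particle += kSpatialHashBytes;
    return n * per_particle;
}
```

For 1M particles this gives about 52 MB with no acceleration structure, 68 MB with the spatial hash grid, and 84 MB with the Barnes-Hut tree. Actual VRAM use will be higher once rendering buffers and driver overhead are included.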

Optimization Strategies

1. Algorithm Selection

Choose the algorithm based on particle count and on whether long-range forces matter:

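// n: particle count
// long_range_forces: true when distant particles still contribute (e.g. gravity)
if (n < 10000) {
    method = ForceMethod::DIRECT;        // exact; affordable at small n
} else if (long_range_forces) {
    method = ForceMethod::BARNES_HUT;    // approximates distant groups
} else {
    method = ForceMethod::SPATIAL_HASH;  // only nearby cells are visited
}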
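
The 10,000-particle threshold follows from how the work grows with particle count. As a rough illustration, the sketch below counts force evaluations per step for the direct method and gives an order-of-magnitude estimate for a tree code. The N log2 N figure is a textbook scaling, not a measurement of this simulator, and both function names are made up for the example.

```cpp
#include <cmath>
#include <cstdint>

// Direct method: every particle interacts with every other one, so N * (N - 1).
std::uint64_t direct_interactions(std::uint64_t n) {
    return n * (n - 1);
}

// Rough tree-code estimate: N * log2(N). The real constant depends on
// theta and on how the particles are distributed.
double tree_interactions(std::uint64_t n) {
    if (n < 2) return 0.0;
    const double nd = static_cast<double>(n);
    return nd * std::log2(nd);
}
```

At 1M particles the direct method needs about 10^12 evaluations per step, while the tree estimate is about 2 x 10^7. That gap is consistent with the benchmark table, where Direct N² falls below 1 FPS at 1M particles.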

2. Time Step Tuning

Smaller dt: more accurate, but more steps are needed to cover the same simulated time
Larger dt: faster, but the integration can become unstable
Typical range: 0.0001 to 0.01
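
To see the trade-off in isolation, the sketch below integrates a unit harmonic oscillator with velocity Verlet and reports the worst relative energy error. Both choices are assumptions made for illustration: the oscillator stands in for a bound orbit, and this guide does not say which integrator the simulator uses.

```cpp
#include <algorithm>
#include <cmath>

// Integrates a unit-mass, unit-frequency oscillator (acceleration = -x)
// with velocity Verlet and returns the largest relative energy error seen.
double max_energy_error(double dt, double t_end) {
    double x = 1.0, v = 0.0;
    const double e0 = 0.5 * (v * v + x * x);
    double worst = 0.0;
    const int steps = static_cast<int>(t_end / dt);
    for (int i = 0; i < steps; ++i) {
        v += 0.5 * dt * (-x);  // half kick
        x += dt * v;           // drift
        v += 0.5 * dt * (-x);  // half kick with the updated position
        const double e = 0.5 * (v * v + x * x);
        worst = std::max(worst, std::fabs(e - e0) / e0);
    }
    return worst;
}
```

For this scheme the energy error shrinks roughly with dt squared, so going from dt = 0.1 to dt = 0.01 cuts it by about 100x at 10x the step count. Once dt exceeds 2 in these units the integration diverges, which is the instability the list above warns about.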

3. Softening Parameter

Softening keeps the force finite when two particles come very close together:

// Too small: numerical instability
// Too large: distorts physics at small scales
config.softening = 0.01f;  // Good default
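
One common way to apply a softening length is Plummer softening, which adds the square of the softening length to the squared distance before taking the inverse cube. The sketch below assumes that form. The simulator's actual kernel may differ, and `Vec3`, `softened_accel`, and the default `G` are illustrative names and values, not its API.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Acceleration on a particle at pi due to mass mj at pj, with Plummer
// softening: a = G * mj * r / (|r|^2 + eps^2)^(3/2).
Vec3 softened_accel(Vec3 pi, Vec3 pj, float mj, float eps, float G = 1.0f) {
    const float dx = pj.x - pi.x;
    const float dy = pj.y - pi.y;
    const float dz = pj.z - pi.z;
    const float r2 = dx * dx + dy * dy + dz * dz + eps * eps;  // never zero if eps > 0
    const float inv_r = 1.0f / std::sqrt(r2);
    const float s = G * mj * inv_r * inv_r * inv_r;
    return Vec3{s * dx, s * dy, s * dz};
}
```

With eps > 0 the result stays finite even for coincident particles, while at separations much larger than eps it is nearly identical to the unsoftened inverse-square law.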

4. Barnes-Hut Theta

// Lower θ = more accurate, slower. Choose one of the following:
config.theta = 0.3f;  // High accuracy
config.theta = 0.5f;  // Balanced
config.theta = 0.7f;  // Fast, lower accuracy
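
The effect of θ is easiest to see in the standard Barnes-Hut opening test: a tree node of width s at distance d is treated as a single point mass when s / d < θ, otherwise its children are visited. The sketch below shows that textbook criterion. The function name is hypothetical and the simulator may use a variant of the test.

```cpp
// Returns true when a node can be approximated by its center of mass.
// Written as s < theta * d rather than s / d < theta so that d == 0
// (the particle sits on the node's center of mass) does not divide by zero.
bool can_approximate(float node_width, float distance, float theta) {
    return node_width < theta * distance;
}
```

A node of width 1 at distance 1.6 is opened at θ = 0.5 but approximated at θ = 0.7, so a larger θ accepts more nodes early and does less work per particle at the cost of accuracy.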

5. GPU Architecture

Compile with your GPU’s compute capability:

# RTX 30xx (Ampere)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=86

# RTX 20xx (Turing)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=75

# GTX 10xx (Pascal)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=61

Profiling

Nsight Compute

ncu --set full ./nbody_sim 100000

Nsight Systems

nsys profile -o report ./nbody_sim 100000
nsys-ui report.nsys-rep

Built-in Timing

Enable profiling in the config to print frame times:

config.enable_profiling = true;
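
If you want comparable numbers from your own driver code, a host-side timer is enough for a first pass. The sketch below is a stand-in built on `std::chrono`. It is not how `enable_profiling` is implemented, and `FrameTimer` is a hypothetical name.

```cpp
#include <chrono>

// Minimal host-side frame timer that accumulates a running average.
class FrameTimer {
    using Clock = std::chrono::steady_clock;

public:
    void begin() { start_ = Clock::now(); }

    // Ends the current frame and returns its duration in milliseconds.
    double end() {
        const std::chrono::duration<double, std::milli> ms = Clock::now() - start_;
        total_ms_ += ms.count();
        ++frames_;
        return ms.count();
    }

    double average_ms() const { return frames_ ? total_ms_ / frames_ : 0.0; }
    int frames() const { return frames_; }

private:
    Clock::time_point start_{};
    double total_ms_ = 0.0;
    int frames_ = 0;
};
```

Call `begin()` before a simulation step and `end()` after it. Because CUDA kernel launches are asynchronous, synchronize the device (or use CUDA events) before calling `end()`, otherwise the timer measures only the launch overhead.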

Best Practices

Always build in Release mode: cmake .. -DCMAKE_BUILD_TYPE=Release
Use the appropriate algorithm: see Algorithm Selection above
Monitor VRAM: Use nvidia-smi to check memory usage
Batch multiple simulations: Amortize initialization cost
Profile before optimizing: Don’t guess, measure