Optimization strategies and profiling for N-Body Simulation.
Benchmarks (RTX 3080)
| Particles | Direct N² | Barnes-Hut | Spatial Hash |
|-----------|-----------|------------|--------------|
| 10K | 60+ FPS | 120+ FPS | 120+ FPS |
| 100K | ~10 FPS | 60+ FPS | 90+ FPS |
| 1M | <1 FPS | 25+ FPS | 60+ FPS |
| 5M | N/A | ~5 FPS | 15+ FPS |

| Configuration | Memory Usage |
|---------------|--------------|
| Base (position, velocity, mass) | ~52 bytes/particle |
| Barnes-Hut tree | +~32 bytes/particle |
| Spatial hash grid | +~16 bytes/particle |
| 1M particles total | ~52-84 MB |
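The per-particle figures above can be turned into a quick sizing estimate. The following is only a sketch: `Method` and `estimate_bytes` are hypothetical names, not part of the simulator's API, and the constants are just the approximate values from the memory table.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the simulator: rough memory estimate
// built from the approximate per-particle figures in the memory table.
enum class Method { Direct, BarnesHut, SpatialHash };

std::uint64_t estimate_bytes(std::uint64_t n, Method m) {
    const std::uint64_t base = 52;  // position, velocity, mass
    const std::uint64_t tree = 32;  // extra for the Barnes-Hut tree
    const std::uint64_t grid = 16;  // extra for the spatial hash grid
    std::uint64_t per_particle = base;
    if (m == Method::BarnesHut)   per_particle += tree;
    if (m == Method::SpatialHash) per_particle += grid;
    return n * per_particle;
}
```

For 1M particles this gives about 52 MB for the base layout and about 84 MB with the Barnes-Hut tree, matching the range in the table.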
Optimization Strategies
1. Algorithm Selection
Choose the right algorithm for your particle count:
```cpp
if (n < 10000) {
    method = ForceMethod::DIRECT;
} else if (long_range_forces) {
    method = ForceMethod::BARNES_HUT;
} else {
    method = ForceMethod::SPATIAL_HASH;
}
```
2. Time Step Tuning
- Smaller dt = more accurate, slower
- Larger dt = faster, potential instability
- Typical range: 0.0001 to 0.01
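To see the accuracy trade-off in isolation, the sketch below integrates a single unit-frequency oscillator as a stand-in for the N-body system and reports the worst relative energy error. It assumes a semi-implicit (symplectic) Euler update, which may not be the integrator the simulator uses, and `max_energy_error` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cmath>

// Illustrative only: a unit-mass, unit-frequency oscillator stands in for
// the N-body system, integrated with semi-implicit (symplectic) Euler.
// Returns the largest relative energy error seen over `steps` steps.
double max_energy_error(double dt, int steps) {
    double x = 1.0, v = 0.0;
    const double e0 = 0.5 * (x * x + v * v);  // initial energy
    double worst = 0.0;
    for (int i = 0; i < steps; ++i) {
        v -= x * dt;  // kick: acceleration is -x
        x += v * dt;  // drift with the updated velocity
        const double e = 0.5 * (x * x + v * v);
        worst = std::max(worst, std::fabs(e - e0) / e0);
    }
    return worst;
}
```

Over the same simulated time, `max_energy_error(0.0001, 100000)` is smaller than `max_energy_error(0.01, 1000)` but takes 100 times as many steps. For this toy system the scheme becomes unstable once dt exceeds 2, so `max_energy_error(2.5, 4)` blows up after only a few steps.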
3. Softening Parameter
Softening keeps the force finite when two particles come very close together:
```cpp
// Too small: numerical instability
// Too large: distorts physics at small scales
config.softening = 0.01f; // Good default
```
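As an illustration of what the parameter does, the sketch below uses Plummer softening, a common choice in which the acceleration magnitude is G·m·r / (r² + ε²)^(3/2). This form is an assumption: the source does not say which softening form the simulator applies, and `softened_accel` is a hypothetical helper.

```cpp
#include <cmath>

// Magnitude of the Plummer-softened acceleration from a point mass.
// Assumption: the simulator may use a different softening form.
double softened_accel(double G, double m, double r, double eps) {
    const double d2 = r * r + eps * eps;  // softened squared distance
    return G * m * r / (d2 * std::sqrt(d2));
}
```

With `eps > 0` the result stays finite even at `r = 0`. For `r` much larger than `eps` it approaches the Newtonian value G·m/r², which is why an overly large `eps` distorts close encounters while leaving distant interactions almost unchanged.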
4. Barnes-Hut Theta
```cpp
// Lower θ = more accurate, slower
config.theta = 0.3f; // High accuracy
config.theta = 0.5f; // Balanced
config.theta = 0.7f; // Fast, lower accuracy
```
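The effect of θ is easiest to see in the standard Barnes-Hut opening test, sketched below. The simulator may use a variant of this criterion, and `use_node_approximation` is a hypothetical name.

```cpp
// Standard Barnes-Hut opening test: a tree node of width `size` at
// distance `dist` from the particle is approximated by its center of
// mass when size / dist < theta. Otherwise its children are visited.
bool use_node_approximation(double size, double dist, double theta) {
    return size < theta * dist;  // same test, written without a division
}
```

A node of width 1 at distance 4 has size/dist = 0.25, so it is approximated at θ = 0.5 but opened at θ = 0.2. Lower θ therefore means more nodes are opened: more work, less approximation error.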
5. GPU Architecture
Compile for your GPU’s compute capability:
```bash
# RTX 30xx (Ampere)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=86
# RTX 20xx (Turing)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=75
# GTX 10xx (Pascal)
cmake .. -DCMAKE_CUDA_ARCHITECTURES=61
```
Profiling
Nsight Compute
```bash
ncu --set full ./nbody_sim 100000
```
Nsight Systems
```bash
nsys profile -o report ./nbody_sim 100000
# Open the report in the Nsight Systems GUI
nsys-ui report.nsys-rep
```
Built-in Timing
Enable profiling in the config to print frame times:
```cpp
config.enable_profiling = true;
```
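If you want to time a specific section yourself, a host-side timer along these lines is one option. This is a generic sketch using `std::chrono`, not the simulator's built-in profiler, and `time_ms` is a hypothetical helper.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: wall-clock time of one call, in milliseconds.
// For GPU work, the callable must synchronize (e.g. cudaDeviceSynchronize)
// before returning, or only the kernel launch overhead is measured.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look like `double ms = time_ms([&] { /* one simulation step */ });`. `steady_clock` is used because it never jumps backwards, unlike the system clock.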
Best Practices
- Always build in Release mode: `cmake -DCMAKE_BUILD_TYPE=Release`
- Use appropriate algorithm: See selection guide above
- Monitor VRAM: Use `nvidia-smi` to check memory usage
- Batch multiple simulations: Amortize initialization cost
- Profile before optimizing: Don’t guess, measure