Performance
BitCal treats performance as an evidence discipline, not as ambient marketing. This section now renders from the retained benchmark artifacts committed under benchmarks/results/retained/, so the public tables and the repository evidence stay coupled.
Baseline snapshot
| Aspect | Current state |
|---|---|
| Current evidence target | avx2 on x86-64 |
| Status of numbers | Retained checkpoint |
| Published statistic | median ns/op |
| Committed source | 007c74071b25 |
| ARM status | ARM rows stay blank |
| What is not promised | Universal wins |
Each row comes from the committed summary artifact, not a hand-picked run.
No retained ARM benchmark path exists yet.
Committed local numbers are checkpoints, not blanket product guarantees.
The retained baseline now follows the shipped vNext public surface: bit_and, bit_or, bit_xor, bit_andnot, popcount, equals, is_zero, shift_left, and shift_right.
ARM rows stay blank until the project retains an ARM benchmark path with the same level of reproducibility and committed artifacts.
| Operation | BitCal median (ns/op) | std::bitset median (ns/op) | Ratio (std::bitset ÷ BitCal) |
|---|---|---|---|
| bit_and<128> | 10.59 | 0.53 | 0.05x |
| bit_or<128> | 10.64 | 0.53 | 0.05x |
| bit_xor<128> | 10.77 | 0.55 | 0.05x |
| bit_andnot<128> | 10.82 | 0.53 | 0.05x |
| popcount<128> | 1.08 | 0.27 | 0.25x |
| equals<128> | 0.27 | 0.27 | 1.00x |
| is_zero<128> | 1.07 | 1.02 | 0.95x |
| shift_left<128> | 11.12 | 0.53 | 0.05x |
| shift_right<128> | 11.07 | 0.86 | 0.08x |
| Operation | BitCal median (ns/op) | std::bitset median (ns/op) | Ratio (std::bitset ÷ BitCal) |
|---|---|---|---|
| bit_and<192> | 10.95 | 1.98 | 0.18x |
| bit_or<192> | 11.18 | 2.03 | 0.18x |
| bit_xor<192> | 10.03 | 1.52 | 0.15x |
| bit_andnot<192> | 10.04 | 1.51 | 0.15x |
| popcount<192> | 1.51 | 1.51 | 1.00x |
| equals<192> | 0.26 | 0.26 | 1.00x |
| is_zero<192> | 0.51 | 0.51 | 1.00x |
| shift_left<192> | 10.04 | 9.53 | 0.95x |
| shift_right<192> | 10.03 | 2.01 | 0.20x |
| Operation | BitCal median (ns/op) | std::bitset median (ns/op) | Ratio (std::bitset ÷ BitCal) |
|---|---|---|---|
| bit_and<256> | 1.01 | 1.01 | 1.00x |
| bit_or<256> | 1.01 | 1.01 | 1.00x |
| bit_xor<256> | 1.01 | 1.01 | 1.00x |
| bit_andnot<256> | 1.01 | 1.01 | 1.00x |
| popcount<256> | 2.01 | 2.01 | 1.00x |
| equals<256> | 0.26 | 0.26 | 1.00x |
| is_zero<256> | 0.95 | 0.96 | 1.00x |
| shift_left<256> | 10.03 | 10.03 | 1.00x |
| shift_right<256> | 10.03 | 10.03 | 1.00x |
| Operation | BitCal median (ns/op) | std::bitset median (ns/op) | Ratio (std::bitset ÷ BitCal) |
|---|---|---|---|
| bit_and<512> | 2.01 | 2.01 | 1.00x |
| bit_or<512> | 2.01 | 2.01 | 1.00x |
| bit_xor<512> | 2.01 | 2.01 | 1.00x |
| bit_andnot<512> | 2.01 | 2.01 | 1.00x |
| popcount<512> | 4.02 | 4.02 | 1.00x |
| equals<512> | 0.26 | 0.26 | 1.00x |
| is_zero<512> | 0.93 | 0.93 | 1.00x |
| shift_left<512> | 20.06 | 20.06 | 1.00x |
| shift_right<512> | 11.05 | 10.04 | 0.91x |
What the current retained baseline actually says:
- wins are narrower than the old hand-written table implied: no retained row shows BitCal ahead of std::bitset, and the best published result is parity (1.00x);
- 128-bit and 192-bit operations are still broadly behind std::bitset, especially across the transform family;
- 256-bit and 512-bit widths sit at or near parity: every 256-bit row reports 1.00x, and at 512 bits only shift_right<512> (0.91x) trails;
- equals is effectively at parity across the retained widths, which is useful because it shows the evidence chain is reporting the public surface rather than cherry-picking only dramatic outcomes.
Measurement methodology
Performance claims remain attached to a reproducible command path and interpretation rules.
Reproduction commands
cmake -S . -B build-test -DCMAKE_BUILD_TYPE=Release -DBITCAL_BUILD_BENCHMARKS=ON -DBITCAL_NATIVE_ARCH=ON
cmake --build build-test --config Release --target benchmark_compare -j"$(nproc)"
./build-test/benchmarks/benchmark_compare --json-out benchmarks/results/retained/baseline-x86_64-avx2.json
node benchmarks/scripts/generate-performance-summary.mjs \
benchmarks/results/retained/baseline-x86_64-avx2.json \
benchmarks/results/retained/baseline-x86_64-avx2.summary.json
Benchmark binary split
The performance evidence on this page comes from benchmark_compare, not from the smaller bitcal_benchmark smoke executable used in the guide.
| Binary | Role in the docs set | Why the split exists |
|---|---|---|
| benchmark_compare | Publishes the retained vNext baseline for the shipped public algorithms and writes the raw JSON artifact. | The baseline needs a reproducible comparison harness, a structured report, and an explicit claim boundary. |
| bitcal_benchmark | Stays in Verification Path as the smoke-level executable baseline. | Verification needs a lighter executable check that is distinct from the published comparison experiment. |
Method rules
| Rule | Why it exists |
|---|---|
| Treat current numbers as a retained baseline checkpoint | Prevent local benchmark output from becoming timeless marketing copy. |
| Publish table rows from per-scenario medians | Median rows are less sensitive to occasional scheduling noise than a single best case or ad-hoc average. |
| Always report the active backend, CPU, and commit context | A speedup without ISA, machine, and revision context is meaningless. |
| Keep benchmark stories tied to public algorithm shapes | Numbers should map back to documented public algorithms, not unnamed kernel trivia. |
| Keep retained evidence tied to the shipped public surface | The published comparison path should measure what the library actually ships today, not deleted compatibility layers. |
Interpretation guardrails
- Synthetic loops are useful, but they do not represent every workload.
- An x86-64-first posture is a support choice, not proof that all other targets are equally mature.
- A benchmark win in one algorithm family does not automatically justify a broader API or platform claim.
- A benchmark loss is still valuable evidence; the point of the retained baseline is honesty, not theater.
Claim boundary
This section intentionally refuses to claim more than the evidence supports.
Safe claims today
- BitCal retains a reproducible x86-64 benchmark path with committed raw and summary artifacts.
- The retained baseline is grounded in named public algorithms, named widths, and explicit backend/commit context.
- BitCal is not uniformly ahead of std::bitset yet, and the retained evidence makes that visible.
Claims that remain out of bounds
- universal superiority over standard-library or specialized bitmap implementations;
- equal maturity across ARM64, macOS, and future x86 backends;
- workload-level promises that are not backed by retained traces or scenario-specific measurements.
Where performance work goes next
The next useful expansions are methodological, not theatrical:
- aligned versus unaligned comparisons;
- owner versus borrowed-view workload differences;
- optional external comparators only when they stay clearly outside the retained headline path;
- workload traces that complement the synthetic retained baseline.
For design context, return to the Whitepaper. For contract language, continue into the Reference. For external comparison material, use Research.