Performance

BitCal treats performance as an evidence discipline, not as ambient marketing. This section now renders from the retained benchmark artifacts committed under benchmarks/results/retained/, so the public tables and the repository evidence stay coupled.

Baseline snapshot

Current evidence target: avx2 on x86-64
Status of numbers: Retained checkpoint
Published statistic: median ns/op
Committed source: 007c74071b25
ARM status: ARM rows stay blank
What is not promised: Universal wins

The retained baseline now follows the shipped vNext public surface: bit_and, bit_or, bit_xor, bit_andnot, popcount, equals, is_zero, shift_left, and shift_right.
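For orientation, the comparator side of these nine operations can be written against std::bitset as below. This is a minimal sketch, not BitCal's API: the wrapper name BitsetOps and the selfcheck helper are hypothetical. The mapping from each BitCal name to a std::bitset expression is inferred from the names alone, including the assumed operand order for bit_andnot (a AND NOT b).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: std::bitset counterparts of the nine retained operations.
// The name-to-expression mapping is an assumption, not taken from BitCal's headers.
template <std::size_t N>
struct BitsetOps {
    using B = std::bitset<N>;

    static B bit_and(const B& a, const B& b)    { return a & b; }
    static B bit_or(const B& a, const B& b)     { return a | b; }
    static B bit_xor(const B& a, const B& b)    { return a ^ b; }
    // Assumed operand order: a AND (NOT b).
    static B bit_andnot(const B& a, const B& b) { return a & ~b; }

    static std::size_t popcount(const B& a)     { return a.count(); }
    static bool equals(const B& a, const B& b)  { return a == b; }
    static bool is_zero(const B& a)             { return a.none(); }

    static B shift_left(const B& a, std::size_t n)  { return a << n; }
    static B shift_right(const B& a, std::size_t n) { return a >> n; }
};

// Small self-check on 128-bit values.
inline void bitset_ops_selfcheck() {
    using Ops = BitsetOps<128>;
    std::bitset<128> a, b;
    a.set(0);
    a.set(100);
    b.set(100);

    assert(Ops::popcount(Ops::bit_and(a, b)) == 1);   // only bit 100 is shared
    assert(Ops::bit_andnot(a, b).test(0));            // bit 0 survives
    assert(!Ops::bit_andnot(a, b).test(100));         // bit 100 is cleared
    assert(Ops::is_zero(Ops::bit_xor(a, a)));         // x ^ x is all zeros
    assert(Ops::shift_left(b, 27).test(127));         // 100 + 27 = 127
    assert(Ops::equals(Ops::shift_right(Ops::shift_left(b, 27), 27), b));
}
```

The widths in the tables (128, 192, 256, 512) correspond to the template parameter N here.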

ARM rows stay blank until the project retains an ARM benchmark path with the same level of reproducibility and committed artifacts.

128-bit retained operations

Retained baseline on avx2 (007c74071b25)

Operation	BitCal (ns/op)	std::bitset (ns/op)	Ratio (std::bitset / BitCal)
bit_and<128>	10.59	0.53	0.05x
bit_or<128>	10.64	0.53	0.05x
bit_xor<128>	10.77	0.55	0.05x
bit_andnot<128>	10.82	0.53	0.05x
popcount<128>	1.08	0.27	0.25x
equals<128>	0.27	0.27	1.00x
is_zero<128>	1.07	1.02	0.95x
shift_left<128>	11.12	0.53	0.05x
shift_right<128>	11.07	0.86	0.08x

192-bit retained operations

Custom-width checkpoint kept in the retained baseline

Operation	BitCal (ns/op)	std::bitset (ns/op)	Ratio (std::bitset / BitCal)
bit_and<192>	10.95	1.98	0.18x
bit_or<192>	11.18	2.03	0.18x
bit_xor<192>	10.03	1.52	0.15x
bit_andnot<192>	10.04	1.51	0.15x
popcount<192>	1.51	1.51	1.00x
equals<192>	0.26	0.26	1.00x
is_zero<192>	0.51	0.51	1.00x
shift_left<192>	10.04	9.53	0.95x
shift_right<192>	10.03	2.01	0.20x

256-bit retained operations

Representative fixed-width checkpoint on the active x86-64 path

Operation	BitCal (ns/op)	std::bitset (ns/op)	Ratio (std::bitset / BitCal)
bit_and<256>	1.01	1.01	1.00x
bit_or<256>	1.01	1.01	1.00x
bit_xor<256>	1.01	1.01	1.00x
bit_andnot<256>	1.01	1.01	1.00x
popcount<256>	2.01	2.01	1.00x
equals<256>	0.26	0.26	1.00x
is_zero<256>	0.95	0.96	1.00x
shift_left<256>	10.03	10.03	1.00x
shift_right<256>	10.03	10.03	1.00x

512-bit retained operations

Larger-width public algorithms on the retained path

Operation	BitCal (ns/op)	std::bitset (ns/op)	Ratio (std::bitset / BitCal)
bit_and<512>	2.01	2.01	1.00x
bit_or<512>	2.01	2.01	1.00x
bit_xor<512>	2.01	2.01	1.00x
bit_andnot<512>	2.01	2.01	1.00x
popcount<512>	4.02	4.02	1.00x
equals<512>	0.26	0.26	1.00x
is_zero<512>	0.93	0.93	1.00x
shift_left<512>	20.06	20.06	1.00x
shift_right<512>	11.05	10.04	0.91x

What the current retained baseline actually says:

wins are narrower than the old hand-written table implied;
128-bit and 192-bit operations are still broadly behind std::bitset, especially across the transform family;
256-bit and 512-bit widths sit at parity on nearly every published row, with no clear wins and one visible loss (shift_right<512> at 0.91x);
equals is effectively at parity across the retained widths, which is useful because it shows the evidence chain is reporting the public surface rather than cherry-picking only dramatic outcomes.
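The Ratio column can be read as the std::bitset median divided by the BitCal median. That direction is inferred from the published rows (for bit_and<128>, 0.53 / 10.59 ≈ 0.05), so values below 1.00x mean BitCal was slower. The sketch below reproduces that arithmetic. The function names and the 5% parity band are assumptions made for illustration, not project policy.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Ratio as it appears in the tables: std::bitset median / BitCal median.
// Above 1.00 means BitCal was faster; below 1.00 means it was slower.
inline double speed_ratio(double bitcal_ns, double bitset_ns) {
    assert(bitcal_ns > 0.0 && bitset_ns > 0.0);
    return bitset_ns / bitcal_ns;
}

// Hypothetical classification. The default 5% parity band is an assumption
// chosen for this example; the project does not publish a threshold.
inline std::string classify(double ratio, double band = 0.05) {
    if (ratio > 1.0 + band) return "win";
    if (ratio < 1.0 - band) return "loss";
    return "parity";
}
```

Under this reading, bit_and<128> (10.59 ns against 0.53 ns) classifies as a loss, equals<128> (0.27 ns against 0.27 ns) as parity, and shift_right<512> (11.05 ns against 10.04 ns) as a loss.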

Measurement methodology

Performance claims remain attached to a reproducible command path and interpretation rules.

Reproduction commands

```bash
cmake -S . -B build-test -DCMAKE_BUILD_TYPE=Release -DBITCAL_BUILD_BENCHMARKS=ON -DBITCAL_NATIVE_ARCH=ON
cmake --build build-test --config Release --target benchmark_compare -j"$(nproc)"
./build-test/benchmarks/benchmark_compare --json-out benchmarks/results/retained/baseline-x86_64-avx2.json
node benchmarks/scripts/generate-performance-summary.mjs \
  benchmarks/results/retained/baseline-x86_64-avx2.json \
  benchmarks/results/retained/baseline-x86_64-avx2.summary.json
```

Benchmark binary split

The performance evidence on this page comes from benchmark_compare, not from the smaller bitcal_benchmark smoke executable used in the guide.

Binary	Role in the docs set	Why the split exists
`benchmark_compare`	Publishes the retained vNext baseline for the shipped public algorithms and writes the raw JSON artifact.	The baseline needs a reproducible comparison harness, a structured report, and an explicit claim boundary.
`bitcal_benchmark`	Stays in Verification Path as the smoke-level executable baseline.	Verification needs a lighter executable check that is distinct from the published comparison experiment.

Method rules

Rule	Why it exists
Treat current numbers as a retained baseline checkpoint	Prevent local benchmark output from becoming timeless marketing copy.
Publish table rows from per-scenario medians	Median rows are less sensitive to occasional scheduling noise than a single best case or ad-hoc average.
Always report the active backend, CPU, and commit context	A speedup without ISA, machine, and revision context is meaningless.
Keep benchmark stories tied to public algorithm shapes	Numbers should map back to documented public algorithms, not unnamed kernel trivia.
Keep retained evidence tied to the shipped public surface	The published comparison path should measure what the library actually ships today, not deleted compatibility layers.
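To make the median rule concrete, here is a minimal sketch of collecting per-run ns/op samples and publishing their median. It is not the benchmark_compare implementation: the function names, the run and iteration counts, and the volatile sink used to keep the work alive are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a sample set; for an even count, the mean of the two middle values.
inline double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Collects `runs` samples of ns/op, each averaged over `iters` calls to `op`.
template <typename Op>
std::vector<double> sample_ns_per_op(Op op, std::size_t runs, std::size_t iters) {
    using clock = std::chrono::steady_clock;
    std::vector<double> out;
    out.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        for (std::size_t i = 0; i < iters; ++i) op();
        const auto t1 = clock::now();
        const double ns =
            std::chrono::duration<double, std::nano>(t1 - t0).count();
        out.push_back(ns / static_cast<double>(iters));
    }
    return out;
}

// Example scenario (hypothetical): median ns/op for a 256-bit std::bitset AND.
inline double median_bitset_and_ns(std::size_t runs = 11,
                                   std::size_t iters = 100000) {
    std::bitset<256> a, b;
    a.set(3);
    b.set(3);
    b.set(200);
    volatile std::size_t sink = 0;  // discourages the optimizer from dropping the loop
    const auto samples = sample_ns_per_op(
        [&] { sink = sink + (a & b).count(); }, runs, iters);
    return median(samples);
}
```

A single slow run, for example one interrupted by the scheduler, shifts a mean but leaves the median unchanged as long as most runs are clean. That is the property the rule relies on.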

Interpretation guardrails

Synthetic loops are useful, but they do not represent every workload.
An x86-64-first posture is a support choice, not proof that all other targets are equally mature.
A benchmark win in one algorithm family does not automatically justify a broader API or platform claim.
A benchmark loss is still valuable evidence; the point of the retained baseline is honesty, not theater.

Claim boundary

This section intentionally refuses to claim more than the evidence supports.

Safe claims today

BitCal retains a reproducible x86-64 benchmark path with committed raw and summary artifacts.
The retained baseline is grounded in named public algorithms, named widths, and explicit backend/commit context.
BitCal is not uniformly ahead of std::bitset yet, and the retained evidence makes that visible.

Claims that remain out of bounds

universal superiority over standard-library or specialized bitmap implementations;
equal maturity across ARM64, macOS, and future x86 backends;
workload-level promises that are not backed by retained traces or scenario-specific measurements.

Where performance work goes next

The next useful expansions are methodological, not theatrical:

aligned versus unaligned comparisons;
owner versus borrowed-view workload differences;
optional external comparators only when they stay clearly outside the retained headline path;
workload traces that complement the synthetic retained baseline.

For design context, return to the Whitepaper. For contract language, continue into the Reference. For external comparison material, use Research.
