Performance Benchmarks
A high-level, cross-indicator view of CPU and CUDA performance.
How to read this
Each indicator produces its own speedup value (a ratio): higher is better, and 1.0× means “about the same”. Use Distribution to see how speedups are spread across indicators, Top to spot the biggest wins, Times to compare raw timings, and Explorer to filter/search and export.
Overall speedup (sum/sum) is the ratio of total baseline time to total comparison time, summed across all included indicators, e.g. Σ Tulip_ms / Σ VectorTA_ms or Σ CPU_ms / Σ CUDA_ms. This answers: “If I ran the whole set, what would I see overall?”
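As a minimal sketch of the sum/sum calculation described above: the function name and the timings are hypothetical, made up for illustration, and are not taken from the benchmark data.

```python
def overall_speedup(baseline_ms, candidate_ms):
    """Sum/sum speedup: total baseline time divided by total candidate time."""
    if len(baseline_ms) != len(candidate_ms):
        raise ValueError("both lists must cover the same indicators")
    total_candidate = sum(candidate_ms)
    if total_candidate <= 0:
        raise ValueError("total candidate time must be positive")
    return sum(baseline_ms) / total_candidate


# Hypothetical timings in milliseconds, one entry per indicator.
tulip_ms = [10.0, 40.0, 250.0]
vectorta_ms = [5.0, 10.0, 200.0]

per_indicator = [t / v for t, v in zip(tulip_ms, vectorta_ms)]
overall = overall_speedup(tulip_ms, vectorta_ms)

print(per_indicator)      # [2.0, 4.0, 1.25]
print(round(overall, 3))  # 1.395
```

In this made-up example the slowest indicator dominates both totals, so the overall figure (about 1.40×) sits well below two of the three per-indicator ratios. That is the expected behaviour of a sum/sum summary: it weights each indicator by how long it takes to run.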
Median and GeoMean are computed over per-indicator speedups. Median is robust to outliers; geometric mean is often a better “typical ratio” summary than arithmetic mean for speedups.
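The difference between these summaries can be sketched with Python's standard statistics module. The speedup values below are hypothetical and include one deliberate outlier; they are not measured results.

```python
import statistics

# Hypothetical per-indicator speedups (ratios); 50.0 is a deliberate outlier.
speedups = [2.0, 4.0, 1.25, 0.8, 50.0]

median = statistics.median(speedups)             # middle value after sorting
geomean = statistics.geometric_mean(speedups)    # n-th root of the product
arith_mean = statistics.fmean(speedups)          # shown only for contrast

print(f"median  = {median:.2f}x")
print(f"geomean = {geomean:.2f}x")
print(f"mean    = {arith_mean:.2f}x")
```

With these numbers the single 50× outlier pulls the arithmetic mean far above the median, while the geometric mean stays much closer to it. That is why the geometric mean tends to be the better "typical ratio" when values are multiplicative.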
Methodology (what’s included)
CPU comparisons use recorded per-indicator timings for “Rust Native” vs “Tulip” at each size. GPU comparisons use available CUDA timings and the corresponding VectorTA CPU baseline.
GPU results are normalized to a consistent workload: 1M candles × 250 parameter sets. When an indicator has a CUDA timing only for 1M×N, where N is some other number of parameter sets, that timing is scaled to approximate 1M×250.
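The page does not state how the scaling is done. One plausible reading, shown here purely as an assumption, is linear scaling in the number of parameter sets. The function name and the timing are hypothetical.

```python
TARGET_PARAM_SETS = 250  # the normalized workload is 1M candles x 250 parameter sets


def scale_to_target(cuda_ms, param_sets, target=TARGET_PARAM_SETS):
    """Estimate the 1M x target timing from a 1M x param_sets timing.

    ASSUMPTION: run time grows linearly with the number of parameter sets.
    The actual rule used for the published figures is not documented here.
    """
    if param_sets <= 0:
        raise ValueError("param_sets must be positive")
    return cuda_ms * target / param_sets


# Hypothetical measurement: 12.0 ms for 1M candles x 100 parameter sets.
estimated_ms = scale_to_target(12.0, 100)
print(estimated_ms)  # 30.0
```

A linear estimate ignores fixed costs such as kernel launch and memory transfer, so a scaled value is an approximation and not a measurement.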
The GPU aggregate view only includes indicators where CUDA is faster (> 1.0×). Indicators that are slower on CUDA are still documented on their indicator pages (with a note to prefer Rust/CPU).
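The inclusion rule can be sketched as a simple filter. The indicator names and speedups below are placeholders, not real results, and the strict comparison follows the "> 1.0×" wording above.

```python
def gpu_aggregate_subset(speedups):
    """Keep only indicators whose CUDA speedup is strictly greater than 1.0x."""
    return {name: s for name, s in speedups.items() if s > 1.0}


# Hypothetical CUDA-vs-CPU speedups keyed by placeholder indicator names.
cuda_speedups = {
    "indicator_a": 12.5,
    "indicator_b": 0.7,   # slower on CUDA: excluded from the aggregate
    "indicator_c": 1.0,   # no gain: excluded by the strict comparison
    "indicator_d": 3.2,
}

included = gpu_aggregate_subset(cuda_speedups)
excluded = sorted(set(cuda_speedups) - set(included))

print(sorted(included))  # ['indicator_a', 'indicator_d']
print(excluded)          # ['indicator_b', 'indicator_c']
```

Because of this filter, the GPU aggregate describes the indicators where CUDA helps and does not cover every indicator that has a CUDA implementation.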