Technical Insights

Architecture decisions and optimization techniques from building production trading systems. Every article includes benchmarks, profiling data, and implementation details from real VectorAlpha projects.

Technical article

Achieving 20x Performance with GPU Accelerated Technical Indicators

How the VectorAlpha technical analysis library uses CUDA to accelerate indicator workloads on modern GPUs. The article walks through tiled ALMA kernels, shared memory layouts, and multi-series paths, then uses real benchmarks to show how a heavy ALMA batch reaches around 20x the throughput of an AVX512 CPU kernel on the same workload.

Performance CUDA
Read the full article
Technical article

SIMD vectorization for technical indicators

How the VectorAlpha technical analysis library uses AVX2 and AVX512 kernels to accelerate heavy indicator workloads across more than 300 functions. Covers kernel selection, windowing patterns, streaming APIs, and batch parameter sweeps built on top of a shared SIMD dot product core.

Rust AVX512
Read the full article
Coming Q1 2026

Architecture of a Backtesting Engine that Handles a Million Events Per Second

Complete architectural breakdown of VectorAlpha's backtesting engine, which achieves a throughput of 1M+ events per second. Explores zero-copy data pipelines, GPU-resident order books, and lock-free ring buffers for inter-thread communication.

Architecture HFT
Coming Q1 2026

Lock Free Data Structures for Real Time Market Data

Design and implementation of wait-free ring buffers and lock-free order books handling 10M+ updates per second. Includes latency profiling, memory ordering considerations, and a comparison with traditional mutex-based approaches.

Low Latency C++

Follow Our Technical Work

Watch our repositories for implementation details, benchmarks, and performance analysis updates.

View Source Code on GitHub