Architecture decisions and optimization techniques from building production trading systems.
Every article includes benchmarks, profiling data, and implementation details from real VectorAlpha projects.
Technical article
Achieving 20x Performance with GPU-Accelerated Technical Indicators
How the VectorAlpha technical analysis library uses CUDA to accelerate indicator workloads on modern GPUs.
Walks through tiled ALMA (Arnaud Legoux Moving Average) kernels, shared-memory layouts, and multi-series paths, and uses real benchmarks to show how a heavy ALMA batch reaches roughly 20x the throughput of an AVX512 CPU kernel on the same workload.
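For readers new to ALMA: each output is a Gaussian-weighted dot product over a sliding window, which is why it maps well onto tiled GPU kernels. The scalar C++ reference below is a minimal sketch that assumes the common formulation (peak position m = offset * (window - 1), width s = window / sigma, defaults 0.85 and 6.0). `alma_weights` and `alma` are hypothetical names for illustration and are not the VectorAlpha API. A GPU version would typically assign output indices to threads and stage the overlapping input tile in shared memory.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian ALMA weights for one (window, offset, sigma) combination.
// Assumed formulation: m = offset * (window - 1), s = window / sigma.
std::vector<double> alma_weights(std::size_t window, double offset, double sigma) {
    assert(window > 0 && sigma > 0.0);
    const double m = offset * static_cast<double>(window - 1);
    const double s = static_cast<double>(window) / sigma;
    std::vector<double> w(window);
    double norm = 0.0;
    for (std::size_t i = 0; i < window; ++i) {
        const double d = static_cast<double>(i) - m;
        w[i] = std::exp(-(d * d) / (2.0 * s * s));
        norm += w[i];
    }
    for (double& wi : w) wi /= norm;  // normalize so the weights sum to 1
    return w;
}

// Scalar reference ALMA. out[t] stays NaN until a full window is available.
std::vector<double> alma(const std::vector<double>& x, std::size_t window,
                         double offset = 0.85, double sigma = 6.0) {
    const std::vector<double> w = alma_weights(window, offset, sigma);
    std::vector<double> out(x.size(), std::nan(""));
    for (std::size_t t = window - 1; t < x.size(); ++t) {
        double acc = 0.0;
        // The oldest sample in the window gets w[0]; the newest gets w[window - 1].
        for (std::size_t i = 0; i < window; ++i) acc += w[i] * x[t + 1 - window + i];
        out[t] = acc;
    }
    return out;
}
```

In a batch over many (window, offset, sigma) combinations, the weights are computed once per combination and reused for every output index, so nearly all of the work is the inner dot product.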
How the VectorAlpha technical analysis library uses AVX2 and AVX512 kernels to accelerate heavy indicator workloads across more than 300 functions.
Covers kernel selection, windowing patterns, streaming APIs, and batch parameter sweeps, all built on a shared SIMD dot-product core.
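The idea of a shared dot-product core reused by batch parameter sweeps can be sketched portably. In the C++ below, four independent accumulators stand in for SIMD lanes. A real AVX2 or AVX512 kernel would use intrinsics such as `_mm256_fmadd_pd` instead. `dot` and `batch_weighted_ma` are hypothetical names, and the sketch only illustrates the pattern. It is not VectorAlpha's code. Each weight vector represents one parameter combination of some windowed indicator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Portable stand-in for a SIMD dot-product core: four independent
// accumulators mimic four vector lanes and break the add dependency chain.
double dot(const double* a, const double* b, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];  // scalar tail
    return (s0 + s1) + (s2 + s3);
}

// Batch sweep: one output row per weight vector (one per parameter combination).
// Every row reuses the same dot() core; warm-up slots are left as NaN.
std::vector<std::vector<double>> batch_weighted_ma(
        const std::vector<double>& x,
        const std::vector<std::vector<double>>& weight_sets) {
    std::vector<std::vector<double>> rows;
    rows.reserve(weight_sets.size());
    for (const auto& w : weight_sets) {
        assert(!w.empty());
        std::vector<double> row(x.size(), std::nan(""));
        for (std::size_t t = w.size() - 1; t < x.size(); ++t)
            row[t] = dot(&x[t + 1 - w.size()], w.data(), w.size());
        rows.push_back(std::move(row));
    }
    return rows;
}
```

Independent accumulators let the multiply-adds overlap in the pipeline. They also change the summation order, so results can differ from a naive loop in the last few bits.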
Architecture of a Backtesting Engine that Handles a Million Events Per Second
A complete architectural breakdown of VectorAlpha's backtesting engine, which sustains a throughput of more than 1M events per second.
Explores zero-copy data pipelines, GPU-resident order books, and lock-free ring buffers for inter-thread communication.
Architecture • HFT
Coming Q1 2026
Lock-Free Data Structures for Real-Time Market Data
Design and implementation of wait-free ring buffers and lock-free order books that handle 10M+ updates per second.
Includes latency profiling, memory-ordering considerations, and a comparison with traditional mutex-based approaches.
Low Latency • C++
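For a flavor of the memory-ordering topic, here is a minimal C++17 sketch of one structure in this family: a bounded single-producer/single-consumer ring buffer whose `push` and `pop` are wait-free. The acquire/release pairs carry the key idea. The producer's release store to `head_` makes its slot write visible to a consumer that acquires `head_`, and the same handshake runs in reverse on `tail_`. `SpscRing` is a hypothetical name, the 64-byte cache line is an assumption, and the upcoming article's implementation may differ. Multi-producer designs need compare-and-swap loops and are not shown.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded single-producer/single-consumer ring buffer. push() and pop() each
// finish in a bounded number of steps (wait-free) and never take a lock.
// Capacity must be a power of two so indices can be masked instead of mod'ed.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer thread only. Returns false if the buffer is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);  // only the producer writes head_
        const std::size_t tail = tail_.load(std::memory_order_acquire);  // pairs with the consumer's release
        if (head - tail == Capacity) return false;                       // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot write
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the buffer is empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);  // only the consumer writes tail_
        const std::size_t head = head_.load(std::memory_order_acquire);  // pairs with the producer's release
        if (head == tail) return std::nullopt;                           // empty
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back to the producer
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines for the two indices avoid false sharing (64 bytes assumed).
    alignas(64) std::atomic<std::size_t> head_{0};  // total items pushed
    alignas(64) std::atomic<std::size_t> tail_{0};  // total items popped
};
```

Because the indices run freely instead of wrapping, full (`head - tail == Capacity`) and empty (`head == tail`) are distinguishable without sacrificing a slot. The power-of-two capacity turns the modulo into a cheap mask and keeps the unsigned subtraction correct across wraparound.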
Follow Our Technical Work
Watch our repositories for implementation details, benchmarks, and performance analysis updates.