GPU Optimization for Technical Indicators
How I squeezed a 20x speedup out of common indicators like RSI and MACD using CUDA. Spoiler: it's all about memory access patterns and avoiding divergent warps.
Real-world lessons from building high-performance trading tools. No fluff, just code that actually works and benchmarks to prove it.
Turns out you don't always need a GPU. Here's how to make your CPU calculate 16 EMAs at once using AVX-512, with actual benchmarks against the naive approach.
Why I threw away Python and built a backtester that processes 1M events/sec. Hint: keep everything in VRAM and never touch the CPU if you can help it.
Running 300+ technical indicators in the browser at 60fps? Yeah, it's possible. I'll show you the gotchas I hit porting Rust to WASM and how to work around them.