Backtesting Fundamentals
Think of a backtest as a model of what your strategy would have been allowed to know, when it would have been allowed to act, and what the market would likely have charged for that action. Most bad backtests fail before any code becomes slow or any optimizer becomes clever. They fail because the simulation contract is vague.
This page starts with the model before the metrics. If the execution assumptions are wrong, the Sharpe ratio is just a cleaner way to summarize the wrong answer. A useful backtest removes weak ideas, surfaces obvious failure modes, and makes the surviving ideas precise enough to test under stricter conditions.
What a backtest is actually claiming
Every backtest makes four claims whether the code says so explicitly or not. First, it claims that the data visible on each bar was the data the strategy truly had at that moment. Second, it claims that orders are filled under a defined execution rule rather than by hindsight. Third, it claims that costs, slippage, and capital constraints are close enough to reality to matter. Fourth, it claims that the optimization process did not leak future information into the final result.
State those claims clearly or treat the result as a chart experiment. The experiment can be useful as a sketch of behavior, but it carries weak evidence of robustness.
Where backtests usually lie
Look-ahead bias
Look-ahead bias is the easiest way to make a strategy look smarter than it is. The general form is simple: the strategy is allowed to react to information that would not have been available at decision time. Sometimes that leak is obvious, such as using the current bar's high or low before the bar has closed. Sometimes it is quieter, such as an indicator pipeline that aligns outputs one bar too early.
The prevention is also simple in principle, even if it takes discipline in code. Decide what event causes a signal to become valid, decide when an order may be sent, decide when that order may fill, and keep those boundaries consistent across the whole engine. Wrapping the timing bug inside an indicator helper still leaves the backtest broken.
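As a rough illustration, here is a minimal pandas sketch of those boundaries for a toy moving-average crossover: the signal is computed from closed bars only, the position takes effect one bar later, and the fill is assumed at the next bar's open. The column names and the crossover rule are placeholders for the example, not part of any particular engine's API.

```python
import pandas as pd

def crossover_positions(bars: pd.DataFrame, fast: int = 10, slow: int = 50) -> pd.DataFrame:
    """Toy moving-average crossover with the timing boundaries made explicit.

    bars: DataFrame with 'open' and 'close' columns, one row per bar.
    """
    out = bars.copy()

    # The signal becomes valid only once the bar that produced it has closed.
    fast_ma = out["close"].rolling(fast).mean()
    slow_ma = out["close"].rolling(slow).mean()
    signal = (fast_ma > slow_ma).astype(int)

    # Shift by one bar: the position held on bar t is decided from bar t-1's close,
    # so the strategy never reacts to a high, low, or close it has not seen yet.
    out["position"] = signal.shift(1).fillna(0)

    # Fills are assumed at the next bar's open, never at the signal bar's own prices.
    out["fill_price"] = out["open"]
    return out
```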
Survivorship bias
Survivorship bias appears when the dataset silently excludes instruments that failed, delisted, merged away, or otherwise disappeared from the present-day universe. That produces a cleaner and stronger result than a real trader would have seen because the test never has to own the dead names. The effect is especially severe in long-horizon equity research and any universe-selection workflow that starts from the constituents of today.
The fix requires point-in-time universes, delisted names where the strategy would have seen them, and a data pipeline that is explicit about corporate actions and membership changes. If the universe definition is fuzzy, the equity curve is fuzzy as well.
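One way to make that explicit is a point-in-time membership table. The sketch below assumes a hypothetical DataFrame with symbol, start, and end columns, where departed names keep their historical rows instead of vanishing from the dataset.

```python
import pandas as pd

def point_in_time_universe(membership: pd.DataFrame, as_of: pd.Timestamp) -> list[str]:
    """Symbols that were members of the universe on a given date.

    membership: DataFrame with 'symbol', 'start', and 'end' columns, where 'end'
    is NaT for names that are still listed. Delisted or merged names keep their
    historical rows, so the backtest still has to own them where it would have.
    """
    active = membership[
        (membership["start"] <= as_of)
        & (membership["end"].isna() | (membership["end"] >= as_of))
    ]
    return active["symbol"].tolist()
```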
Optimization leakage
Optimization leakage is what happens when the search process learns too much from the same data used to judge the final answer. The usual symptom is a parameter surface that looks impressive in sample and collapses the moment the regime shifts. The deeper problem is that optimization itself is part of the experiment. If you tune a strategy on the full history and then report the full-history result as if the strategy discovered itself honestly, the test is contaminated.
Optimization and validation have to be treated as separate stages. The goal of the optimizer is to search candidate space efficiently. The goal of validation is to ask whether the candidate that survived the search still holds together without letting the search procedure keep peeking at the answer.
Optimization and validation are different stages
An optimizer answers a narrow question: among the candidates I allowed, under the assumptions I set, which parameter sets ranked best on the chosen objective? Robustness asks a different question. It is about stability across different samples, different cost assumptions, nearby parameter values, and market conditions that fail to flatter the exact tuning run.
Walk-forward analysis is one way to force that distinction into the process. Instead of fitting once on the whole sample, you repeatedly fit on a historical window and test on the following holdout. Walk-forward makes parameter instability visible earlier by turning optimization into a rolling decision process. It still leaves overfitting on the table if the strategy idea itself is weak.
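A minimal sketch of that rolling structure, assuming bar-indexed data and fixed window lengths, looks like the following; the window sizes are arbitrary and the fitting step is left as a placeholder.

```python
def walk_forward_windows(n_bars: int, train: int, test: int):
    """Yield (train_slice, test_slice) pairs for a rolling walk-forward run.

    Each optimization pass sees only its training window; the holdout window
    that follows is reserved for judging the parameters that survived.
    """
    start = 0
    while start + train + test <= n_bars:
        yield slice(start, start + train), slice(start + train, start + train + test)
        start += test

# Example: 2,000 bars, fit on 500, test on the next 100, then roll forward by 100.
for train_idx, test_idx in walk_forward_windows(2_000, train=500, test=100):
    pass  # fit parameters on data[train_idx], then score them untouched on data[test_idx]
```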
VectorGrid belongs on the optimization side of that boundary. It makes exhaustive search and large parameter sweeps practical. The search result still has to survive holdout testing, cost stress, and strategy-level scrutiny. GPU speed changes throughput; the methodological debt remains.
Metrics that are worth keeping
Metrics are useful once the simulation contract is credible. Before that they mostly function as decoration. The three that matter most early are risk-adjusted return, drawdown severity, and the shape of trade outcomes. Sharpe and Sortino are part of that picture, but neither one should be treated as a final verdict on a strategy.
Sharpe and Sortino
Sharpe = (Return - Risk-Free Rate) / Standard Deviation
Sortino = (Return - Target Return) / Downside Deviation
The Sharpe ratio is a compact way to ask whether the return stream justified its overall volatility. Sortino asks a more selective question by penalizing downside volatility only. Both are easy to abuse. A high Sharpe on a short or regime-specific sample gives weak evidence. A high Sortino with a weak trade count or fragile parameter sensitivity gives weak evidence as well. Use them as summaries, not verdicts.
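Translated into code, and assuming daily bars with the common 252-period annualization convention, the two ratios can be computed from a per-period return series roughly as follows.

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, risk_free: float = 0.0, periods: int = 252) -> float:
    """Annualized Sharpe from per-period returns; risk_free is an annual rate."""
    excess = returns - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns: np.ndarray, target: float = 0.0, periods: int = 252) -> float:
    """Annualized Sortino: same numerator idea, but only downside moves in the denominator."""
    excess = returns - target / periods
    downside = np.minimum(excess, 0.0)
    downside_dev = np.sqrt((downside ** 2).mean())
    return np.sqrt(periods) * excess.mean() / downside_dev
```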
Maximum drawdown
Maximum drawdown matters because it answers a question that return metrics blur: how bad did the path get while you were waiting for the strategy to be right again? Many strategies with acceptable average behavior become unusable once you look at the full path of capital.
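Maximum drawdown falls out of the equity curve directly: track the running peak and take the worst decline from it. A short numpy sketch, assuming the equity curve is an array of portfolio values:

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Worst peak-to-trough decline of an equity curve, as a negative fraction of the peak."""
    running_peak = np.maximum.accumulate(equity)
    drawdowns = equity / running_peak - 1.0
    return drawdowns.min()
```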
Profit factor and trade structure
Profit factor, win rate, average win versus average loss, and trade count help explain how the strategy is making money. A backtest with a decent headline metric but a tiny number of oversized winners is a very different object from one with broad trade support and stable loss containment. The second is usually easier to trust.
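Those trade-structure numbers come straight from the per-trade P&L series. A small sketch, assuming a numpy array of realized trade results:

```python
import numpy as np

def trade_structure(pnl: np.ndarray) -> dict:
    """Summarize per-trade P&L: profit factor, win rate, and average win versus average loss."""
    wins = pnl[pnl > 0]
    losses = pnl[pnl < 0]
    gross_profit = wins.sum()
    gross_loss = -losses.sum()
    return {
        "trade_count": len(pnl),
        "win_rate": len(wins) / len(pnl) if len(pnl) else float("nan"),
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "avg_win": wins.mean() if len(wins) else 0.0,
        "avg_loss": losses.mean() if len(losses) else 0.0,
    }
```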
Where VectorGrid fits
VectorGrid is useful when the search itself is the bottleneck. That is the point where exact grid search, GPU-resident indicator work, and fast reruns start to matter. The product and the related technical insights material focus on how to make that search tractable without turning the whole workflow into a benchmark shell.
VectorGrid changes the cost of exploring a real parameter surface and the practicality of repeating that work under stricter validation rules. Weak strategies stay weak, biased datasets stay biased, and in-sample overfitting stays a real problem.
If you want the implementation side of that story, read Backtesting Engine for product context and VectorGrid: exact search at GPU speed for the engineering argument behind the current design.
A minimum checklist before trusting the result
- The signal timing, order timing, and fill timing are explicitly defined.
- The dataset matches the universe the strategy would actually have seen.
- Slippage, fees, and position sizing sit inside the core simulation contract (a minimal cost sketch follows this list).
- Optimization and validation are separated into different stages.
- The best region is stable under nearby parameter changes.
- The strategy survives holdout data that did not participate in the search.
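As a concrete example of the cost item above, here is a minimal sketch of folding per-side fees and a flat slippage assumption into the fill price. The basis-point values are placeholders only; realistic numbers depend on the venue, the instrument, and the order size.

```python
def apply_costs(fill_price: float, side: int, fee_bps: float = 1.0, slippage_bps: float = 2.0) -> float:
    """Adjust a theoretical fill price for fees and slippage inside the simulation.

    side is +1 for a buy and -1 for a sell, so costs always move the fill
    against the trader. The basis-point defaults are illustrative, not calibrated.
    """
    penalty = (fee_bps + slippage_bps) / 10_000.0
    return fill_price * (1.0 + side * penalty)
```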
Next reads
If the next question is risk control, read Risk Management Principles. If the next question is how to express rules as code, move to the Strategy Development Tutorial. If the next question is how the optimization side is structured in this stack, continue with the Backtesting Engine page and the related technical insights articles.