
Why live trading rarely matches the backtest (and what to do about it)

The backtest looks strong, live trading looks weaker: often the cause is overfitting combined with the lack of an independent check on later data. Here is a direct way to structure that check.

Spaike – backtest vs. live trading

Why does my live trading perform worse than my backtest strategy?

Short answer: Usually because simulated results rely on assumptions that are too optimistic or incomplete (fees, slippage, execution, margin), and because rules are often tuned to fit historical data without a strict hold-out period for testing. Markets also move through different regimes, so what worked in the sample may weaken later. The sections below explain each point and how out-of-sample checks reduce the gap.

Many traders see a gap: a strategy looks clear and strong in a backtest, but live results and day-to-day experience do not match. Markets do change, but the main reason is often different: the strategy was fitted to historical data so that it describes that period well, without a clean split between what is used to build and what is used to validate.

This article explains why such gaps are common and how a reserved time window and out-of-sample logic make the process stricter.

Why backtests and live trading diverge

A backtest reflects your assumptions: data quality, fees, slippage, margin, the order in which conditions are evaluated, and execution. If those assumptions are too optimistic or incomplete, simulated performance will sit above what is realistically achievable.
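To make the cost point concrete, here is a minimal sketch (the trade P&L values, fee, and slippage figures are hypothetical, not from the article) showing how a zero-cost assumption inflates a simulated result:

```python
def net_pnl(gross_pnls, fee_per_trade=0.0, slippage_per_trade=0.0):
    """Subtract a flat fee and slippage estimate from each trade's gross P&L."""
    cost = fee_per_trade + slippage_per_trade
    return [p - cost for p in gross_pnls]

# Hypothetical gross P&L per trade (arbitrary units).
trades = [1.2, -0.4, 0.8, 0.5, -0.3]

ideal = sum(net_pnl(trades))                                # zero-cost assumption
realistic = sum(net_pnl(trades, fee_per_trade=0.10,
                        slippage_per_trade=0.15))           # with estimated costs
# ideal is 1.8; realistic drops to 0.55 once 0.25 per trade is deducted.
```

The gap between the two totals is exactly the kind of optimism that makes a simulated curve sit above what live execution can deliver.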

Overfitting adds to this when you change many parameters, filters or windows until history looks good. The model may learn patterns in noise that do not repeat.
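A toy experiment illustrates this (everything here is synthetic: the returns are pure noise and the "rules" are random position sequences, so no candidate has a real edge):

```python
import random

random.seed(0)

# Synthetic daily returns: pure noise, so no rule can have a genuine edge.
in_sample = [random.gauss(0, 1) for _ in range(250)]
out_sample = [random.gauss(0, 1) for _ in range(250)]

def mean_pnl(positions, returns):
    """Average daily P&L of a fixed daily position sequence (+1/-1)."""
    return sum(p * r for p, r in zip(positions, returns)) / len(returns)

# "Parameter search": try 200 arbitrary rules, keep the in-sample winner.
candidates = [[random.choice((-1, 1)) for _ in range(250)] for _ in range(200)]
best = max(candidates, key=lambda c: mean_pnl(c, in_sample))

best_in = mean_pnl(best, in_sample)    # looks clearly positive
best_out = mean_pnl(best, out_sample)  # typically falls back toward zero
```

Because the winner was selected for fitting the in-sample noise, its in-sample figure is impressive by construction, while the out-of-sample figure has no reason to repeat it.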

Markets also shift in character (volatility, liquidity, trends). A rule set that worked in one phase can be weaker in another without any bug in the code.

The missing step: data that must stay out of the build

Core method: you split the timeline. One part of history is used to develop and optimize the strategy. Another part stays untouched at first and is used only for later evaluation. That mimics not knowing the future while you design the rules.

One concrete workflow (example): you exclude the last five months from the period you optimize on. You build rules and parameters on the earlier window. Then you test the same, now-frozen logic on those reserved five months: does it still hold there, even though those months were never part of the optimization? If not, the fit was probably too specific to the first window.
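In code, that workflow starts with a strict date-based split (a sketch; the function name, row layout, and dates are hypothetical):

```python
from datetime import date

def split_holdout(dated_rows, holdout_start):
    """Split (date, value) rows into a build window and an untouched holdout."""
    build = [(d, v) for d, v in dated_rows if d < holdout_start]
    holdout = [(d, v) for d, v in dated_rows if d >= holdout_start]
    return build, holdout

# Hypothetical monthly observations for one year.
rows = [(date(2024, m, 1), m * 0.1) for m in range(1, 13)]

# Reserve the last five months (August onward here) for evaluation only.
build, holdout = split_holdout(rows, holdout_start=date(2024, 8, 1))
```

Everything in `build` may feed the optimization; `holdout` is touched exactly once, at the end, with the frozen rules.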

The exact month count can vary; what matters is the rule: whatever is marked as reserved must never be folded back into the parameter search. That is the idea behind out-of-sample testing and a simple walk-forward structure.
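The walk-forward structure mentioned above can be sketched as a generator of successive train/test windows (window lengths and data are hypothetical; the point is that test data never enters the fit):

```python
def walk_forward(series, train_len, test_len):
    """Yield successive (train, test) windows over a time-ordered series.

    Each test window is evaluated with rules fitted only on the train
    window that precedes it; the window then advances by test_len.
    """
    i = 0
    while i + train_len + test_len <= len(series):
        train = series[i:i + train_len]
        test = series[i + train_len:i + train_len + test_len]
        yield train, test
        i += test_len

# Hypothetical time-ordered observations.
data = list(range(20))
folds = list(walk_forward(data, train_len=10, test_len=5))
# Two folds: train 0-9 / test 10-14, then train 5-14 / test 15-19.
```

Every index in a test window lies strictly after its train window, which is the programmatic form of "reserved data must not be folded back into parameter search".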

What this split gives you

Stricter tests rarely produce the highest simulated curves. They lower the risk of overrating a strategy that only fits one slice of history. You get a clearer answer on whether the rules remain viable outside the optimization window.

Conclusion

When live and backtest diverge strongly, start by checking methodology: assumptions, overfitting, and whether an independent test phase is missing. Reserving part of the data strictly for later evaluation tests the idea more severely than a single optimized curve over the full sample.