Wheel θ · G. Wallace Lees Chisholm

All numbers from a self built backtest engine on free yfinance data. Honest about every limitation. Code, validation outputs, and the live paper trading harness are runnable and reproducible.

Reading map

Where I started and what I was trying to beat
What an option actually is, and why selling them pays you
The volatility risk premium, measured
Building the backtest engine
Strategy 1: the classic wheel and why it disappointed me
Strategy 2: what happens if you skip assignment
Strategy 3: adding a trend filter
Strategy 4: the short strangle, the breakout moment
Tuning delta and DTE
What 3x leveraged ETFs actually do to your portfolio
Diversification: the cross asset surprise (and correction)
The 60/20/20 blend
The regime allocator breakthrough
Walk forward, Monte Carlo, and the block bootstrap
The actual stress test: dot com and 2008
Everything I tried that did not work
Going live: the paper trading harness
Final synthesis and what I would actually deploy
What this research cannot answer

Part one. Setup

01 Where I started and what I was trying to beat

The wheel strategy keeps showing up in retail finance discussions. Sell cash secured puts on stocks you would not mind owning, take assignment if the stock falls below your strike, sell covered calls against those shares until they get called away, repeat. The pitch is that you get paid premium for being patient. I wanted to know if this was real alpha or marketing.

My personal benchmark was always QQQ buy and hold. From 2010 through 2024, QQQ buy and hold returned roughly 18.5 percent CAGR with a worst drawdown of minus 35 percent during 2022 and a recovery time of roughly six months. That is the bar. Anything I built had to do better than that on a risk adjusted basis, or there was no point.

My constraint was specific. I told myself that a drawdown of 30 or even 50 percent was acceptable as long as I could characterize it as normal behavior for the strategy and the math supported recovery. What I would not tolerate was a strategy that could permanently impair my capital. So the early question I was asking was not "what is the highest CAGR I can achieve" but rather "what is the highest CAGR I can achieve without an account blow up risk."

I wanted a system that I could leave running and trust. A system that had a known failure mode I could prepare for, not a hidden one that would surprise me.

Part two. Building blocks

02 What an option actually is, and why selling them pays you

An option is a contract. The buyer pays a premium today for the right, but not the obligation, to do something at a future date. The seller collects the premium today in exchange for taking on the obligation. There are two types of basic options.

A call option gives the buyer the right to buy 100 shares at the strike price on or before expiration. If the stock rises above the strike, the call is worth the difference. If it stays below, the call expires worthless.
A put option gives the buyer the right to sell 100 shares at the strike price on or before expiration. If the stock falls below the strike, the put is worth the difference. If it stays above, the put expires worthless.

The seller of either option collects the premium up front but takes on the corresponding obligation. The buyer is essentially purchasing insurance or leverage. The seller is essentially providing it.

Why selling pays positive expected value

On average, across long historical samples, the premium that option buyers pay is more than the eventual cost of the obligation. This gap is called the volatility risk premium. Two reasons it exists.

First, hedging demand. Pension funds, insurance companies, and mutual funds need to hedge their long equity exposure. They buy index puts as portfolio insurance. There is structurally more demand for these puts than there are sellers, so the price gets bid up. The expected payout on those puts is lower than what buyers pay.

Second, lottery preference. Retail traders and speculators buy calls hoping for outsized returns. They are systematically willing to overpay for the lottery ticket profile of out of the money calls. Same effect: the price exceeds the rational expected value.

The empirical size of this gap, on the S and P 500 index, is around 2 to 4 percentage points per year: implied vol has tended to run that far above the vol that is subsequently realized. That is the structural edge that any short volatility strategy is trying to harvest. Everything else is execution detail.

03 The volatility risk premium, measured

Before I built anything, I wanted to know how big the premium gap actually is on the tickers I would be trading. So I pulled live option chains and computed the ratio of market implied vol to the underlying's trailing realized vol. This is the practical version of the academic VRP measurement.
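The ratio described here can be sketched in a few lines. This is a minimal illustration, not the code used for the measurements: the function names are hypothetical, "30 day" is assumed to mean 30 trading days of close to close log returns, annualization assumes 252 trading days, and the chain's implied vols are assumed to arrive as a plain list of decimals.

```python
import math
from statistics import median

def realized_vol(closes, window=30, trading_days=252):
    """Annualized close-to-close realized vol over the last `window` returns."""
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 closing prices")
    recent = closes[-(window + 1):]
    rets = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    return math.sqrt(var * trading_days)

def iv_rv_ratio(implied_vols, closes, window=30):
    """Median implied vol across a chain divided by trailing realized vol."""
    return median(implied_vols) / realized_vol(closes, window)
```

A ratio above 1.0 means the chain is pricing more volatility than the underlying has recently delivered.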

Each bar shows median market IV divided by trailing 30 day realized vol, measured from live chains in 2026. Index ETFs price options at roughly 1.30x realized. Single stocks come in around 1.10. NVDA is unusually low because retail call buying has compressed the put side premium.

Three things are worth noting here.

Index ETFs (SPY, QQQ, IWM) command the largest premium because every pension fund and mutual fund hedges with index puts. The 1.30x multiplier is roughly textbook for the SPX put premium.

Single name stocks command much less premium, around 1.10x, because there is less hedging demand for them. Pension funds do not buy AAPL puts, they buy SPX puts.

NVDA is anomalous. Its implied vol trades at almost exactly the same level as realized. The reason is that retail call buying has been so intense that market makers can hedge calls cheaply against existing demand, suppressing the put side premium that normally lifts the average IV.

I used these ticker specific multipliers throughout the backtest, calibrated against current chains. They are not arbitrary numbers.

04 Building the backtest engine

The cheapest historical options data available costs roughly 30 dollars per month from Polygon. I wanted to start without that, so I built a synthetic engine. The logic is straightforward.

On every trading day in the simulation, I take the underlying's price from yfinance, compute its trailing 30 day realized volatility, multiply by the ticker specific uplift from the previous section to get a synthetic implied vol, and use that to price options via Black Scholes. The strike I select is whichever strike has the target delta. For example, a 35 delta put on QQQ is the strike that has roughly a 35 percent probability of finishing in the money, by Black Scholes assumptions.
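The pricing step above can be sketched as follows. This is an illustration under stated assumptions rather than the engine itself: the function names are hypothetical, the risk free rate defaults to zero, there are no dividends, and time to expiration is expressed in years. The strike for a target delta is solved in closed form by inverting the Black Scholes delta.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

def bs_price(spot, strike, vol, t_years, rate=0.0, kind="put"):
    """Black-Scholes price of a European option with no dividends."""
    sig_t = vol * math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / sig_t
    d2 = d1 - sig_t
    disc = math.exp(-rate * t_years)
    if kind == "call":
        return spot * N.cdf(d1) - strike * disc * N.cdf(d2)
    return strike * disc * N.cdf(-d2) - spot * N.cdf(-d1)

def strike_for_delta(spot, vol, t_years, target_delta, rate=0.0, kind="put"):
    """Strike whose Black-Scholes delta has absolute value `target_delta`."""
    # Call delta is N(d1); put delta is N(d1) - 1, so |put delta| = N(-d1).
    d1 = N.inv_cdf(target_delta) if kind == "call" else -N.inv_cdf(target_delta)
    sig_t = vol * math.sqrt(t_years)
    return spot * math.exp(-d1 * sig_t + (rate + 0.5 * vol ** 2) * t_years)

def synthetic_iv(realized, uplift):
    """Synthetic implied vol: trailing realized vol times the ticker uplift."""
    return realized * uplift
```

For example, `strike_for_delta(500.0, synthetic_iv(0.20, 1.30), 7 / 365, 0.35, kind="put")` returns a strike below spot, and passing that strike to `bs_price` gives the synthetic premium. A real chain only lists discrete strikes, so a live version would round to the nearest listed strike.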

The simulation runs once per day. Each iteration handles three things in order:

Settle expirations. If I have an open option that expires today, settle it at intrinsic value. For cash settled variants like strangles, the loss or gain is just the difference between the spot price and the strike. For the wheel, if the put is in the money I take assignment of 100 shares per contract at the strike price.
Process management rules. Some experiments included profit take rules, stop losses, or close at N days to expiration. These check before any new trade is opened.
Open new positions if flat. If I have no open option, select strikes by target delta, compute the premium, deduct commission, credit the cash account.
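Of the three steps, settlement is the easiest to pin down in code. The sketch below is a simplified illustration with hypothetical function names, assuming one short option per call and the standard 100 share contract multiplier. It shows the two settlement paths described above: cash settlement at intrinsic value, and assignment for the wheel.

```python
def settle_cash(kind, strike, spot, contracts=1, multiplier=100):
    """Cash P&L at expiration for a SHORT option settled at intrinsic value.

    The result is zero or negative; the premium was credited when the
    position was opened, so it is not counted again here.
    """
    if kind == "put":
        intrinsic = max(strike - spot, 0.0)
    else:
        intrinsic = max(spot - strike, 0.0)
    return -intrinsic * contracts * multiplier

def settle_wheel_put(strike, spot, cash, shares, contracts=1, multiplier=100):
    """Wheel variant: an in-the-money short put is assigned at the strike."""
    if spot < strike:
        cash -= strike * contracts * multiplier   # pay the strike for the shares
        shares += contracts * multiplier          # take delivery
    return cash, shares
```

Running settlement before the management rules and before opening new trades keeps a position that expires today from being managed or replaced while it is still on the books.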

I want to be explicit about what this engine cannot model. It does not capture bid ask widening during crises. It does not model volatility skew (the fact that puts trade at higher IV than calls in real markets). It does not handle pin risk at expiration. It assumes I can always exit at intrinsic value. These are real limitations and I quantify their impact later.

Part three. Strategy by strategy

05 Strategy 1: the classic wheel and why it disappointed me

How the classic wheel works

Imagine you would not mind owning 100 shares of QQQ at a 5 percent discount to today's price. The wheel lets you get paid to be patient about that purchase.

You sell a cash secured put with a strike about 5 percent below the current price. You collect a premium. The buyer of that put has the right to sell you those shares at the strike on the expiration date. Three things can happen.

QQQ stays above your strike. The put expires worthless. You keep the premium. You sell another put. Repeat.
QQQ falls slightly below your strike. The put expires in the money. You are assigned and buy 100 shares at the strike, and you keep the premium. Net result: you bought 100 shares at the strike, but your effective cost is reduced by the premium you collected.
QQQ falls far below your strike. Same mechanics as above, but the assignment is at a price well above current spot. You are now holding shares that are under water.

Once you own shares, you sell a covered call with a strike about 5 percent above your cost basis. You collect another premium. Three things can happen.

QQQ stays below your call strike. The call expires worthless. You keep the premium and the shares. Sell another call. Repeat.
QQQ rises slightly above your strike. The call expires in the money. You sell the shares at the strike, capturing the difference between the strike and your cost basis as profit, plus the premium.
QQQ rockets up. Same mechanics. You sell at the strike. You miss the rest of the rally.

The wheel in two steps. The short put on the left has limited upside (capped at the premium received). The covered call on the right caps upside even when the shares rally hard, because you have to sell at the call strike.

The 14 year backtest result

I ran the classic wheel on QQQ from 2010 to 2024, selling 35 delta puts at 14 days to expiration. The premium uplift was 1.30x realized vol from the calibration above. Here is what came out.

Metric	QQQ Buy and Hold	Classic Wheel 35d / 14DTE
CAGR	18.51%	18.40%
Sharpe ratio	0.74	1.10
Maximum drawdown	-35.1%	-22.3%

The wheel roughly matched buy and hold on CAGR, trailing by about a tenth of a percentage point, and it had a meaningfully better Sharpe ratio and a smaller worst drawdown. So it was not worthless. But it was not what the marketing suggested either.

The reason is structural. The 14 year window from 2010 to 2024 was an unusually strong bull market for tech. QQQ went up roughly 10x in that period. Every time the wheel had me assigned and selling covered calls, the calls would get exercised and I would sell the shares at the call strike, capturing a small profit but missing the next 20 percent of the rally. The covered call leg is a structural tax on upside in a trending market.
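For reference, the three metrics in the table above can all be derived from a daily equity curve. The sketch below uses common textbook definitions; it is an assumption that the engine uses the same conventions (252 periods per year, a zero risk free rate, and Sharpe computed from simple daily returns), and the function names are hypothetical.

```python
import math

def cagr(equity, periods_per_year=252):
    """Compound annual growth rate of an equity curve."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1 / years) - 1

def max_drawdown(equity):
    """Worst peak-to-trough decline, returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst

def sharpe(equity, periods_per_year=252, rf=0.0):
    """Annualized Sharpe ratio from simple per-period returns."""
    rets = [b / a - 1 for a, b in zip(equity, equity[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    return (mean - rf / periods_per_year) / math.sqrt(var) * math.sqrt(periods_per_year)
```

Because Sharpe divides by the volatility of returns, a strategy can trail slightly on CAGR and still score higher if its equity curve is smoother.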

06 Strategy 2: what happens if you skip assignment

If the covered call is the problem, what happens if I never take assignment? I built a variant called CSP only. When the put expires in the money, I just cash settle the loss and immediately open a new put at a fresh delta. I never hold the underlying.

The result was worse on CAGR. The reason is that when QQQ takes a drawdown and comes back, the wheel captures the share recovery as gains on the underlying position. CSP only does not own anything. It just keeps selling premium against a falling and recovering price, which is roughly a coin flip in net effect.

Metric	Classic Wheel	CSP only
CAGR	18.40%	12.30%
Max drawdown	-22.3%	-19.9%

What CSP only does give you is a tighter, more predictable drawdown profile because you never carry an underwater stock position. The recovery time is also shorter. So there is a use case, but it is not the CAGR maximizing answer.

07 Strategy 3: adding a trend filter

One obvious refinement to the wheel is to avoid selling puts during obvious downtrends. The classic problem with cash secured puts is that they make you a catcher of falling knives. If QQQ is in a clear bear market, selling puts means agreeing to buy at a strike that is likely to be far above future spot. Painful.

So I added a simple momentum filter. Only open new puts when the underlying is above its 50 day simple moving average. If it falls below, I hold existing positions to expiration but do not open new ones. This is the "wheel plus momentum" variant.

Metric	Classic Wheel	Wheel + Momentum
CAGR	18.40%	16.67%
Max drawdown	-22.3%	-22.8%

On QQQ, the trend filter reduces CAGR by about 1.7 percentage points (because you skip some of the recovery rallies that begin while price is still below the SMA), and in this sample it does not tighten the worst drawdown, which is marginally deeper at minus 22.8 percent versus minus 22.3 percent. The bigger payoff for momentum filtering, as I later found, is on the leveraged ETFs where bear regimes are existential.
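The entry gate for the momentum variant is small enough to show in full. This is a minimal sketch with hypothetical function names. It assumes the filter is evaluated on daily closes, and it treats too little history, or a close exactly equal to the average, as "do not open".

```python
def sma(values, window):
    """Simple moving average of the last `window` values, or None if too short."""
    if len(values) < window:
        return None
    return sum(values[-window:]) / window

def may_open_new_put(closes, window=50):
    """True only when the latest close is above its simple moving average."""
    average = sma(closes, window)
    return average is not None and closes[-1] > average
```

The gate only controls new entries. Positions that are already open are held to expiration regardless of what the filter says.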

Equity curves of the three wheel variants on QQQ from 2010 to 2024. Buy and hold finishes ahead of the basic wheel variants because the bull market punished the covered call leg.

Drawdown profile. The wheel variants have smaller worst drawdowns than buy and hold but not dramatically so.

At this point in the project I was disappointed. The basic wheel strategies I had been testing were not really beating buy and hold on a clear risk adjusted basis. I needed to think differently.

08 Strategy 4: the short strangle, the breakout moment

The short strangle is structurally different from the wheel. Instead of selling a put and waiting to potentially own shares, you simultaneously sell an out of the money put AND an out of the money call. You never own the underlying. Both legs are cash settled at expiration.

The short strangle payoff at expiration. You collect both premiums up front, and as long as the stock stays between the two strikes, both legs expire worthless and you keep everything. Loss begins when the stock moves outside either strike.

What makes the strangle interesting is that it has no upside cap from a covered call leg. Once shares get assigned in the wheel, you cap your upside. In a strangle, you never own shares. The trade off is that you have unbounded loss potential on the call side (technically the stock can rocket to infinity) and bounded loss on the put side (a stock cannot fall below zero).
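The payoff at expiration is simple arithmetic. A minimal sketch follows, with a hypothetical function name and made up strikes and premiums in the usage note; it assumes cash settlement at intrinsic value and ignores commissions.

```python
def short_strangle_pnl(spot, put_strike, call_strike,
                       put_premium, call_premium, multiplier=100):
    """P&L at expiration for one short strangle, in dollars."""
    put_loss = max(put_strike - spot, 0.0)     # owed if spot finishes below the put
    call_loss = max(spot - call_strike, 0.0)   # owed if spot finishes above the call
    return (put_premium + call_premium - put_loss - call_loss) * multiplier
```

With illustrative strikes of 95 and 105 and premiums of 1.00 and 0.80, the position keeps the full 1.80 credit anywhere between the strikes and breaks even at 93.20 on the downside and 106.80 on the upside. Only one leg can finish in the money, so the combined credit cushions whichever side is tested.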

I ran a 30 delta strangle at 14 days to expiration on QQQ.

Metric	QQQ Buy and Hold	Strangle 30d / 14DTE
CAGR	18.51%	20.70%
Sharpe ratio	0.74	1.55
Maximum drawdown	-35.1%	-19.3%

This was the first strategy that beat buy and hold meaningfully on both CAGR and drawdown. The Sharpe ratio jumped substantially because the strangle's equity curve is much smoother than QQQ's. Each cycle collects two premiums (put plus call) and as long as QQQ stays between the strikes (which it does most weeks), the strategy makes money.

The strangle has been a known retail strategy for decades but is less commonly written about than the wheel. I think this is partly because the wheel sounds friendlier (you "own the stocks you like"), and partly because strangles have a scarier risk profile on paper (unbounded call side loss). In practice, the unbounded loss is a theoretical concern that almost never matters at reasonable position sizes, and the CAGR advantage is real.

09 Tuning delta and DTE

Once strangles were the focus, the next question was parameter tuning. Two main dials: delta and days to expiration.

Delta controls how close to the current price your strikes are. A 30 delta strike is about 30 percent likely to finish in the money, so it is moderately out of the money. A 45 delta strike is much closer to the current price and collects much more premium but also has higher probability of being tested.

Days to expiration controls how often the strategy cycles. A 14 day to expiration strategy cycles every two weeks. A 7 day strategy cycles weekly. A 3 day strategy cycles roughly twice a week.

The theoretical tradeoff is that shorter DTE captures faster theta decay (the premium decays nonlinearly toward expiration) but adds gamma risk (the option's sensitivity to underlying moves accelerates near expiration). Higher delta captures more premium per trade but has higher probability of needing to be paid out.

I ran a walk forward sweep across delta in {0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45} and DTE in {3, 5, 7, 10, 14, 21} on QQQ. The clear winner was high delta combined with short DTE.

Three strangle variants on QQQ. Higher delta and shorter DTE wins decisively. The 45 delta 3 DTE variant produces the highest synthetic CAGR but also the highest trade frequency.

Drawdowns of the strangle variants. Surprisingly, the aggressive 45/3 variant does not have meaningfully worse drawdowns than the milder versions.

Strangle config	CAGR	Sharpe	Max DD
30 delta / 14 DTE	20.70%	1.55	-19.3%
35 delta / 7 DTE	31.83%	2.25	-14.0%
45 delta / 3 DTE	48.88%	2.92	-20.7%

The 35 delta 7 DTE variant became my Tier 1 candidate. It captures most of the alpha of the more aggressive 45/3 variant but with materially less gamma risk and fewer trades. I considered this the strongest single sleeve candidate I could build on QQQ alone.

10 What 3x leveraged ETFs actually do to your portfolio

The next experiment was extending the same logic to leveraged ETFs. TQQQ is a 3x leveraged version of QQQ. SOXL is 3x leveraged semis. UPRO is 3x leveraged SPX. These instruments have famously high implied vol because their underlying daily moves are roughly 3x the index, which makes the premium yield enormous.

The catch is that they have volatility decay. When the underlying chops sideways, the 3x ETF loses money even though the underlying is roughly flat. This is just compounding math. A 5 percent up day followed by a 5 percent down day leaves the underlying at 100 times 1.05 times 0.95 = 99.75. The 3x leveraged version goes up 15 percent then down 15 percent, leaving it at 100 times 1.15 times 0.85 = 97.75. The decay scales with the square of the daily move, so tripling the move costs nine times as much (2.25 versus 0.25).
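The same arithmetic in code, as a quick check. The function name is hypothetical, and the sketch assumes an idealized ETF that delivers exactly the leverage multiple of each daily move with no fees or financing costs.

```python
def up_then_down(move, leverage=1.0, start=100.0):
    """Value after one up day and one down day of the same percentage size."""
    return start * (1 + leverage * move) * (1 - leverage * move)

unlevered = up_then_down(0.05)              # about 99.75
levered = up_then_down(0.05, leverage=3)    # about 97.75
```

Algebraically the round trip multiplies the starting value by 1 minus (leverage times move) squared, which is why the loss grows with the square of the leveraged move.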

I ran the wheel and strangle on the three leveraged ETFs to see what would happen.

Solo leveraged ETF strategies vs QQQ buy and hold. The CAGRs are enormous but so is the volatility.

Drawdown profile of leveraged sleeves. All hit drawdowns of minus 57 to minus 82 percent at various points.

Solo leveraged sleeve	CAGR	Sharpe	Max DD
TQQQ wheel 35/14	38.25%	0.78	-81.6%
SOXL wheel + momentum	37.49%	0.86	-60.5%
UPRO strangle 35/14	32.78%	1.00	-57.1%

The headline numbers were tempting. 37 percent CAGR on SOXL is the kind of return that gets you on the cover of magazines, if it survives validation. But the drawdowns were brutal. SOXL had a worst peak to trough drawdown of just over 60 percent. When I later ran these through Monte Carlo, the worst random 2 year window on UPRO strangle hit minus 68.7 percent. That is one bad week from breaching my "permanent impairment" threshold.

The honest verdict on solo leveraged sleeves. They do not pass my deployment constraint. Worst case drawdowns of minus 60 to minus 70 percent are in the danger zone where mean reversion is no longer guaranteed. I retired them as standalone strategies. But they turn out to be incredibly useful as components inside diversified portfolios, which is the next section.

11 Diversification: the cross asset surprise (and correction)

The next idea was diversification. If selling premium on QQQ produces roughly 21 percent CAGR with a 19 percent drawdown, what if I spread the capital across different asset classes that have low correlation? Bonds, gold, and healthcare each let me sell premium on an underlying that follows its own path.

I built a portfolio I called the cross asset core. 45 percent QQQ strangle, 20 percent GLD strangle, 20 percent TLT strangle, 15 percent XLV wheel. Four sleeves on four different asset classes.

Metric	QQQ strangle 30/14 solo	Cross asset core
CAGR	20.70%	15.82%
Sharpe ratio	1.55	1.56
Max drawdown	-19.3%	-14.3%

The Sharpe ratio held nearly identical while the max drawdown shrank by about a quarter, from minus 19.3 to minus 14.3 percent. This looked like a free lunch from diversification. But the historical CAGR was suspicious.

When I later ran the cross asset core through independent block bootstrap Monte Carlo, the CAGR dropped from the historical 16 percent down to about 3 percent. The reason became obvious. From 2010 to 2024, gold had a strong overall uptrend (post 2018 rally) and long bonds had a multi year bull market until 2022. Those drifts contributed most of the cross asset core's CAGR. The strangle premium alone, stripped of the underlying drift, is not enough to compound meaningfully on bonds or gold.

So the cross asset core is an interesting smooth equity curve but the realistic forward return is modest, in the 3 to 5 percent range. It became a "defensive bench" option in the final tier structure but not a primary deployment candidate.

12 The 60/20/20 blend

Diversification that did not depend on a one time historical drift was the next idea. Instead of mixing asset classes, I mixed risk profiles within the same general equity exposure. The structure is 60 percent in the QQQ strangle from Tier 1, 20 percent in a SOXL wheel with momentum filter, and 20 percent in a UPRO strangle.

The math works because the three sleeves have correlated but not identical drawdown timing. When semis are dragging through a cyclical down phase, the broad SPX or QQQ might still be flat. So the portfolio max drawdown is much smaller than any individual sleeve.

Sleeve (solo)	Max drawdown solo
QQQ strangle 35/7	-14.0%
SOXL wheel + momentum	-60.5%
UPRO strangle 35/14	-57.1%
60/20/20 blend	-34.3%

The portfolio drawdown is roughly minus 34 percent, compared to minus 60.5 percent on SOXL alone or minus 57.1 percent on UPRO alone. The diversification math is doing the heavy lifting. The CAGR of the blend, however, only modestly exceeds the QQQ strangle alone, because the SOXL and UPRO sleeves only contribute their share of the return.

This was the moment I realized leveraged sleeves are not bad strategies. They are bad standalone deployments. Inside a properly weighted blend, the leveraged sleeves' high CAGR contribution is muted by the QQQ core, but their tail risk is also muted. The blend approximately preserves the Sharpe ratio of the components while reducing the worst drawdowns dramatically.

13 The regime allocator breakthrough

The cleanest insight in the entire project came when I realized that leveraged ETFs have systematically different return profiles in different market regimes. When the underlying is in a clean uptrend, the leveraged ETF compounds beautifully. When the market is chopping or in a bear phase, the leveraged ETF loses to volatility decay.

So I built a regime allocator. It runs two sub portfolios in parallel:

The safe sleeve. 100 percent in the Tier 1 QQQ strangle. Modest returns but very low drawdown.
The aggressive sleeve. 60 percent QQQ strangle (high delta short DTE), 20 percent SOXL wheel with momentum filter, 20 percent UPRO strangle. Higher returns but bigger drawdowns.

The signal is brutally simple. Take the closing price of QQQ and its 20 day simple moving average. If price is above SMA, run the aggressive sleeve. If price is below SMA, fall back to the safe sleeve. Reassess daily.

The intuition is that the 20 day SMA catches major regime transitions within a few days. When QQQ tips into a clear downtrend (typically the precursor to a leveraged ETF drawdown), the allocator switches to the safer Tier 1 strategy and waits. When QQQ recovers above its SMA, the allocator re engages the leveraged sleeves to capture the recovery rally.
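The switching rule can be sketched as a function from recent QQQ closes to sleeve weights. This is an illustration only: the dictionary keys are labels chosen here, and falling back to the safe sleeve when there is too little history, or when price exactly equals the average, is an assumption the text does not specify.

```python
SAFE = {"QQQ strangle": 1.0}
AGGRESSIVE = {
    "QQQ strangle": 0.6,
    "SOXL wheel + momentum": 0.2,
    "UPRO strangle": 0.2,
}

def regime_weights(qqq_closes, window=20):
    """Return sleeve weights for the next session from daily QQQ closes."""
    if len(qqq_closes) < window:
        return SAFE  # assumption: stay defensive until the SMA is defined
    average = sum(qqq_closes[-window:]) / window
    return AGGRESSIVE if qqq_closes[-1] > average else SAFE
```

Because the rule is reassessed daily, a close that hovers around the average can flip the allocation back and forth, and every flip means closing one set of positions and opening another.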

Metric	Tier 1 (QQQ 35/7)	Tier 2 (60/20/20)	Tier 3 (regime)
CAGR (full backtest)	31.83%	33.24%	64.85%
Sharpe ratio	2.25	1.54	3.76
Max drawdown	-14.0%	-34.3%	-14.7%

The three portfolio designs vs QQQ buy and hold. The regime allocator lifts CAGR substantially while keeping drawdown competitive with the much simpler Tier 1 strategy.

Drawdown profile of the portfolios. The regime allocator's drawdown profile is barely worse than the conservative Tier 1, despite delivering far higher CAGR.

The regime allocator is the moment in this research where the numbers started to look almost too good to be true. I will be transparent about that in the validation sections below, and the live calibration section will further temper the headline estimate.

Part four. Does it actually work?

14 Walk forward, Monte Carlo, and the block bootstrap

The numbers above are all in sample. Every backtest claim has to survive proper out of sample validation. I used three different methodologies.

Walk forward across 11 windows

Walk forward validation works like this. I split the historical data into multiple rolling test windows. For each window, I treat it as the "future" and check whether the strategy delivers positive returns. If a strategy only works in one specific window, it is overfit. If it works across many windows covering different market conditions, that is evidence of structural alpha.

I ran 11 rolling 4 year test windows starting in 2011 through 2024, sliding by one year. Every portfolio design produced positive CAGR in every window.
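The window schedule can be generated with a short helper. The function name is hypothetical, and the sketch assumes windows are aligned to calendar years and are inclusive of both the start and end year.

```python
def rolling_windows(first_start=2011, last_end=2024, length=4, step=1):
    """List of (start_year, end_year) test windows, inclusive on both ends."""
    windows = []
    start = first_start
    while start + length - 1 <= last_end:
        windows.append((start, start + length - 1))
        start += step
    return windows

windows = rolling_windows()   # 11 windows, from (2011, 2014) to (2021, 2024)
```

With a four year length and a one year step, neighbouring windows share three years of data, so the 11 results are correlated with each other rather than 11 separate experiments.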

Walk forward CAGR per design per test window. Every cell is positive. Every strategy survived every window including COVID and 2022.

Historical Monte Carlo

The next test was random start dates. Instead of January aligned windows, I picked 300 random business days as start dates across 2012 to 2022, ran each strategy for a 2 year window from that start, and tracked the outcome. This catches the "what if I deployed at a random Tuesday" scenario rather than the cleaner walk forward setup.

Across 300 random starts on the Tier 1 strangle, the worst 5 percent of starts still produced 14.7 percent CAGR. Zero starts produced a drawdown worse than minus 21 percent. Zero blowups across the entire sample.

Independent block bootstrap Monte Carlo

The historical Monte Carlo overlaps windows, which means its samples are not statistically independent. The bull market of 2012 to 2024 is sampled many times. To get a genuinely independent assessment, I used a stationary block bootstrap. This resamples blocks of consecutive daily returns from history with random block lengths (averaging 20 days), producing fully synthetic price paths that preserve volatility clustering and cross asset correlation but randomize the order in which the blocks occur.
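A stationary block bootstrap can be sketched with the standard library alone. This is a generic version of the technique, not necessarily the exact implementation used here: at each step the current block continues with probability 1 minus 1/mean_block, otherwise a new block starts at a random position, which gives geometrically distributed block lengths with the requested mean. Function names are hypothetical.

```python
import random

def stationary_bootstrap_indices(n_obs, n_out, mean_block=20, rng=None):
    """Row indices for one resampled path of length `n_out`."""
    rng = rng or random.Random()
    p_new_block = 1.0 / mean_block
    indices = []
    pos = rng.randrange(n_obs)
    while len(indices) < n_out:
        indices.append(pos)
        if rng.random() < p_new_block:
            pos = rng.randrange(n_obs)   # jump: start a new block
        else:
            pos = (pos + 1) % n_obs      # continue the block, wrapping at the end
    return indices

def resample_returns(returns_by_asset, n_out, mean_block=20, rng=None):
    """Apply ONE index path to every asset so same-day returns stay together."""
    n_obs = len(next(iter(returns_by_asset.values())))
    idx = stationary_bootstrap_indices(n_obs, n_out, mean_block, rng)
    return {name: [series[i] for i in idx]
            for name, series in returns_by_asset.items()}
```

Using the same index path for every asset is what preserves cross asset correlation, and keeping consecutive days together inside a block is what preserves volatility clustering.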

Distribution of 2 year CAGRs across 500 independent block bootstrap paths per strategy. Vertical lines mark median and 5th percentile. No strategy produced any blowup across all 2,500 backtests.

The independent Monte Carlo dragged the median CAGRs down meaningfully versus the historical Monte Carlo. The Tier 1 strangle went from 29 percent historical median to 19 percent independent median. The reason is that the historical bull market sequencing was lucky for short volatility strategies. Once that sequencing is scrambled, the realistic forward return is lower.

The drawdown character was unchanged. The worst paths in the independent Monte Carlo were not deeper than in the historical Monte Carlo. So the safety profile was preserved while the CAGR estimates were corrected downward.

15 The actual stress test: dot com and 2008

None of the above tests included the major crashes. QQQ went down 83 percent from 2000 to 2002. SPX went down 55 percent in 2007 to 2009. Those are the regimes that destroy retail strategies. I needed to know if my strategies could survive them.

So I extended the yfinance dataset back to 1999 (QQQ's inception) and re ran the Monte Carlo with starts during the dot com era and again during the GFC.

Median CAGR per strategy across three eras. Buy and hold lost roughly 22 percent annualized during the dot com era and underperformed dramatically in the GFC. Premium selling strategies remained positive in both crashes.

The dot com result was the most striking finding in the entire project. While QQQ buy and hold lost 22 percent annualized over the dot com era (with a 57 percent blowup rate across random starts), the QQQ 35 delta 7 day strangle produced 88 percent median CAGR. Yes, 88 percent.

Why selling premium worked during dot com. The implied vol during 2000 to 2002 was enormous. VIX was sustained at 30 to 50 for years. That means the premium I collected on every strangle sale was correspondingly enormous. Every week the strategy was selling expensive insurance that the market was paying a fortune to receive. Even though QQQ collapsed 83 percent, the strangle was making the bulk of its profits between the strikes (which is where QQQ actually traded most weeks, despite the long term decline).

I want to be honest about one caveat. The 88 percent CAGR is almost certainly overstated by real world friction. During the 2000 to 2002 period, bid ask spreads were dramatically wider than today, implied vol did not perfectly track realized vol, and the option markets were less liquid. My synthetic engine assumes perfect execution. A realistic live estimate is probably 30 to 50 percent CAGR rather than 88 percent. But the direction is unambiguous. Short volatility strategies are structurally robust to crashes in ways that buy and hold is not.

16 Everything I tried that did not work

Most of the project was failed experiments. Worth documenting because the negative results are educational.

Stop losses on TQQQ wheel

The obvious refinement is to add a stop loss to limit drawdowns. I tested closing any short put position when its mark to market loss reached 2x or 3x the original premium received, on the TQQQ wheel strategy. The result was counterintuitive. Stop losses bumped CAGR up from 31 percent to 38 percent but made the max drawdown worse, from minus 57 percent to minus 82 percent, and tripled the worst recovery time.

The mechanism is subtle. Stops crystallize the loss at the trough, then the strategy redeploys at lower prices, capturing bigger percentage bounces on the recovery. CAGR goes up because you cycle through more capital faster. Drawdown gets worse because you actually realized the losses instead of riding through them. Bad under any reasonable risk model.

VRP filter

I tried gating entry on the current implied vol being above some percentile of its trailing year. The intuition is to skip the cheap premium regimes. In the synthetic model this had zero effect because my IV proxy is a constant multiple of realized vol, so it has no independent signal versus its history. The test was inconclusive because the engine cannot model the real signal we would need.

Iron condors with proper sizing

An iron condor is a short strangle plus protective long wings further out of the money. It bounds the max loss but reduces the net premium. When I sized iron condors by the same cash collateral as a strangle (rather than by the bounded max loss, which would let you over leverage), the CAGR came out around 7 to 8 percent. Too low to be worth the complexity. Iron condors really only outperform strangles if you can over leverage them on margin, which is a fundamentally different risk profile.

The SMA window red flag

When I tuned the regime allocator's SMA window, I tested 5 through 250 days. The CAGR was monotonically better at shorter windows. SMA 5 produced 85 percent walk forward CAGR. That is a red flag, not real alpha.

Walk forward CAGR as a function of the SMA window in the regime allocator. Shorter is monotonically better in the model. This is the model rewarding fast switching without modeling the rebalance friction that would bound it in reality.

The reason is that my engine does not model the transaction cost of switching between sleeve sets when the regime flips. In reality, every regime flip requires liquidating positions in one sleeve and opening positions in another, paying bid ask spread on every contract. At SMA 5, the strategy would flip dozens of times per year, eating all the alpha in transaction costs. The model rewards faster signals only because it does not see the cost of using them.

Tail hedge insurance

I tested spending 0.5 percent of equity per month buying 5 delta puts at 60 days to expiration on QQQ. The idea is to insure against fast crashes. The result was instructive.

During the GFC era, the tail hedge added 9 to 13 percentage points of CAGR to every strategy because the sharp 2008 gap downs let the puts pay off massively. During the dot com era, the tail hedge cost 5 to 7 percentage points because the slow grinding bear meant the 60 day puts kept expiring worthless. So the tail hedge insures against fast crashes, not slow bears. And my option selling strategies do not need the hedge during slow bears because they collect rich premium throughout. The tail hedge is appropriate insurance for buy and hold portfolios, not for premium sellers.

Vol targeted position sizing

Vol targeting scales positions inversely to current implied vol so that the portfolio variance stays approximately constant. When IV is high, shrink positions. When IV is low, grow them. I tested this across all surviving strategies at multiple target levels.
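A minimal sketch of that scaling rule is below. The cap on leverage (`max_scale`) is an assumption added for safety in the example; the note does not say whether the engine caps the scale factor.

```python
def position_scale(target_vol, implied_vol, max_scale=2.0):
    """Scale factor for position size: inverse to implied vol, capped at max_scale.

    target_vol and implied_vol are annualized decimals (0.20 means 20 percent).
    max_scale is a hypothetical safety cap, not a parameter from the engine.
    """
    if implied_vol <= 0:
        raise ValueError("implied_vol must be positive")
    return min(max_scale, target_vol / implied_vol)


print(position_scale(0.20, 0.40))  # high IV: shrink positions
print(position_scale(0.20, 0.10))  # low IV: grow positions
print(position_scale(0.20, 0.05))  # very low IV: growth is capped
```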

The result is that vol targeting trades CAGR for drawdown roughly proportionally. The Sharpe ratio does not improve. It is a slider on the risk return curve, not a source of free alpha. That makes it useful if you have a specific personal risk tolerance, but it is not a research finding. The one interesting exception is the 60/20/20 blend, where a 20 percent vol target pushed Sharpe from 1.97 to 2.51 and cut max drawdown nearly in half, at the cost of 12 percentage points of CAGR. That is a real improvement in risk adjusted return, not just a move along the slider, and could be a more conservative variant of Tier 2.

17 Going live: the paper trading harness

Synthetic backtests can only get you so far. The final and most important validation is live data. I built a paper trading harness that runs the strategies against actual yfinance option chains, logs both the actual market bid and my model's predicted price for every trade decision, and keeps collecting data for as long as it runs.

The harness samples every 30 minutes during market hours, logging mark to market data on all open positions. At the close of each market day, it performs full processing: settles any expired positions and opens fresh trades for the next cycle. The infrastructure runs as a continuous daemon and can be left on a dedicated machine for weeks or months without supervision.
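The sampling schedule can be sketched as follows. The 9:30 to 16:00 session is an assumption (regular US equity hours), the date is arbitrary, and the function is illustrative rather than the harness's actual scheduler. It ignores holidays and early closes.

```python
from datetime import datetime, timedelta


def sample_times(day, open_hm=(9, 30), close_hm=(16, 0), step_minutes=30):
    """Timestamps for intraday samples, from the open through the close inclusive."""
    start = day.replace(hour=open_hm[0], minute=open_hm[1], second=0, microsecond=0)
    end = day.replace(hour=close_hm[0], minute=close_hm[1], second=0, microsecond=0)
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(minutes=step_minutes)
    return times


times = sample_times(datetime(2025, 1, 6))  # arbitrary example date
print(len(times), times[0].time(), times[-1].time())
```

Under these assumptions a full session yields 14 samples, and the last one coincides with the close, where the end of day settlement and new trade logic would run.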

What the first day of live data showed

The very first paper trading session produced an important calibration finding. The QQQ 35 delta 7 day strangle that day involved selling a put at strike 723 (delta minus 0.35) and a call at strike 738 (delta plus 0.35) for an expiration 7 days out.

Leg	Market bid	My model's prediction	Gap
Put at 723 (delta -0.35)	$4.90	$5.13	+4.6%
Call at 738 (delta +0.35)	$4.07	$5.46	+34.0%

The put leg was within 5 percent of my model's prediction. Good. The call leg was a different story: my model overpriced it by 34 percent relative to the market bid. That is volatility skew showing up in real life. Markets price puts at higher implied vol than calls because hedging demand is concentrated on the downside (people pay extra for downside insurance, not upside leverage). My engine uses a single sigma for both legs, so it systematically overestimates call premium.
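The effect of a single sigma can be illustrated with a textbook Black-Scholes pricer. This is a sketch under assumptions: the note does not confirm which pricing model the engine uses, and the spot price of 730, the zero interest rate, and both volatility levels are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes(spot, strike, years, sigma, rate=0.0, kind="call"):
    """European option price under Black-Scholes with no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * sigma**2) * years) / (sigma * sqrt(years))
    d2 = d1 - sigma * sqrt(years)
    if kind == "call":
        return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)
    return strike * exp(-rate * years) * norm_cdf(-d2) - spot * norm_cdf(-d1)


spot, years = 730.0, 7 / 365  # hypothetical spot, 7 days to expiration
single_sigma_call = black_scholes(spot, 738, years, sigma=0.20)  # same vol as the put
skewed_call = black_scholes(spot, 738, years, sigma=0.16)  # lower call vol (assumed)
print(round(single_sigma_call, 2), round(skewed_call, 2))
```

Pricing the call with the put side volatility always gives a higher number than pricing it with a lower call side volatility, which is the direction of the error in the table above.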

Implication for live CAGR estimates. Since the call side contributes roughly half of the strangle's total premium, a 34 percent overestimate on calls implies my backtested CAGR is inflated by roughly 10 to 15 percent. The Tier 1 synthetic estimate of 19 percent CAGR (from the independent Monte Carlo) likely corresponds to 16 to 17 percent CAGR in real markets after the skew correction. Still strong. Just honest.
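The arithmetic behind that haircut can be checked against the rounded prices in the table. This is a back of the envelope sketch: it treats the premium shortfall as a proxy for the CAGR shortfall, which is a simplification, and it rests on a single day of quotes.

```python
model = {"put": 5.13, "call": 5.46}   # model predictions from the table
market = {"put": 4.90, "call": 4.07}  # market bids from the table

model_total = sum(model.values())
market_total = sum(market.values())

# Fraction by which the real strangle premium falls short of the modeled premium.
premium_shortfall = 1 - market_total / model_total

synthetic_cagr = 0.19  # Tier 1 Monte Carlo estimate quoted above
adjusted_low = synthetic_cagr * (1 - 0.15)
adjusted_high = synthetic_cagr * (1 - 0.10)

print(f"{premium_shortfall:.1%}", f"{adjusted_low:.1%}", f"{adjusted_high:.1%}")
```

The combined market premium comes in about 15 percent below the model's, close to the upper end of the 10 to 15 percent range, and applying that range to 19 percent gives about 16.2 to 17.1 percent.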

The paper trading harness is still running and continues to collect data. I will revisit these estimates after 4 to 8 weeks of live observations.

18 Final synthesis and what I would actually deploy

Three operating points, ranked by my conviction level.

Tier 1 (high conviction, deployable now)

QQQ 35 delta 7 day strangle, single sleeve. Sell a 35 delta put and a 35 delta call expiring next Friday, every Friday. Cash settle at expiration. Size by the put strike cash collateral requirement.
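As a sketch of what one weekly cycle looks like under this sizing rule, the example below uses the strikes and bids from the first paper trading session. The 250,000 account size is hypothetical and the function name is illustrative.

```python
import math


def strangle_cycle(equity, put_strike, put_bid, call_bid, multiplier=100):
    """Contracts, premium collected, and collateral for one cash secured strangle."""
    contracts = math.floor(equity / (put_strike * multiplier))
    premium = contracts * multiplier * (put_bid + call_bid)
    collateral = contracts * multiplier * put_strike
    return contracts, premium, collateral


contracts, premium, collateral = strangle_cycle(
    equity=250_000, put_strike=723, put_bid=4.90, call_bid=4.07
)
print(contracts, round(premium, 2), collateral, f"{premium / collateral:.2%}")
```

On those quotes the gross premium is about 1.2 percent of the collateral for the week, before slippage and before any loss at expiration.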

Walk forward median CAGR: 28.5 percent (synthetic), which adjusts to roughly 16 to 19 percent after the call skew correction. Max drawdown across 11 walk forward windows: minus 20.8 percent. Worst recovery time: 144 days. Zero blowups across 2,500 Monte Carlo paths including the dot com and GFC eras.

Tier 2 (medium conviction, paper trade first)

60 percent Tier 1, 20 percent SOXL wheel with momentum filter, 20 percent UPRO strangle. Three sleeves running in parallel, each with their own allocated capital.

Walk forward median CAGR: 32 percent (synthetic), which adjusts to roughly 22 to 28 percent after live frictions. Max drawdown: minus 35 percent. Worst recovery: 12 months. Still zero blowups in Monte Carlo because the diversification between sleeves dampens any single sleeve's tail.

Tier 3 (low conviction, do not deploy without serious validation)

Regime allocator. Tier 1 when QQQ is below its 20 day SMA, the more aggressive 60/20/20 blend when QQQ is above it.
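The switching rule can be sketched in a few lines. The sleeve labels are placeholders, and sending a close exactly equal to the SMA to the defensive side is an assumption, since the note does not specify the tie case.

```python
def target_weights(closes, window=20):
    """Sleeve weights from the latest close versus its trailing SMA."""
    if len(closes) < window:
        raise ValueError("need at least `window` closes")
    sma = sum(closes[-window:]) / window
    if closes[-1] > sma:
        # Above the SMA: aggressive 60/20/20 blend (placeholder sleeve names).
        return {"qqq_strangle": 0.6, "leveraged_sleeve_a": 0.2, "leveraged_sleeve_b": 0.2}
    # At or below the SMA: everything in the Tier 1 strangle (tie case assumed).
    return {"qqq_strangle": 1.0}


rising = list(range(1, 31))      # hypothetical uptrend
falling = list(range(30, 0, -1))  # hypothetical downtrend
print(target_weights(rising))
print(target_weights(falling))
```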

Walk forward median CAGR: 44 percent (synthetic), which adjusts to roughly 25 to 40 percent live with a wide uncertainty band. The 3 day to expiration leveraged ETF options in the aggressive sleeve have severe model versus reality gaps that paper trading is currently quantifying. Need 4 to 8 weeks of live data before risking real capital.

Every strategy I tested plotted as CAGR against maximum drawdown. The frontier I want to operate on is the upper left. The three tiered recommendations all sit on or near that frontier.

19 What this research cannot answer

I want to be explicit about everything the synthetic engine does not model, in case anyone tries to use these results directly without the live calibration step.

Volatility skew. My model uses a single sigma for both put and call legs. Real markets price puts richer than calls. The paper trading harness measured this gap at 34 percent on the call side. My backtested CAGR is inflated by roughly 10 to 15 percent because of this.
Bid ask spreads in crashes. During the actual 2008 selloff, spreads widened by 5 to 10 times their normal levels. My model uses a constant 1 percent slippage estimate.
Pin risk and assignment surprises. Settling cleanly at intrinsic value is a simplification. Real expiration days have weird behavior.
Gap moves. Overnight moves around Fed announcements, earnings, ECB decisions. The model treats all daily moves as smooth.
Liquidity holes. During the March 2020 COVID selloff, SOXL 3 day puts were effectively untradeable at any sane spread. My model assumes you can always exit.
The lucky era problem. 2010 to 2024 was an unusually favorable regime for short volatility. The independent block bootstrap corrects for this somewhat, but I cannot fully calibrate forward expectations.
Free data quality. yfinance occasionally has stale or missing strikes, and historical adjusted close prices have edge case errors.

The friction sensitivity analysis tried to bound some of this.

Even at 5x my baseline slippage and 20 percent IV noise, the strategies' CAGR barely degraded. This is encouraging but it only tests per trade friction, not the regime switch costs or IV decoupling that real crashes produce.

The honest reading is this. My backtested CAGRs are likely overstated by 5 to 15 percentage points. The drawdown character is probably accurate to within a few percentage points. The ranking of strategies is robust. The strangle beats the wheel. Diversified blends beat solo leveraged sleeves. Regime allocation beats static allocation. These conclusions are stable across every test I have run. As a conservative estimate, live results should land at roughly 70 to 85 percent of the synthetic backtest numbers.

Complete results table

Full backtest summary for every strategy discussed in this note. All values 2010 to 2024.

Strategy	CAGR	Sharpe	Max DD	Median recovery	Worst recovery
QQQ Buy and Hold (baseline)	18.51%	0.74	-35.1%	83d	716d
Wheel 35d / 14DTE	18.40%	1.10	-22.3%	75d	221d
CSP only 35d / 14DTE	12.30%	0.82	-19.9%	104d	400d
Wheel + momentum 35d / 14DTE	16.67%	1.08	-22.8%	76d	465d
Strangle 30d / 14DTE	20.70%	1.55	-19.3%	98d	202d
Strangle 35d / 7DTE (Tier 1)	31.83%	2.25	-14.0%	56d	141d
Strangle 45d / 3DTE	48.88%	2.92	-20.7%	49d	124d
TQQQ wheel 35/14	38.25%	0.78	-81.6%	28d	1111d
SOXL wheel + momentum	37.49%	0.86	-60.5%	24d	791d
UPRO strangle 35/14	32.78%	1.00	-57.1%	83d	315d
Cross asset core	15.82%	1.56	-14.3%	133d	155d
60/20/20 leveraged (Tier 2)	33.24%	1.54	-34.3%	72d	202d
Regime allocator (Tier 3)	64.85%	3.76	-14.7%	44d	111d

All code, raw data, walk forward windows, Monte Carlo trials, and the live paper trading harness exist as a runnable repository. The paper trader is collecting intraday data continuously across all six portfolio variants. I will revisit these estimates with real fills as ground truth once I have 4 to 8 weeks of live data.
