In power markets, forecasts are by definition made with imperfect information. Generators bid based on information known only to them, the weather unfolds in ways we don’t expect, and participants continually tune and optimize their strategies. Meanwhile, the data that captures all of this arrives after the market intervals are complete, settles over hours or days, and in some cases never becomes public at all.
In "Anatomy of a Backtest," we distinguished between two ways to evaluate a model, which differ in how complete their input data is:
- The Perfect Information Backtest: a baseline evaluation that uses every observation available today, measuring the model with “perfect” hindsight. We call it perfect information because many inputs, including generator offers and actual wind, solar, and load conditions, are published only after the market decision, and this backtest uses them anyway. Even so, the information is not truly perfect: we do not receive everything the market uses to run its own model.
- The Point-in-Time-Correct Backtest: a stricter approach that limits the model to only the information available at the exact moment decisions are made, mimicking the real world. Here we use only renewable and load forecasts available at decision time, published planned outages, and our best assumptions about bidding behavior.
The difference between the two results is the perfect-information gap: the advantage of hindsight. In this post, we explore the data sources where this gap originates and describe why a point-in-time model can never fully close it. We then trace how much the gap costs once a model meets real dispatch. Our goal at Distill is to acknowledge the gap while building tools that help you narrow it. We'll discuss the methods we use in the conclusion.
Sources of Imperfection
The gap has multiple sources. Each carries its own uncertainty, and together they compound. We'll cover three ERCOT examples to make this concrete: the load forecast, the unplanned-outage record, and renewable curtailment.
Load Forecasting
The ERCOT Mid-Term Load Forecast (MTLF) is a collection of forecasting models that combines weather forecasts across a variety of variables and temporal patterns to estimate future system load. It is published hourly and extends 168 hours (seven days) ahead.
Measured against actuals from January 2024 through June 2026, ERCOT's MTLF for system load degrades with lead time:¹
| Lead time | MAPE (mean absolute percentage error) | MAE (mean absolute error, MW) | RMSE (root mean squared error, MW) | Bias (mean signed error, MW) |
|---|---|---|---|---|
| 1 hour | 0.82% | 452 | 643 | +62 |
| 1 day | 2.20% | 1,203 | 1,686 | +30 |
| 2 days | 2.48% | 1,358 | 1,887 | −15 |
| 7 days | 4.20% | 2,317 | 3,235 | −258 |
Mean absolute error nearly triples from one hour to one day out and grows roughly fivefold by seven days, to an average of 2,317 MW.
At one hour out, MAPE holds steady near 0.8% across the dataset; longer lead times show more variation.
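For readers who want to reproduce a table like this, the four metrics reduce to a few lines. The sketch below is a minimal illustration, not our pipeline: the function names and the `(lead_hours, forecast_mw, actual_mw)` row layout are hypothetical, and it assumes bias is forecast minus actual, so a positive value means over-forecasting.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ErrorStats:
    mape_pct: float   # mean absolute percentage error, %
    mae_mw: float     # mean absolute error, MW
    rmse_mw: float    # root mean squared error, MW
    bias_mw: float    # mean signed error, MW (assumed forecast - actual)


def forecast_errors(forecasts: list[float], actuals: list[float]) -> ErrorStats:
    """Error metrics for paired forecasts and actuals, both in MW."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need equal-length, non-empty series")
    errors = [f - a for f, a in zip(forecasts, actuals)]  # positive = over-forecast
    n = len(errors)
    return ErrorStats(
        mape_pct=100.0 * sum(abs(e) / a for e, a in zip(errors, actuals)) / n,
        mae_mw=sum(abs(e) for e in errors) / n,
        rmse_mw=sqrt(sum(e * e for e in errors) / n),
        bias_mw=sum(errors) / n,
    )


def errors_by_lead(rows) -> dict[int, ErrorStats]:
    """rows: iterable of (lead_hours, forecast_mw, actual_mw) tuples (hypothetical layout)."""
    grouped: dict[int, tuple[list[float], list[float]]] = {}
    for lead, forecast, actual in rows:
        fs, acts = grouped.setdefault(lead, ([], []))
        fs.append(forecast)
        acts.append(actual)
    # One set of metrics per lead time, ordered from shortest to longest lead.
    return {lead: forecast_errors(fs, acts) for lead, (fs, acts) in sorted(grouped.items())}
```

Grouping by lead time before computing the metrics matters: pooling every lead would blend the 1-hour and 7-day error regimes into a single, less informative number.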
Average load is relatively easy to forecast several days out because broad trends capture the signal and positive and negative errors tend to cancel. Peaks are different. Peak demand occurs during rare combinations of weather and human behavior where small forecasting errors are amplified, sometimes in ways the system hasn't seen before. Industrial loads, crypto miners, and data centers all respond to potential 4CP intervals (the four summer-month peak intervals ERCOT uses to allocate transmission costs), and demand response and load-side storage engage more aggressively during extreme events. Across all hours, ERCOT's MTLF performs consistently, but the highest-load periods show a characteristic pattern: slight over-forecasting within the day as operators incorporate the latest information, and growing under-forecasting beyond roughly 24 hours as the uncertainty around extreme weather widens.
In short, ERCOT's load forecast is accurate near real time but degrades with lead time. At the highest-demand hours, it slightly over-forecasts within the day and under-forecasts at longer leads.
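The peak-hour pattern comes from conditioning the error on load level. A minimal sketch, using a hypothetical `peak_hour_bias` helper, the same assumed forecast-minus-actual sign convention, and an illustrative `top_fraction` cutoff for what counts as a peak hour:

```python
def peak_hour_bias(pairs: list[tuple[float, float]],
                   top_fraction: float = 0.05) -> tuple[float, float]:
    """pairs: (forecast_mw, actual_mw) for a single lead time.
    Returns (bias over all hours, bias over the top_fraction of hours by actual load).
    Bias is assumed to be forecast - actual, so negative means under-forecasting."""
    if not pairs or not 0 < top_fraction <= 1:
        raise ValueError("need data and 0 < top_fraction <= 1")
    all_bias = sum(f - a for f, a in pairs) / len(pairs)
    k = max(1, round(len(pairs) * top_fraction))
    # Highest realized load first, then keep the top k hours.
    peaks = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    peak_bias = sum(f - a for f, a in peaks) / k
    return all_bias, peak_bias
```

Selecting hours by realized load mirrors a post-hoc peak review, but it tends to favor under-forecast errors (a realized peak is partly a surprise), so it is worth comparing against hours selected by forecast load.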
Unplanned Outages
Load is one side of the equation; supply is the other. Generators fail without notice: during our study period, typically 20 to 33 GW of ERCOT capacity was on unplanned outage at any given time. A perfect-information backtest knows exactly which units were down on each historical day. The unplanned outage report is published three days after the market date, so a point-in-time forecast has to operate without that information.
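Point-in-time correctness comes down to an as-of filter: every record carries the time it became public, and the backtest may only read records published at or before the decision. A minimal sketch for the outage report, assuming (hypothetically) that each market date's report appears at midnight three days later; a real implementation should read the publication time from the data's own timestamps.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

# Assumption: the report for a market date becomes public a fixed lag later.
# The three-day lag follows the text; the midnight publication time is hypothetical.
PUBLICATION_LAG = timedelta(days=3)


def published_at(market_date: date, lag: timedelta = PUBLICATION_LAG) -> datetime:
    """Hypothetical publication timestamp of the outage report for market_date."""
    return datetime.combine(market_date + lag, time(0, 0))


def visible_reports(reports: dict[date, float], decision_time: datetime) -> dict[date, float]:
    """Outage reports (market date -> MW on unplanned outage) public at decision_time."""
    return {d: mw for d, mw in reports.items() if published_at(d) <= decision_time}


def latest_known_outage(reports: dict[date, float], decision_time: datetime) -> Optional[float]:
    """Most recent outage figure a point-in-time model could have used, or None."""
    visible = visible_reports(reports, decision_time)
    if not visible:
        return None
    return visible[max(visible)]
```

A perfect-information backtest would read the decision day's own report; the as-of version can only see reports at least three days old.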
The baseline follows a predictable pattern. Unplanned outage events are lowest in late summer, when generators run hard to meet peak demand, and highest in the shoulder months of spring and fall. In those months, operators tend to pull a plant offline as soon as a mechanical problem appears, while demand is low enough that the lost output costs little. Models anticipate that seasonality, but extreme events break the pattern: Winter Storm Fern occurred in January 2026, a month that averages 24 GW on outage, and pushed the figure past 50 GW in a matter of days.
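A seasonal baseline of the kind a model might lean on can be sketched as a calendar-month average, with days far above it flagged as breaks from the pattern. The function names and the `threshold_mw` default are illustrative assumptions, and in a point-in-time setting the history passed in must itself be limited to reports published before the decision.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_baseline(history: dict[date, float]) -> dict[int, float]:
    """Average MW on unplanned outage by calendar month (1-12)."""
    by_month: dict[int, list[float]] = defaultdict(list)
    for day, mw in history.items():
        by_month[day.month].append(mw)
    return {month: mean(values) for month, values in by_month.items()}


def flag_anomalies(observed: dict[date, float], baseline: dict[int, float],
                   threshold_mw: float = 10_000.0) -> list[date]:
    """Days whose outage MW exceeds their month's baseline by more than threshold_mw.
    threshold_mw is an illustrative assumption, not a calibrated value.
    Months missing from the baseline are skipped."""
    return sorted(day for day, mw in observed.items()
                  if day.month in baseline and mw - baseline[day.month] > threshold_mw)
```

A fixed MW threshold is the simplest choice; a threshold scaled to each month's historical spread would be less prone to flagging ordinary shoulder-season swings.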
Forced outages are not spread evenly across the fleet. Natural gas accounts for the bulk of them, well ahead of wind, solar, and coal, partly a reflection of the gas fleet's size. That matters because gas is often the marginal fuel: when a gas unit trips, the system loses the dispatchable capacity it relies on to balance supply and meet the last MW of demand.
Renewable Curtailment
Wind and solar add a second supply-side source of uncertainty. Their output varies interval to interval, and not all of it reaches the grid. Some gets shut off by the generator itself when wholesale prices fall below the resource's economic offer or operating incentive (economic curtailment). Some gets shut off by the grid operator when localized transmission cannot deliver the energy where it is needed (congestion curtailment). Neither is known in advance. Curtailment is an output of the market itself: interval by interval, the dispatch solution decides which renewable energy the grid can absorb and which gets cut.
This presents a major trap for backtesting. If you evaluate a trading or battery dispatch model against actual historical generation, you are testing it on a grid the market has already post-processed. A naïve backtest looks at a curtailed wind farm and assumes the wind simply died. It misses the structural signal: the node was congested, or prices had turned negative. To build a valid point-in-time model, you cannot just download historical generation; you have to reconstruct the available capability of the fleet and simulate the dispatch solution that reduced it.
Across our observation period, ERCOT curtailed 22.6 TWh of wind and solar. Of that, 73% was economic (the market priced the energy at or below the resource's own offer floor) and 27% was congestion-driven (the network lacked capacity to move the power).²
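The definition in footnote 2 translates into a per-interval calculation. The sketch below is one reading of it, not our production code: it assumes the noise floor means the gap must exceed both 2 MW and 1% of available capability (that is, their maximum), counts the full gap once it clears that floor, and uses hypothetical field names.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    available_mw: float   # available capability of an online intermittent resource
    base_point_mw: float  # dispatch instruction for the interval
    duration_h: float     # true interval length in hours (from timestamps, not nominal)
    rt_price: float       # real-time price at the resource's settlement point, $/MWh
    offer_floor: float    # lowest price in the resource's offer, $/MWh


def curtailed_mw(iv: Interval, floor_mw: float = 2.0, floor_frac: float = 0.01) -> float:
    """Capability minus base point, or 0 if the gap sits inside the noise floor.
    Assumption: the floor is max(2 MW, 1% of available capability)."""
    gap = iv.available_mw - iv.base_point_mw
    threshold = max(floor_mw, floor_frac * iv.available_mw)
    return gap if gap > threshold else 0.0


def split_curtailment(intervals: list[Interval]) -> tuple[float, float]:
    """Return (economic_mwh, congestion_mwh) summed over the intervals."""
    economic = congestion = 0.0
    for iv in intervals:
        mwh = curtailed_mw(iv) * iv.duration_h
        if mwh == 0.0:
            continue
        # Priced at or below the offer floor: the resource backed off on economics.
        if iv.rt_price <= iv.offer_floor:
            economic += mwh
        else:
            congestion += mwh
    return economic, congestion
```

Computing energy from each interval's actual duration, rather than a nominal five minutes, keeps the MWh totals honest when dispatch intervals run short or long.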
Curtailment is also concentrated. More than half occurs in West Texas and the Panhandle, where the wind and solar fleet is densest, and even there it is predominantly economic: generators backing off when local oversupply drives wholesale prices below what they can profitably clear. For a trader or a co-located battery model, this concentration matters. A backtest that treats wind output as a weather forecast, rather than as a price-driven actual that gets curtailed during negative-price intervals, will overestimate supply in those same hours and miss the exact intervals when curtailment kicks in.
So what: the cost of the gap
Each of the three inputs creates the same kind of exposure. Load that arrives above forecast, a generator that trips, curtailment that runs deeper than the model assumed: each leaves a gap between the supply the system planned for and the supply it actually has. The cost of that gap depends on two things: how big the error is, and what the system has to do to close it.
The least-expensive energy on the system, from nuclear, coal, and combined cycle, is committed hours in advance based on the forecast. ERCOT's telemetry shows that coal takes about five hours to reach full output once started, and combined cycle about fifty minutes. When a supply shortfall occurs in real time, the capacity to replace the missing MW has to come online immediately. That shortfall is therefore covered by resources that can move in minutes: small gas, diesel, batteries, and hydro. These resources are limited in size and available energy, and are often offered at a premium.
The cost of closing the gap depends on the state of the grid at that moment, because the supply curve is not flat. At 50 GW of load, an extra 100 MW moves the system price by about 20 cents per MWh, and an extra gigawatt by about $2 to $3 per MWh. Near the summer peak, around 80 GW, the same 100 MW moves the price by about 80 cents and a gigawatt by close to $9: roughly three to four times steeper. In true scarcity, with reserves thin and shortage pricing active, the curve turns nearly vertical and the same megawatts move price by orders of magnitude more. The Merit Order Explorer lets you see this directly.
The effects compound, and they almost always compound in the same direction. When peak demand is under-forecast, the units that could cover it cheaply take hours to start, so each remaining megawatt is priced from the steepest part of the stack. A perfect-information backtest is unaware of this because it commits the right units ahead of a peak it already knows. The distance between that and what a real forecast can do is the cost of the gap, and it is largest when the grid is tightest.
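The steepening can be reproduced on any supply curve by evaluating the price at two load levels. A sketch with a hypothetical piecewise-linear curve (the breakpoints are illustrative, not ERCOT offers) and linear interpolation between breakpoints:

```python
from bisect import bisect_right

# Hypothetical supply curve: (cumulative MW, offer price $/MWh) breakpoints.
# Illustrative only; not ERCOT data.
CURVE = [(0, -20.0), (30_000, 10.0), (60_000, 30.0),
         (75_000, 60.0), (85_000, 150.0), (90_000, 1_000.0)]


def price_at(load_mw: float, curve=CURVE) -> float:
    """Linearly interpolate the offer price at load_mw along the supply curve."""
    xs = [mw for mw, _ in curve]
    if not xs[0] <= load_mw <= xs[-1]:
        raise ValueError("load outside the curve")
    i = bisect_right(xs, load_mw)
    if i == len(xs):  # load sits exactly on the last breakpoint
        return curve[-1][1]
    (x0, p0), (x1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (load_mw - x0) / (x1 - x0)


def price_impact(load_mw: float, extra_mw: float, curve=CURVE) -> float:
    """Change in price ($/MWh) when extra_mw is added on top of load_mw."""
    return price_at(load_mw + extra_mw, curve) - price_at(load_mw, curve)
```

On this toy curve, an extra gigawatt adds about $0.67/MWh at 50 GW and $9/MWh at 80 GW. Evaluating both endpoints, rather than multiplying by a local slope, keeps the result correct when the extra megawatts cross a breakpoint.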
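The compounding can be made concrete with a toy cost comparison: a perfect-information backtest covers a known shortfall with cheap units committed hours ahead, while a real-time surprise can only use units that start within minutes. The units, costs, and start times below are hypothetical (only the roughly fifty-minute combined-cycle start echoes the telemetry cited earlier), and the dispatch ignores ramp limits, minimum run times, and start-up costs.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    capacity_mw: float
    cost: float           # $/MWh, hypothetical
    start_minutes: float  # minutes to reach full output once started


def cover_cost(units: list[Unit], need_mw: float, notice_minutes: float) -> float:
    """Cheapest cost ($ for one hour) of serving need_mw with units that can reach
    full output within notice_minutes. Simple merit order; ignores ramp limits,
    minimum run times, and start-up costs."""
    if need_mw <= 0:
        return 0.0
    eligible = sorted((u for u in units if u.start_minutes <= notice_minutes),
                      key=lambda u: u.cost)
    remaining, total = need_mw, 0.0
    for u in eligible:
        take = min(u.capacity_mw, remaining)
        total += take * u.cost
        remaining -= take
        if remaining <= 0:
            return total
    raise ValueError("not enough capacity can start within the notice window")


units = [
    Unit("combined cycle", 500, 30.0, 50),  # hypothetical
    Unit("peaker", 300, 90.0, 10),          # hypothetical
    Unit("battery", 200, 150.0, 1),         # hypothetical
]
```

With 360 minutes of notice, `cover_cost(units, 400, 360)` serves the 400 MW shortfall for $12,000 over the hour; with 15 minutes of notice it costs $42,000. The $30,000 difference is the gap's cost in this toy case, and with only 5 minutes of notice the fast fleet cannot cover it at all.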
Countering the gap
The gap cannot be closed: no model can know what hasn't happened yet. But it can be made smaller, priced, and hedged. The advantage goes to anyone who treats uncertainty as a planning input rather than noise to reconcile after the fact. Here are some components we use at Distill to navigate the gap.
- Forecast distributions instead of point estimates. A single load or generation forecast commits to one future, and it is often wrong at the peaks. A calibrated distribution of load, renewable output, and outage scenarios lets you plan against the tail rather than the median, and because the supply stack steepens sharply at high load, the tail is the part of the distribution that drives cost.
- Reconstruct inputs point-in-time. Evaluate models using only the information that existed at each point in time, ensuring reported accuracy matches real-world performance and captures the perfect-information gap. That means versioning every input with the timestamp it became available, then replaying history with only what was known at the moment of decision.
- Simulate dispatch across the scenarios. Identify which variables have the most influence on the result, run each scenario through a physics-based optimal power flow (OPF), and tie every input directly to the outputs it produced. The result is a distribution of prices and dispatch rather than a single number, and any price in that distribution can be decomposed into the factors that drove it. The risk in a position is visible before it is taken.
- Scale with cloud-native tooling. Outage outcomes span a wide range; running thousands of scenarios in parallel captures that range rather than collapsing it to a single path.
- Keep the tooling approachable. Inputs that are easy to see and adjust beat a black box that turns every update into a project.
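Scenario simulation can be sketched end to end in a few lines. This is deliberately much simpler than the approach described above: it swaps the physics-based OPF for a single-node, step-function merit order with no transmission constraints, uses a hypothetical stack and an assumed price cap when supply runs out, and treats outages crudely as capacity removed from the bottom of the stack (equivalent to adding them to load). All names are illustrative.

```python
import random
from statistics import quantiles

# Hypothetical merit-order stack: (capacity MW, offer $/MWh). Illustrative, not ERCOT data.
STACK = [(30_000, 0.0), (25_000, 25.0), (15_000, 45.0), (10_000, 90.0), (5_000, 400.0)]
PRICE_CAP = 5_000.0  # assumed price when the stack cannot serve the load


def clearing_price(effective_load_mw: float, stack=STACK) -> float:
    """Offer of the marginal block in a step-function merit order."""
    served = 0.0
    for capacity, offer in sorted(stack, key=lambda block: block[1]):
        served += capacity
        if served >= effective_load_mw:
            return offer
    return PRICE_CAP


def simulate_prices(load_mean: float, load_sd: float, outage_mean: float,
                    outage_sd: float, n: int = 1_000, seed: int = 0) -> list[float]:
    """Sample load and outage scenarios and clear each one.
    Crude assumption: outage MW is removed from the bottom of the stack,
    which is equivalent to adding it to load."""
    rng = random.Random(seed)
    prices = []
    for _ in range(n):
        load = rng.gauss(load_mean, load_sd)
        outage = max(0.0, rng.gauss(outage_mean, outage_sd))
        prices.append(clearing_price(load + outage))
    return prices


def summarize(prices: list[float]) -> dict[str, float]:
    """P10 / P50 / P90 of the simulated price distribution."""
    deciles = quantiles(prices, n=10)
    return {"p10": deciles[0], "p50": deciles[4], "p90": deciles[8]}
```

Fixing the seed keeps the random draws identical across runs, so two load assumptions can be compared scenario by scenario rather than through sampling noise. Each scenario is independent, which is what makes this kind of simulation straightforward to parallelize.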
Our opf-studio has these capabilities. You can see simple examples on our lab page, and if you’re interested in running this type of analysis with real ERCOT data we’d love to hear from you.
Footnotes
1. System load is the `systemTotal` series in ERCOT's weather-zone forecast, matched to the `ERCOT_WZ` weather-zone actual. We include only forecasts that ERCOT tagged as in use.
2. Curtailment is defined per dispatch interval as available capability minus base point, beyond a 2 MW / 1% noise floor, for online intermittent resources, with energy computed from true per-interval durations. The economic-versus-congestion split compares the real-time price at the resource's settlement point to its offer floor. About 0.5% of curtailed energy remains unmapped after reconstruction.