Model performance
Every forecast is frozen the moment a result is recorded, then scored out-of-sample. Brier, log loss and RPS (ranked probability score) are all lower-is-better. The market rows benchmark against sportsbook odds (POST them per fixture to /api/odds). Sharp closing lines like Pinnacle's are the toughest public baseline; matching them is the realistic target, and the published blend is designed to be at least as sharp as either input alone.
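As a rough illustration of the three scores named above, the sketch below evaluates a single three-way forecast. It assumes the multi-class Brier score (summed over home/draw/away), natural-log loss, and the usual football form of RPS over the ordered outcomes home, draw, away. The function name `score_forecast` and the example probabilities are hypothetical, and the site's own implementation may differ in detail.

```python
import math

# Ordered outcomes: RPS depends on this ordering (home < draw < away).
OUTCOMES = ("home", "draw", "away")

def score_forecast(probs, actual):
    """Return (brier, log_loss, rps) for one match; all are lower-is-better."""
    observed = [1.0 if o == actual else 0.0 for o in OUTCOMES]

    # Multi-class Brier: squared error summed over the three outcomes.
    brier = sum((p - o) ** 2 for p, o in zip(probs, observed))

    # Log loss: negative natural log of the probability given to what happened.
    log_loss = -math.log(probs[OUTCOMES.index(actual)])

    # RPS: squared gaps between cumulative forecast and cumulative outcome,
    # averaged over the first (number of outcomes - 1) thresholds.
    cum_p = cum_o = rps = 0.0
    for p, o in zip(probs[:-1], observed[:-1]):
        cum_p += p
        cum_o += o
        rps += (cum_p - cum_o) ** 2
    rps /= len(OUTCOMES) - 1

    return brier, log_loss, rps

# Hypothetical forecast: 50% home, 30% draw, 20% away; the home side wins.
brier, ll, rps = score_forecast((0.5, 0.3, 0.2), "home")
print(f"Brier={brier:.4f}  log loss={ll:.4f}  RPS={rps:.4f}")
```

Averaging each score over all completed matches would give one row of a table like the ones on this page.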
All completed matches
3 matches with stored forecasts
| Source | N | Brier | Log loss | RPS |
|---|---|---|---|---|
| Model (Elo + Poisson) | 3 | 0.5675 | 0.9389 | 0.1716 |
| Blend (published) | 3 | 0.5675 | 0.9389 | 0.1716 |
Matches with market odds
Pinnacle lines are stored for all upcoming fixtures; this table fills in as the first of them finishes.
No data yet.
Predicted vs actual scores
Bar = goals actually scored · tick = pre-match expected goals (frozen at kickoff). Deviations feed the auto-calibration below.
| Matches | Outcome calls | Exact scores | Goals act / pred | MAE (goals) | Bias |
|---|---|---|---|---|---|
| 3 | 1/3 | 2/3 | 7 / 8.4 | 0.49 | -0.23 |
Auto-calibration — expected goals are currently scaled ×0.960, re-fit from these deviations after every recorded match (shrunk toward 1.000 by a 10-match prior, capped at ±25%), and applied to every forecast and tournament simulation.
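One plausible reading of this shrinkage can be sketched as follows. It is an assumption, not the site's actual fit: the raw scale is taken as actual goals over predicted goals, pulled toward 1.0 with weight n / (n + 10), then clamped to ±25%. The bias is assumed to be the mean goal error per team per match. The function name `calibration_scale` and its parameters are hypothetical.

```python
def calibration_scale(actual_goals, predicted_goals, n_matches,
                      prior_matches=10, cap=0.25):
    """Shrunk, capped multiplier for expected goals (illustrative only)."""
    raw = actual_goals / predicted_goals           # unshrunk ratio
    weight = n_matches / (n_matches + prior_matches)
    shrunk = 1.0 + weight * (raw - 1.0)            # pull toward 1.0
    return min(1.0 + cap, max(1.0 - cap, shrunk))  # clamp to [0.75, 1.25]

# Totals from the summary above: 7 goals scored, 8.4 predicted, 3 matches.
scale = calibration_scale(7, 8.4, 3)
bias = (7 - 8.4) / (2 * 3)  # assumed: per team, per match

print(round(scale, 3))  # 0.962
print(round(bias, 2))   # -0.23
```

Under these assumptions the bias reproduces the reported -0.23, while the scale comes out near, but not exactly at, the published ×0.960, so the real re-fit evidently differs somewhat from this simple form.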
Match-by-match
Probability each source gave to the outcome that actually happened (higher = better call)
| Match | Score | Model | Market | Blend |
|---|---|---|---|---|
| 🇨🇦 CAN v 🇧🇦 BIH | 1–1 | 25.3% | — | 25.3% |
| 🇰🇷 KOR v 🇨🇿 CZE | 2–1 | 33.5% | — | 33.5% |
| 🇲🇽 MEX v 🇿🇦 RSA | 2–0 | 70.4% | — | 70.4% |
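The probabilities in this table are enough to recompute the log loss by hand. The short check below assumes log loss is the mean negative natural log of the probability given to the actual outcome; the variable names are illustrative.

```python
import math

# Probability the model gave to the outcome that happened, per match above.
probs = [0.253, 0.335, 0.704]

# Mean negative natural log of those probabilities.
log_loss = -sum(math.log(p) for p in probs) / len(probs)
print(f"{log_loss:.3f}")  # 0.940
```

The result is about 0.940, in line with the 0.9389 reported for the model; the small gap is consistent with the displayed percentages being rounded to one decimal place.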