Retrospectives
Post-ship engineering writeups. Each entry covers one iteration: what we tried, what failed, what we learned, what we'd do differently. Audience is the future engineer (often us) debugging the next iteration.
The blog (/blog) carries lighter narrative writeups for a general audience. These articles are the deeper-dive companion: pin-down-the-failure-mode level, with enough trace + numbers + config diffs to be useful in three months.
Index
- v0.4.0 ablation campaign: the verdict-smoke meta-bug and what we shipped instead. Five training divergences, one decoder sidecar, three deferred work areas. 2026-05-23.
- v0.6.x cycle retrospective + v0.7 plan: three corpus-recipe iterations couldn't find a stable configuration; the deeper finding (calibration crisis + wrong release metric + postcode tokenizer fragmentation) reframes v0.7. 2026-05-29.
- v0.7→v0.8: the neural parser vs the rules baseline (two scoreboards). Why neural scores 20.7% on our Pelias-lineage harness yet beats Pelias 97.3% vs 95.8% on real addresses; the three-arena decomposition; the MLM pre-training negative result; forward plan ranked (street-level geometry first). 2026-05-31.