2 posts tagged with "Retrospective"

Post-mortems of incidents, bugs, and failures — what happened, why, and how it was fixed.

Four training runs, zero shipped weights — bisecting v0.5.0's divergence

11 min read
Teffen Ellis
Sister Software
If you found this via search

Mailwoman is an open-source address parser. This post is a training log entry from May 2026 documenting the v0.5.0 divergence investigation. For current project status, see what ships today.

v0.5.0 was the fresh-slate ship: new tokenizer, expanded corpus, new architecture, new reconcile stage. The plan was to bundle several months of structural improvements into one big iteration and pay the cost once. Most of it landed clean. The classifier didn't.

This post walks through the four training attempts the v0.5.0 C-train made overnight, the bisect that ruled out three plausible explanations, and what we think the remaining culprit is. It's a sister piece to the v0.4.0 retrospective — same shape of failure, different diagnostic ladder.

Five training runs, one shipped checkpoint — what we learned from v0.4.0

10 min read
Teffen Ellis
Sister Software
If you found this via search

Mailwoman is an open-source address parser. This post is a historical retrospective on the v0.4.0 training campaign (May 2026). For current project status, see what ships today.

@mailwoman/neural-weights-en-us@v0.4.0 (and the fr-fr sibling) shipped today as packaged artifacts (the npm publish is a separate step we do by hand). It is a mixed-result release: one clear win on fine-grained labels, two regressions on coarse labels that turned out to be mostly artifacts of how we measured. The premise behind almost everything we set out to do, that three orthogonal training improvements could be combined into one ship, was empirically falsified by a divergence pattern we hadn't seen before.

This is a writeup of how the campaign went. We're publishing it for two reasons: to be honest about what the headline numbers mean, and because the way the failures stacked up is worth thinking about if you train your own NER-style models.