4 posts tagged with "Architecture"

Articles describing Mailwoman's staged pipeline design, the Knowledge Ladder, and how the stages compose.

A lookup table scored 100%. We shipped the model anyway.

· 5 min read
Teffen Ellis
Sister Software

This morning we published a post that ended with a tidy rule: some address tags don't want a neural network, they want a lookup table. Country names are a closed list in a known position. Our deterministic matcher scored a perfect 100 on the eval. The retrained model scored a mess. Case closed, we wrote.

By the afternoon we'd reopened the case, and the verdict flipped — hard enough that we've retracted the morning post rather than leave the wrong conclusion lying around for someone to cite. This is the story of how a perfect score nearly talked us out of the entire premise of the project.

Which way does a postcode point?

· 10 min read
Teffen Ellis
Sister Software

We left the last postcode story with a promise and a bill. The promise was that the "which country is this" signal has to come from the trained model reading the whole string, because the postcode on its own settles the question less than half the time. The bill was that this is the expensive version of the feature. This is the post where we paid it: we built the country signal into the model, watched it do something genuinely great, and then watched it refuse, in the most instructive way we've hit all month, to do that same great thing in a different word order.

The great thing first, because you've earned it. We took the postcode's gazetteer membership, that [us, de, fr] answer from last time, and instead of handing it to a regex we injected it into the model at the postcode token itself. A small additive nudge on the hidden state, right where the five digits sit, carrying "here is what this code could be." On German addresses written the way Germans actually write them, it was worth thirty-five points of locality accuracy. It beat Pelias. For one evening we were heroes.
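A minimal sketch of that additive nudge, assuming a multi-hot membership vector over the three countries and a small projection matrix. The names `COUNTRIES`, `membership_vector`, `inject_anchor`, and `w_anchor` are hypothetical, and the matrix is hand-written here where a trained model would learn it; this illustrates the idea and is not Mailwoman's actual code.

```python
# Hypothetical illustration: names and shapes are assumptions, not Mailwoman's API.
COUNTRIES = ["us", "de", "fr"]


def membership_vector(candidates):
    """Multi-hot vector: 1.0 for each country whose gazetteer contains the postcode."""
    return [1.0 if country in candidates else 0.0 for country in COUNTRIES]


def inject_anchor(hidden_states, postcode_index, candidates, w_anchor, scale=1.0):
    """Return a copy of hidden_states with the projected membership vector
    added at the postcode token only. Every other token is left untouched."""
    membership = membership_vector(candidates)
    dim = len(hidden_states[postcode_index])
    # Project the 3-way membership into the hidden dimension.
    nudge = [
        scale * sum(membership[k] * w_anchor[k][d] for k in range(len(COUNTRIES)))
        for d in range(dim)
    ]
    out = [list(state) for state in hidden_states]
    out[postcode_index] = [h + n for h, n in zip(out[postcode_index], nudge)]
    return out


# Three tokens, hidden size 4; the postcode is token 1.
hidden = [[0.0] * 4 for _ in range(3)]
w_anchor = [
    [1.0, 0.0, 0.0, 0.0],  # "us" direction (would be learned in practice)
    [0.0, 1.0, 0.0, 0.0],  # "de" direction
    [0.0, 0.0, 1.0, 0.0],  # "fr" direction
]
anchored = inject_anchor(hidden, 1, {"us", "de"}, w_anchor)
```

Note that the nudge is tied to a token position, not to the string as a whole, so whatever the model learns from it is learned at wherever the postcode tends to sit.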

Then we looked at the international numbers and the floor gave way. Same model, same anchor, the same German cities, but now written house-number-first with the postcode trailing the city, the way our test feed renders them, and it scored a hair above a coin flip. The hero anchor was, on those rows, slightly worse than no anchor at all.

Three questions sit under the rest of this, so let me put them on the table before we start:

  • When a parser "collapses" on a test, is the parser wrong, or is the test?
  • Can you train one model to read an address in any order, or does each order quietly cost you the other?
  • And the one that took three retrains to answer honestly: what does a learned anchor actually learn, the thing you asked for, or the shape of where you kept putting it?

The map runs out before the country does

· 11 min read
Teffen Ellis
Sister Software

We spent a good month teaching our resolver exactly one trick. Take a postcode, drop its centroid into the city polygon that happens to contain it, read off the city. It's a genuinely good trick. It got the Netherlands to 95% and Germany to 93%, and for a while it felt like the whole problem was going to fall to it. Then we pointed it at Japan, and Japan calmly informed us that it has no city polygons to drop anything into.
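The trick itself fits in a few lines. This is a minimal sketch using a standard ray-casting point-in-polygon test, assuming each city is a single closed ring of `(lon, lat)` pairs with no holes. The function names and the toy data are hypothetical, not the resolver's real code or real coordinates.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: count how many polygon edges a ray cast from the
    point toward +x crosses. An odd count means the point is inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def city_for_postcode(postcode, centroids, city_polygons):
    """Drop the postcode centroid into the first city polygon containing it.
    Returns None when the centroid is unknown or no polygon contains it."""
    centroid = centroids.get(postcode)
    if centroid is None:
        return None
    lon, lat = centroid
    for name, ring in city_polygons.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Toy data, invented for illustration only.
centroids = {"12345": (1.0, 1.0), "99999": (5.0, 5.0)}
city_polygons = {"Toyville": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]}
```

With an empty `city_polygons` the lookup has nothing to test against and returns `None` for every postcode, which is the situation a country with no city polygons puts it in.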

What follows is a two-country story about what a geocoder can still do when the map underneath it goes thin, and where it finally can't. Japan we resolved anyway, 94% of the way, by putting the polygon down and asking a different question. Korea handed the same problem back to us turned inside-out: it let us pin the coordinate perfectly, every time, and then stopped us cold at the one thing we were really after, which is the name of the place you've landed in.

Three questions sit under all of it, so let me put them on the table before we start:

  • What do you do when the gazetteer gives you points where you expected shapes?
  • Does the move that rescues Japan actually generalize, or did we get lucky once and dress it up as a method?
  • And the question with no comfortable answer: what happens when the map is simply missing the part of a country you most need to see?

Does a postcode know what country it's in?

· 8 min read
Teffen Ellis
Sister Software

We set out to fix a small wart in our address parser and came away with a number that told us to put the screwdriver down.

Here is the wart. When our postcode extractor sees a five-digit run and wants to know whether it's a real postcode or just a house number that happens to look like one, it peeks at the words sitting next to it and checks them against every country's street vocabulary we know — American, German, French, all at once. That "all at once" is fine at three countries. At twenty it gets loud, and a German street suffix starts shadowing an English word by sheer coincidence. So we went looking for the clean way to tell the extractor which country's words to bother with.
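A minimal sketch of how that shadowing can happen, assuming the neighbor check is a suffix match against per-country word lists. The lists, the matching rule, and the function name are all hypothetical stand-ins chosen to make the collision visible; the real extractor may work differently.

```python
# Toy vocabularies, invented for illustration. "ring" is a genuine German
# street type, and it also happens to end the English word "spring".
STREET_WORDS = {
    "us": ("street", "avenue", "road"),
    "de": ("strasse", "weg", "ring"),
    "fr": ("rue", "boulevard"),
}


def neighbor_is_street_word(tokens, index, countries=None):
    """Check the tokens on either side of tokens[index] against the street
    vocabulary of the given countries (all known countries when None)."""
    active = countries if countries is not None else list(STREET_WORDS)
    neighbors = [
        tokens[i].lower() for i in (index - 1, index + 1) if 0 <= i < len(tokens)
    ]
    return any(
        word.endswith(suffix)
        for word in neighbors
        for country in active
        for suffix in STREET_WORDS[country]
    )


tokens = ["Silver", "Spring", "20910"]
all_countries = neighbor_is_street_word(tokens, 2)       # German "ring" matches "spring"
us_only = neighbor_is_street_word(tokens, 2, ["us"])     # no US street word matches
```

Checked against every vocabulary at once, the digits look like they sit next to a street name; restricted to the US list, they don't.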

That question has a much bigger sibling, and chasing the sibling is where the story actually is.