6 posts tagged with "Advanced"

Articles that dive deep into technical details, diagnostics, and design decisions — ideal for experienced practitioners.

Which way does a postcode point?

· 10 min read
Teffen Ellis
Sister Software

We left the last postcode story with a promise and a bill. The promise was that the "which country is this" signal has to come from the trained model reading the whole string, because the postcode on its own settles the question less than half the time. The bill was that this is the expensive version of the feature. This is the post where we paid it: we built the country signal into the model, watched it do something genuinely great, and then watched it refuse, in the most instructive way we've hit all month, to do that same great thing in a different word order.

The great thing first, because you've earned it. We took the postcode's gazetteer membership, that [us, de, fr] answer from last time, and instead of handing it to a regex we injected it into the model at the postcode token itself. A small additive nudge on the hidden state, right where the five digits sit, carrying "here is what this code could be." On German addresses written the way Germans actually write them, it was worth thirty-five points of locality accuracy. It beat Pelias. For one evening we were heroes.
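If "a small additive nudge on the hidden state" sounds abstract, here is roughly the shape of it. This is a minimal sketch, not our model code: the names `COUNTRY_EMBEDDINGS`, `anchor_vector` and `inject_anchor` are hypothetical, the four-dimensional hidden size is a toy, and averaging the candidate countries' vectors is an assumption made for illustration. In a trained model the per-country vectors would be learned rather than hard-coded.

```python
# Hypothetical sketch: add a gazetteer-membership anchor to one token's hidden state.
# Every name and number here is illustrative, not taken from a real model.

# One vector per country. Assumed to be learned in practice; hard-coded here.
COUNTRY_EMBEDDINGS = {
    "us": [1.0, 0.0, 0.0, 0.0],
    "de": [0.0, 1.0, 0.0, 0.0],
    "fr": [0.0, 0.0, 1.0, 0.0],
}


def anchor_vector(candidates, scale=0.5):
    """Average the embeddings of the candidate countries, then scale the result."""
    dim = len(next(iter(COUNTRY_EMBEDDINGS.values())))
    if not candidates:
        return [0.0] * dim
    total = [0.0] * dim
    for country in candidates:
        for i, value in enumerate(COUNTRY_EMBEDDINGS[country]):
            total[i] += value
    return [scale * t / len(candidates) for t in total]


def inject_anchor(hidden_states, postcode_index, candidates, scale=0.5):
    """Return a copy of hidden_states with the anchor added at postcode_index only."""
    anchor = anchor_vector(candidates, scale)
    out = [list(row) for row in hidden_states]
    out[postcode_index] = [h + a for h, a in zip(out[postcode_index], anchor)]
    return out


# Four tokens, e.g. ["Hauptstraße", "5", "10115", "Berlin"]; the postcode is token 2.
hidden = [[0.0] * 4 for _ in range(4)]
nudged = inject_anchor(hidden, postcode_index=2, candidates=["us", "de", "fr"])
```

Only the row at the postcode position changes; every other token's state passes through untouched. That locality is the whole point of the design, and it is also why the position the anchor lands in matters so much later on.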

Then we looked at the international numbers and the floor gave way. Same model, same anchor, the same German cities, but now written house-number-first with the postcode trailing the city, the way our test feed renders them, and it scored a hair above a coin flip. The hero anchor was, on those rows, slightly worse than no anchor at all.

Three questions sit under the rest of this, so let me put them on the table before we start:

  • When a parser "collapses" on a test, is the parser wrong, or is the test?
  • Can you train one model to read an address in any order, or does each order quietly cost you the other?
  • And the one that took three retrains to answer honestly: what does a learned anchor actually learn, the thing you asked for, or the shape of where you kept putting it?

The map runs out before the country does

· 11 min read
Teffen Ellis
Sister Software

We spent a good month teaching our resolver exactly one trick. Take a postcode, drop its centroid into the city polygon that happens to contain it, read off the city. It's a genuinely good trick. It got the Netherlands to 95% and Germany to 93%, and for a while it felt like the whole problem was going to fall to it. Then we pointed it at Japan, and Japan calmly informed us that it has no city polygons to drop anything into.
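The trick itself fits in a few lines. What follows is a minimal sketch under stated assumptions, not our resolver: it uses a plain ray-casting point-in-polygon test, treats coordinates as flat x/y pairs, and the function names, the postcodes and the city "Squareville" are all made up for the example.

```python
# Hypothetical sketch of "drop the postcode centroid into the city polygon".
# Toy data only; none of these coordinates or names are real.

def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def city_for_postcode(postcode, centroids, city_polygons):
    """Return the first city whose polygon contains the postcode centroid, else None."""
    centroid = centroids.get(postcode)
    if centroid is None:
        return None
    for city, polygon in city_polygons.items():
        if point_in_polygon(centroid, polygon):
            return city
    return None


centroids = {"00001": (2.0, 2.0), "00002": (10.0, 10.0)}
city_polygons = {"Squareville": [(0, 0), (4, 0), (4, 4), (0, 4)]}

inside_city = city_for_postcode("00001", centroids, city_polygons)
outside_city = city_for_postcode("00002", centroids, city_polygons)
no_polygons = city_for_postcode("00001", centroids, {})
```

The last call is the Japan case in miniature: hand the function an empty set of polygons and it has nothing to say, however good the centroid is.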

What follows is a two-country story about what a geocoder can still do when the map underneath it goes thin, and where it finally can't. Japan we resolved anyway, 94% of the way, by putting the polygon down and asking a different question. Korea handed the same problem back to us turned inside-out: it let us pin the coordinate perfectly, every time, and then stopped us cold at the one thing we were really after, which is the name of the place you've landed in.

Three questions sit under all of it, so let me put them on the table before we start:

  • What do you do when the gazetteer gives you points where you expected shapes?
  • Does the move that rescues Japan actually generalize, or did we get lucky once and dress it up as a method?
  • And the question with no comfortable answer: what happens when the map is simply missing the part of a country you most need to see?

Does a postcode know what country it's in?

· 8 min read
Teffen Ellis
Sister Software

We set out to fix a small wart in our address parser and came away with a number that told us to put the screwdriver down.

Here is the wart. When our postcode extractor sees a five-digit run and wants to know whether it's a real postcode or just a house number that happens to look like one, it peeks at the words sitting next to it and checks them against every country's street vocabulary we know — American, German, French, all at once. That "all at once" is fine at three countries. At twenty it gets loud, and a German street suffix starts shadowing an English word by sheer coincidence. So we went looking for the clean way to tell the extractor which country's words to bother with.
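Here is the wart in miniature. This is a sketch with assumptions flagged, not the extractor we ship: the vocabularies are toy lists, the one-token window on each side is a guess at "the words sitting next to it", and `looks_like_house_number` and its `countries` parameter are hypothetical names.

```python
import re

# Toy street vocabularies for illustration; not real gazetteer data.
STREET_VOCAB = {
    "us": {"street", "st", "avenue", "ave", "road", "rd"},
    "de": {"straße", "strasse", "weg", "platz", "ring"},
    "fr": {"rue", "avenue", "boulevard", "chemin"},
}

FIVE_DIGITS = re.compile(r"^\d{5}$")


def looks_like_house_number(tokens, index, countries=None):
    """True if a five-digit token sits next to a street word.

    With countries=None every vocabulary is consulted ("all at once").
    Passing a list of country codes restricts the check to those vocabularies.
    """
    if not FIVE_DIGITS.match(tokens[index]):
        return False
    allowed = list(STREET_VOCAB) if countries is None else countries
    # Assumed window: one token to the left and one to the right.
    neighbours = tokens[max(0, index - 1):index] + tokens[index + 1:index + 2]
    for word in neighbours:
        cleaned = word.lower().strip(".,")
        if any(cleaned in STREET_VOCAB[c] for c in allowed):
            return True
    return False


# Contrived input: an English word that also appears in the toy German list.
tokens = ["Golden", "Ring", "10001"]
all_countries = looks_like_house_number(tokens, 2)
us_only = looks_like_house_number(tokens, 2, countries=["us"])
```

On this contrived input the all-countries check fires because "ring" is in the toy German list, and the US-only check stays quiet. The code for restricting the vocabulary is trivial; knowing which country to pass in is the part that isn't.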

That question has a much bigger sibling, and chasing the sibling is where the story actually is.

Our parser fails 80% of our own tests. We shipped it anyway.

· 4 min read
Teffen Ellis
Sister Software

Our neural address parser passes 20.7% of our test suite. The rule-based parser it's meant to replace passes 93.7%. By that scoreboard, we should delete the neural model and go home.

We shipped the neural model instead. Here's why both numbers are true — and why the one that matters says the opposite.