The map runs out before the country does
We spent a good month teaching our resolver exactly one trick. Take a postcode, drop its centroid into the city polygon that happens to contain it, read off the city. It's a genuinely good trick. It got the Netherlands to 95% and Germany to 93%, and for a while it felt like the whole problem was going to fall to it. Then we pointed it at Japan, and Japan calmly informed us that it has no city polygons to drop anything into.
What follows is a two-country story about what a geocoder can still do when the map underneath it goes thin, and where it finally can't. Japan we resolved anyway, 94% of the way, by putting the polygon down and asking a different question. Korea handed the same problem back to us turned inside-out: it let us pin the coordinate perfectly, every time, and then stopped us cold at the one thing we were really after, which is the name of the place you've landed in.
Three questions sit under all of it, so let me put them on the table before we start:
- What do you do when the gazetteer gives you points where you expected shapes?
- Does the move that rescues Japan actually generalize, or did we get lucky once and dress it up as a method?
- And the question with no comfortable answer: what happens when the map is simply missing the part of a country you most need to see?
Japan has no city polygons
Quick recap for anyone who missed the last Japan post: a few weeks ago we pulled Japan's address hierarchy out of Who's On First and learned that Japanese addresses run backwards and have no street names at all. This is the sequel, the one where we try to actually resolve them.
The European recipe is point-in-polygon, and it's about as simple as geocoding gets. A postcode comes with a centroid. Who's On First gives you administrative polygons. You ask which locality polygon contains the centroid, and that's your city. Clean, fast, and it carried four European locales without complaint.
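For readers who haven't written one, here is a minimal sketch of that lookup. It assumes each locality is a single polygon ring of (lon, lat) vertices held in a plain dict. Real WOF geometries are GeoJSON polygons and multipolygons with holes, and a real build would lean on a geometry library and a spatial index rather than this hand-rolled ray-casting test. The function names are hypothetical.

```python
from typing import Optional

# Hypothetical layout: one outer ring per locality, (lon, lat) pairs, not closed.
Ring = list[tuple[float, float]]


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: cast a ray east from the point and count edge crossings."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locality_for(centroid: tuple[float, float], localities: dict[str, Ring]) -> Optional[str]:
    """Return the first locality whose ring contains the postcode centroid."""
    lon, lat = centroid
    for name, ring in localities.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None
```

Called as `locality_for((4.9, 52.37), {"Amsterdam": [(4.7, 52.3), (5.1, 52.3), (5.1, 52.45), (4.7, 52.45)]})`, with a toy box standing in for a real boundary, it returns `"Amsterdam"`. Note the precondition hiding in the signature: every locality needs a ring to test against.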
It gets Japan to 25%, and it took us embarrassingly long to see why, because the failure wears the costume of a tuning problem and is nothing of the sort. We went digging through WOF's Japanese geometry placetype by placetype, and the pattern repeats every time. The prefectures have polygons. The wards and sub-prefectures have polygons. The municipality, the 市区町村 level a postcode actually resolves to, is essentially all points. Not coarse polygons, not bad ones, just points: a latitude and a longitude with nothing to be inside of.
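The audit is easy to reproduce. A sketch, assuming WOF records have already been loaded as dicts carrying a `placetype` and a GeoJSON-style `geometry` (the loading step and the exact record layout are assumptions, and the function names are hypothetical): tally geometry types per placetype, and a layer made almost entirely of points shows up immediately.

```python
from collections import Counter, defaultdict


def geometry_types_by_placetype(records: list[dict]) -> dict[str, Counter]:
    """Count GeoJSON geometry types (Point, Polygon, MultiPolygon...) per WOF placetype."""
    tally: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        geom_type = (rec.get("geometry") or {}).get("type", "missing")
        tally[rec["placetype"]][geom_type] += 1
    return dict(tally)


def point_share(tally: dict[str, Counter], placetype: str) -> float:
    """Fraction of a placetype's records that are bare Points, with nothing to be inside of."""
    counts = tally.get(placetype, Counter())
    total = sum(counts.values())
    return counts["Point"] / total if total else 0.0
```

A share near 1.0 for the municipality placetypes, next to near 0.0 for the prefecture level, is the pattern described above.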
So point-in-polygon has nothing to contain, and no amount of fiddling with a containment test rescues a containment test when there are no containers. We checked Korea and Taiwan while we were down there, and they tell the identical story. The municipality layer across all three countries is dots on a map where Europe gave us regions. This is the shape of the whole problem, and it means the recipe we were so pleased with simply doesn't travel east.
You stop asking the polygon and start asking Japan Post
If you can't ask "which shape am I inside," you ask the postal authority something more direct: "what's the municipality for this postcode?" Then you go find that municipality in WOF by name. Japan Post publishes exactly that mapping in a file called KEN_ALL, and, crucially, a romanized edition whose municipality column holds entries like SAPPORO SHI CHUO KU, in the same alphabet WOF uses for its romanized place names. Two romanized strings you can actually compare. That's the whole pivot.
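Comparing the two strings still takes a little folding, because two romanizations need not agree on case, hyphens or long-vowel marks. A minimal sketch, assuming a WOF romanized name might look like `Sapporo-shi Chūō-ku` (an illustrative form, not a quote from the data) while KEN_ALL gives `SAPPORO SHI CHUO KU`; the helper names are hypothetical.

```python
import re
import unicodedata


def normalize_romanized(name: str) -> str:
    """Fold a romanized Japanese place name to a comparable key.

    Strips macrons (Chūō -> Chuo), treats hyphens and punctuation as spaces,
    and uppercases, so "Sapporo-shi Chūō-ku" and "SAPPORO SHI CHUO KU" agree.
    """
    # NFKD splits "ū" into "u" plus a combining macron, which we then drop.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    spaced = re.sub(r"[^A-Za-z0-9]+", " ", no_marks)
    return spaced.upper().strip()


def same_place(ken_all_name: str, wof_name: str) -> bool:
    """True when both names fold to the same key."""
    return normalize_romanized(ken_all_name) == normalize_romanized(wof_name)
```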
Getting the file was its own small comedy. Every KEN_ALL download URL we had on record returned a 404. The replacements turned out to be gated behind JavaScript and a Japan-only fetch, so a plain script came home with a polite error page instead of data. And when the file finally arrives, it's CP932 (Shift-JIS) encoded, in the year 2025. We got there, and it carries the one thing WOF's own postcode hierarchy refuses to give up: the municipality, where WOF stops at the prefecture and leaves you a hundred kilometres too coarse.
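Reading the file is the easy part once you have it, as long as you name the encoding explicitly. A sketch using only the standard library. The column positions (postcode first, romanized municipality sixth) are an assumption about the romanized KEN_ALL layout, so check them against the file you actually download; `read_ken_all_rome` is a hypothetical name.

```python
import csv
import io


def read_ken_all_rome(raw: bytes) -> dict[str, str]:
    """Map postcode -> romanized municipality from the romanized KEN_ALL CSV.

    Assumed layout: postcode in column 0, romanized municipality in column 5.
    """
    # The file is CP932, Microsoft's Shift-JIS variant; UTF-8 would choke on the first kanji.
    text = raw.decode("cp932")
    postcode_to_muni: dict[str, str] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 6:
            continue  # blank or malformed line
        postcode, muni = row[0].strip(), row[5].strip()
        # A postcode can appear on several rows; keep the first municipality seen.
        postcode_to_muni.setdefault(postcode, muni)
    return postcode_to_muni
```

Something like `read_ken_all_rome(Path("KEN_ALL_ROME.CSV").read_bytes())` would load it, with that filename again an assumption. Python's `cp932` codec is Microsoft's variant; the stricter `shift_jis` codec can fail on vendor-extension characters that CP932 allows.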
The matching had one wrinkle worth knowing about. Japanese municipalities don't sit in a single WOF placetype. A regular city lands in locality, a ward in county or localadmin, a Tokyo special ward in borough. Match against just one of those and you cap out around 55%. Search all of them at once and you get 94.3% of postcodes matched to a real municipality, with end-to-end resolution landing between 94 and 98% depending on which gold set you grade against. Comfortably past our 85% bar, and the European locales come out of the change byte-for-byte identical, because the new path only fires for the countries that need it.
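The fix is mostly a matter of which placetypes you index. A sketch with hypothetical names and a deliberately crude name key: build one name index over all four placetypes and look the KEN_ALL municipality up in it. Returning nothing on an ambiguous name, rather than guessing, is a choice made for this sketch, not something the numbers above depend on.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional

# Placetypes a Japanese municipality may be filed under in WOF, per the text above.
MUNICIPALITY_PLACETYPES = ("locality", "county", "localadmin", "borough")


def name_key(name: str) -> str:
    """Crude comparison key: uppercase, hyphens and punctuation folded to single spaces."""
    return re.sub(r"[^A-Z0-9]+", " ", name.upper()).strip()


def build_index(records: Iterable[dict], placetypes: tuple[str, ...]) -> dict[str, list[dict]]:
    """Index WOF records by name key, keeping only the given placetypes."""
    index: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        if rec["placetype"] in placetypes:
            index[name_key(rec["name"])].append(rec)
    return index


def match_municipality(ken_all_name: str, index: dict[str, list[dict]]) -> Optional[dict]:
    """Return the single WOF record matching the name, or None if absent or ambiguous."""
    candidates = index.get(name_key(ken_all_name), [])
    return candidates[0] if len(candidates) == 1 else None
```

Index only `locality` and a Tokyo special ward such as Chiyoda-ku, filed as a `borough`, simply isn't there to find; index all four placetypes and it is.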
The same strategy, a build shaped to the country
Here's the part I want to dwell on, because it's the part that decides whether any of this scales. The Japanese build feeds the exact same resolver strategy the European one does. All we wrote was a Japan build: a different way of filling in the one table the resolver already reads. The resolver itself never changed, never even noticed. Postcode in, locality out, the same code path Amsterdam runs through.
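The shape of that design, reduced to a sketch. The names here (`Locality`, `PostcodeTable`, `build_all`, `resolve`) are hypothetical stand-ins, not the project's actual code; the point is only that the resolver reads one table and each country brings its own function for filling it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Locality:
    name: str
    placetype: str  # e.g. "locality", "county", "borough"


# The one table the resolver reads: (country, postcode) -> locality.
PostcodeTable = dict[tuple[str, str], Locality]

# A country's build: any function that fills its slice of the table.
Builder = Callable[[], PostcodeTable]


def build_all(builders: dict[str, Builder]) -> PostcodeTable:
    """Run every country's build and merge the results into one table."""
    table: PostcodeTable = {}
    for build in builders.values():
        table.update(build())
    return table


def resolve(table: PostcodeTable, country: str, postcode: str) -> Optional[Locality]:
    """Country-agnostic resolver: postcode in, locality out. It never sees how rows were made."""
    return table.get((country, postcode))
```

A Dutch builder might fill its rows by point-in-polygon and a Japanese one from KEN_ALL; `resolve` is identical for both. Adding a country means adding a builder, not a branch in `resolve`.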
That's the bet the whole "rule engine" design rests on: one strategy, and a per-country table that each country gets to populate however its data allows. Japan populates it by asking its postal authority for names. The question we hadn't answered was whether a genuinely different country, with genuinely different data, could populate that same table without us bolting on a pile of special cases. Which is where Korea comes in, and where the story stops being a victory lap.
Korea, the same trick inverted
Korea's data is the mirror image of Japan's, so the build came out mirrored too.
Japan made us go fetch the names from a postal authority. Korea hands them over for free: the GeoNames postal file for Korea already carries, in one place, the postcode, the place name, the province, and a latitude and longitude. No saga, no Shift-JIS. The snag is that the names are in Hangul (추자면), while WOF's romanized spr.name for Korea is some transliteration that may or may not line up. Matching Hangul against romanized Korean goes nowhere, and that's exactly why Korea sat on our "blocked" list for a while.
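The GeoNames postal dumps are tab-separated text with no header: country code, postcode, place name, then three levels of admin name and code, then latitude, longitude and an accuracy flag. A sketch of pulling out the fields the Korean build needs; `PostalRow` and `parse_geonames_postal` are hypothetical names, and reading the province from `admin name1` is our assumption about where GeoNames files it for Korea.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class PostalRow:
    postcode: str
    place: str     # Hangul place name, e.g. 추자면
    province: str  # GeoNames "admin name1" (assumed to be the province)
    lat: float
    lon: float


def parse_geonames_postal(text: str) -> list[PostalRow]:
    """Parse a GeoNames postal-code dump: tab-separated, 12 columns, no header."""
    rows = []
    # QUOTE_NONE: GeoNames fields are not quoted, so a stray quote is just a character.
    for cols in csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(cols) < 11 or not cols[9] or not cols[10]:
            continue  # skip rows without a usable coordinate
        rows.append(PostalRow(
            postcode=cols[1], place=cols[2], province=cols[3],
            lat=float(cols[9]), lon=float(cols[10]),
        ))
    return rows
```

For the Korean file, something like `parse_geonames_postal(Path("KR.txt").read_text(encoding="utf-8"))`, assuming the unzipped dump keeps that name.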
It turns out that read was half right and gave up one step early. WOF doesn't only keep the romanized name. Its names table also carries Hangul, 13,120 native entries plus several thousand more filed under "undetermined language" that are Hangul all the same. So a Hangul-to-Hangul join is on the table after all. And because every Korean postcode arrives with a coordinate already attached, we could lead with the coordinate and treat the Hangul name as a second opinion. Korea's build is point-primary: take the postcode's coordinate, find the nearest WOF locality, confirm it by name where a name exists. A different first move from Japan, the same table out the other end, and the thing we were testing for, not one line of new resolver code.
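Rescuing those undetermined-language entries needs nothing cleverer than a check on the Unicode block. A sketch, assuming the names table has been reduced to (language, name) pairs with ISO 639-3 codes such as `kor` and `und`; the helper names are hypothetical. Precomposed Hangul syllables occupy U+AC00 to U+D7A3, so a name made entirely of them is Hangul whatever its language tag says.

```python
HANGUL_START, HANGUL_END = 0xAC00, 0xD7A3  # precomposed Hangul Syllables block


def is_hangul(name: str) -> bool:
    """True if the name is non-empty and every non-space character is a Hangul syllable."""
    chars = [ch for ch in name if not ch.isspace()]
    return bool(chars) and all(HANGUL_START <= ord(ch) <= HANGUL_END for ch in chars)


def hangul_names(name_rows: list[tuple[str, str]]) -> list[str]:
    """Collect Hangul names from (language, name) pairs.

    Trusts the 'kor' tag, and also keeps 'und' (undetermined) entries whose
    text turns out to be Hangul anyway.
    """
    return [name for lang, name in name_rows
            if lang == "kor" or (lang == "und" and is_hangul(name))]
```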
On the parts that build resolves, it is excellent. WOF's Korean locality layer is dense, 21,139 of them, near enough one per village, so the nearest locality to a postcode sits a median of 0.96 km away. The province falls out for free and exact: GeoNames' province name matches WOF's Korean region name 17 times out of 17. Hand us a Korean address and we'll put it in the right province and within a kilometre of the right spot, on 100% of postcodes. For a coarse fix, that's money in the bank.
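The point-primary step reduces to a nearest-neighbour search on great-circle distance. A minimal sketch with a linear scan and the haversine formula; the record layout (`lat`/`lon` keys) and the function names are assumptions. At 21,139 localities a spatial index (for example scikit-learn's `BallTree` with `metric="haversine"`, which expects radians) would be the usual speed-up, but the scan keeps the logic visible.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_locality(lat: float, lon: float, localities: list[dict]) -> Optional[tuple[dict, float]]:
    """Linear scan for the closest WOF locality point; returns (record, distance_km)."""
    best: Optional[tuple[dict, float]] = None
    for loc in localities:
        d = haversine_km(lat, lon, loc["lat"], loc["lon"])
        if best is None or d < best[1]:
            best = (loc, d)
    return best
```

Collecting that distance for every postcode and taking `statistics.median` over the list gives the kind of figure quoted above.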
Where the map runs out
Then you ask for the administrative name, and the floor gives way. The name confirms on 26% of Korean postcodes. Japan's was 94%. Same resolver, same care, a third of the hit rate, and the whole gap is a story about what the map happens to hold.
Two things go wrong, and both are worth naming, because each tells you where to dig. The first is a granularity mismatch. GeoNames names a postcode at the eup/myeon/dong level, 추자면, Chuja-myeon. WOF's locality layer is one rung finer, down at the hamlet, so the nearest point to that postcode is a village called "Mung" sitting inside Chuja-myeon. The coordinate is dead-on and the name belongs to a smaller, different place. Both sources are telling the truth about different rungs of the same ladder, and the ladder doesn't line up.
The second one is worse, and it's the one I'd lose sleep over. The single biggest bucket of misses is 구 (gu), the urban districts. Gangnam-gu. Haeundae-gu. The level that is the address for most of Seoul and Busan. WOF Korea doesn't carry those as named localities at all, so there is nothing on the map to confirm against. The single most address-dense slice of the country is the slice the gazetteer is thinnest on. You can have a method that works and a map that's blank exactly where the people are, and that is the honest ceiling on Korea today. No recipe tweak gets past a name that was never in the dataset.
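Sorting the misses into buckets like that takes a few lines. A sketch, assuming the unconfirmed names are GeoNames place names in Hangul whose final syllable is the administrative suffix (true for forms like 강남구 or 추자면, not guaranteed for every row); the names are hypothetical.

```python
from collections import Counter

# Common Korean administrative suffixes: 시 city, 군 county, 구 district,
# 읍 town, 면 township, 동 neighbourhood, 리 village.
ADMIN_SUFFIXES = ("시", "군", "구", "읍", "면", "동", "리")


def suffix_bucket(place: str) -> str:
    """Classify a place name by its final syllable, or 'other' if it isn't a known suffix."""
    place = place.strip()
    return place[-1] if place and place[-1] in ADMIN_SUFFIXES else "other"


def miss_buckets(unconfirmed_places: list[str]) -> Counter:
    """Tally unconfirmed postcodes by suffix to see which admin level the gazetteer lacks."""
    return Counter(suffix_bucket(p) for p in unconfirmed_places)
```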
A bug the verifier caught, and you should want it to
One detour, because it's the kind of mistake that ships quietly if you let it. The first version of the Korean build reported 56% name confirmation, and we were briefly delighted. Then we looked at the distances, and the "confirmed" matches were averaging 71 kilometres from the postcode, a few of them out past 500.
Korean place names repeat. A lot. Dozens of villages share a name up and down the country, and the matcher had been finding a name match anywhere in Korea and then taking the nearest copy, which can still be a province away. The fix is the same proximity leash Japan's build already wore: a name only counts as confirmation if the place it names also sits nearby. That pulled the number down to its honest 26% and the average distance back under five kilometres. One signal in a costume of two is worse than the one signal alone: the inflated 56% would have told us Korea was twice as solved as it is. Make your two signals genuinely agree, or don't get to call it two.
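The leash is a single extra condition in the matcher. A sketch, assuming WOF localities as dicts with a Hangul `name` and `lat`/`lon`; the 10 km radius is a placeholder, since the post doesn't state the threshold actually used, and the function names are hypothetical. Dropping the `d <= leash_km` test reproduces the bug: the nearest same-named village wins no matter how far away it is.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0088
LEASH_KM = 10.0  # hypothetical radius; not the value used in the actual build


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def confirm_by_name(lat: float, lon: float, place_name: str,
                    localities: list[dict], leash_km: float = LEASH_KM) -> Optional[dict]:
    """Return the nearest same-named locality, but only if it lies within leash_km."""
    best: Optional[tuple[dict, float]] = None
    for loc in localities:
        if loc["name"] != place_name:
            continue
        d = haversine_km(lat, lon, loc["lat"], loc["lon"])
        # The leash: a name match a province away is not confirmation.
        if d <= leash_km and (best is None or d < best[1]):
            best = (loc, d)
    return best[0] if best else None
```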
What we keep, and what the map still owes us
So where does that leave the bet? The architecture held. A point-primary build and a name-primary build, two countries whose data shares almost nothing, both poured into the same resolver strategy with no new resolver code between them. The thing we wanted to prove, that this generalizes past one lucky locale without bolting on special cases, is proven. What it can't do is conjure place names that Who's On First was never handed.
So Korea ships honestly: rock-solid on province and coordinate, explicit about a 26% name tier, marked experimental and kept out of the default bundle until the rest catches up. The catch-up has an address. Korea's road-name database, Juso, carries the gu and dong names natively. It's locked behind a government API key, so getting it is a deliberate acquisition, and it's next on the list to go fetch. Taiwan is one rung further back: there's no GeoNames postal file for it at all, a flat 404, so there isn't even a coordinate to begin with until we source one.
If there's a portable lesson in two countries' worth of this, it's that a geocoder is only ever as good as the map it stands on, and a map's favourite way to lie is to leave things out. Japan's map was missing shapes, and we could route around that. Korea's map is missing names, right where the cities are, and there's no routing around a blank. So before you tune a model or argue with a matcher, go look at what your reference data actually holds in the exact spot you care about most. The country is all still there. Whether the map admits it is a separate question, and it's usually the one that decides how far you get.
