The map runs out before the country does

· 11 min read
Teffen Ellis
Sister Software

We spent a good month teaching our resolver exactly one trick. Take a postcode, drop its centroid into the city polygon that happens to contain it, read off the city. It's a genuinely good trick. It got the Netherlands to 95% and Germany to 93%, and for a while it felt like the whole problem was going to fall to it. Then we pointed it at Japan, and Japan calmly informed us that it has no city polygons to drop anything into.
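For readers who want the trick in concrete form, here is a minimal sketch of a centroid-in-polygon lookup. It is not the resolver's actual code. The function names, the `toy_polygons` data, and the place names are all hypothetical, and the containment test is a plain ray-casting check on a single outer ring, where a production system would more likely lean on a spatial index and a geometry library.

```python
from typing import Optional, Sequence

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count the ring edges crossed by a ray heading east."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def resolve_city(
    centroid: Point, city_polygons: dict[str, Sequence[Point]]
) -> Optional[str]:
    """Return the first city whose polygon contains the postcode centroid."""
    for name, ring in city_polygons.items():
        if point_in_polygon(centroid, ring):
            return name
    return None  # no polygon contains the point


# Hypothetical toy data: two adjacent unit squares, not real geography.
toy_polygons = {
    "Alphastad": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "Betadorp": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(resolve_city((0.5, 0.5), toy_polygons))
print(resolve_city((1.5, 0.5), toy_polygons))
print(resolve_city((5.0, 5.0), toy_polygons))
```

The last call is the interesting one: when no polygon contains the centroid, the lookup has nothing to return, and a country with no city polygons at all lands in that branch for every postcode.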

What follows is a two-country story about what a geocoder can still do when the map underneath it goes thin, and where it finally can't. Japan we resolved anyway, 94% of the way, by putting the polygon down and asking a different question. Korea handed the same problem back to us turned inside-out: it let us pin the coordinate perfectly, every time, and then stopped us cold at the one thing we were really after, which is the name of the place you've landed in.

Three questions sit under all of it, so let me put them on the table before we start:

  • What do you do when the gazetteer gives you points where you expected shapes?
  • Does the move that rescues Japan actually generalize, or did we get lucky once and dress it up as a method?
  • And the question with no comfortable answer: what happens when the map is simply missing the part of a country you most need to see?