Our parser fails 80% of our own tests. We shipped it anyway.
Our neural address parser passes 20.7% of our test suite. The rule-based parser it's meant to replace passes 93.7%. By that scoreboard, we should delete the neural model and go home.
We shipped the neural model instead. Here's why both numbers are true — and why the one that matters says the opposite.
