Eval report — step-050000
- entries evaluated: 74
- full-parse exact match: 0.5270
- mean token confidence: 0.9745
Per-component F1
| tag | precision | recall | f1 | support |
|---|---|---|---|---|
| country | 0.0000 | 0.0000 | 0.0000 | 6 |
| region | 0.8500 | 0.8095 | 0.8293 | 63 |
| locality | 0.6875 | 0.6111 | 0.6471 | 72 |
| dependent_locality | 0.0000 | 0.0000 | 0.0000 | 1 |
| postcode | 0.8730 | 0.8462 | 0.8594 | 65 |
| subregion | 0.0000 | 0.0000 | 0.0000 | 0 |
| cedex | 0.0000 | 0.0000 | 0.0000 | 1 |
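
The f1 column appears consistent with the usual harmonic mean of precision and recall. The sketch below recomputes it for a few rows of the table as a sanity check. The helper name `f1_score` and the convention of returning 0.0 when precision and recall are both zero are assumptions made for illustration, not necessarily what the evaluation script does.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Assumption: return 0.0 when both inputs are zero, which matches the
    0.0000 rows in the report but may not match the real eval script.
    """
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-component table.
rows = {
    "country": (0.0000, 0.0000),
    "region": (0.8500, 0.8095),
    "locality": (0.6875, 0.6111),
    "postcode": (0.8730, 0.8462),
}

# Round to four decimals, the precision used in the report.
f1_by_tag = {tag: round(f1_score(p, r), 4) for tag, (p, r) in rows.items()}

for tag, f1 in f1_by_tag.items():
    print(f"{tag}: {f1:.4f}")
```

Rows with very small support (for example `dependent_locality` and `cedex`, each with support 1, or `subregion` with support 0) carry little statistical weight, so their zero scores are best read as "not enough data" rather than as a measured failure.
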
Calibration (confidence bucket → accuracy)
| bucket | n | accuracy |
|---|---|---|
| 0.0–0.1 | 0 | 0.0000 |
| 0.1–0.2 | 0 | 0.0000 |
| 0.2–0.3 | 0 | 0.0000 |
| 0.3–0.4 | 5 | 0.2000 |
| 0.4–0.5 | 9 | 0.4444 |
| 0.5–0.6 | 20 | 0.4000 |
| 0.6–0.7 | 8 | 0.5000 |
| 0.7–0.8 | 19 | 0.3684 |
| 0.8–0.9 | 25 | 0.4000 |
| 0.9–1.0 | 1114 | 0.8824 |
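
One way to summarize the calibration table is to compare the count-weighted accuracy against the confidence in each bucket. The sketch below does this under two assumptions that the report does not confirm: each bucket's confidence is approximated by its midpoint (the per-bucket mean confidence is not reported), and `n` counts scored tokens. The name `expected_calibration_error` is illustrative only.

```python
# (lower edge, upper edge, n, accuracy) copied from the calibration table.
buckets = [
    (0.0, 0.1, 0, 0.0000),
    (0.1, 0.2, 0, 0.0000),
    (0.2, 0.3, 0, 0.0000),
    (0.3, 0.4, 5, 0.2000),
    (0.4, 0.5, 9, 0.4444),
    (0.5, 0.6, 20, 0.4000),
    (0.6, 0.7, 8, 0.5000),
    (0.7, 0.8, 19, 0.3684),
    (0.8, 0.9, 25, 0.4000),
    (0.9, 1.0, 1114, 0.8824),
]


def expected_calibration_error(rows):
    """Count-weighted mean of |accuracy - confidence| over non-empty buckets.

    Assumption: bucket confidence is the midpoint of its range, because the
    report does not list the mean confidence inside each bucket.
    """
    total = sum(n for _, _, n, _ in rows)
    gap = 0.0
    for lo, hi, n, acc in rows:
        if n == 0:
            continue  # empty buckets have no defined accuracy
        midpoint = (lo + hi) / 2.0
        gap += (n / total) * abs(acc - midpoint)
    return gap


total_n = sum(n for _, _, n, _ in buckets)
overall_accuracy = sum(n * acc for _, _, n, acc in buckets) / total_n
ece = expected_calibration_error(buckets)

print(f"total n: {total_n}")
print(f"overall accuracy: {overall_accuracy:.4f}")
print(f"midpoint ECE: {ece:.4f}")
```

Under these assumptions the count-weighted accuracy comes out near 0.85, noticeably below the reported mean token confidence of 0.9745, which suggests the model is overconfident at this step. Because almost all mass sits in the 0.9–1.0 bucket, a midpoint of 0.95 likely understates the true confidence there, so the midpoint-based figure should be treated as a rough lower estimate of the gap.
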