Eval report — step-002200
- entries evaluated: 4535
- full-parse exact match: 0.0818
- mean token confidence: 0.8063

Per-component F1

| tag | precision | recall | f1 | support |
|---|---|---|---|---|
| country | 0.2143 | 0.2082 | 0.2112 | 245 |
| region | 0.3430 | 0.1298 | 0.1883 | 3205 |
| locality | 0.2478 | 0.3053 | 0.2736 | 3357 |
| dependent_locality | 0.0050 | 0.1000 | 0.0096 | 40 |
| postcode | 0.8324 | 0.5916 | 0.6916 | 2980 |
| subregion | 0.0000 | 0.0000 | 0.0000 | 0 |
| cedex | 0.0000 | 0.0000 | 0.0000 | 1 |
| venue | 0.3765 | 0.4015 | 0.3886 | 1101 |
| street | 0.3559 | 0.2616 | 0.3016 | 2928 |
| house_number | 0.7446 | 0.8335 | 0.7866 | 1742 |
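
As a sanity check on the table above, the f1 column should equal the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following is a minimal Python sketch, not the evaluation script itself. The names `f1_score`, `ROWS`, and `max_f1_discrepancy` are hypothetical, and the zero-division convention (F1 = 0 when P + R = 0) is an assumption chosen to match the 0.0000 entries for subregion and cedex.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Assumption: returns 0.0 when both inputs are zero, mirroring the
    0.0000 rows in the report rather than raising ZeroDivisionError.
    """
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# tag -> (precision, recall, reported f1), copied from the table.
ROWS = {
    "country": (0.2143, 0.2082, 0.2112),
    "region": (0.3430, 0.1298, 0.1883),
    "locality": (0.2478, 0.3053, 0.2736),
    "dependent_locality": (0.0050, 0.1000, 0.0096),
    "postcode": (0.8324, 0.5916, 0.6916),
    "subregion": (0.0000, 0.0000, 0.0000),
    "cedex": (0.0000, 0.0000, 0.0000),
    "venue": (0.3765, 0.4015, 0.3886),
    "street": (0.3559, 0.2616, 0.3016),
    "house_number": (0.7446, 0.8335, 0.7866),
}


def max_f1_discrepancy(rows: dict) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for tag, (p, r, reported) in ROWS.items():
        print(f"{tag:20s} recomputed={f1_score(p, r):.4f} reported={reported:.4f}")
```

With the four-decimal values in the table, the largest gap between recomputed and reported F1 stays below 1e-4, which is consistent with rounding of the precision and recall columns.
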

Calibration (confidence bucket → accuracy)

| bucket | n | accuracy |
|---|---|---|
| 0.0–0.1 | 0 | 0.0000 |
| 0.1–0.2 | 21 | 0.2381 |
| 0.2–0.3 | 666 | 0.2913 |
| 0.3–0.4 | 2416 | 0.3266 |
| 0.4–0.5 | 4308 | 0.3152 |
| 0.5–0.6 | 4777 | 0.3383 |
| 0.6–0.7 | 4943 | 0.3569 |
| 0.7–0.8 | 5534 | 0.3825 |
| 0.8–0.9 | 8066 | 0.4363 |
| 0.9–1.0 | 30517 | 0.6440 |
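
The bucket table can be collapsed into two summary numbers: the count-weighted overall accuracy and an approximate expected calibration error (ECE). A hedged sketch follows. It assumes each bucket's `n` counts scored tokens and, because per-bucket mean confidence is not reported, substitutes the bucket midpoint for it, so the ECE is only an approximation. The names `BUCKETS`, `overall_accuracy`, and `midpoint_ece` are hypothetical.

```python
# (lower edge, upper edge, n, accuracy), copied from the calibration table.
BUCKETS = [
    (0.0, 0.1, 0, 0.0000),
    (0.1, 0.2, 21, 0.2381),
    (0.2, 0.3, 666, 0.2913),
    (0.3, 0.4, 2416, 0.3266),
    (0.4, 0.5, 4308, 0.3152),
    (0.5, 0.6, 4777, 0.3383),
    (0.6, 0.7, 4943, 0.3569),
    (0.7, 0.8, 5534, 0.3825),
    (0.8, 0.9, 8066, 0.4363),
    (0.9, 1.0, 30517, 0.6440),
]


def overall_accuracy(buckets) -> float:
    """Accuracy over all items, weighting each bucket by its count."""
    total = sum(n for _, _, n, _ in buckets)
    return sum(n * acc for _, _, n, acc in buckets) / total


def midpoint_ece(buckets) -> float:
    """Approximate ECE: count-weighted |accuracy - confidence|.

    Assumption: the bucket midpoint stands in for the mean confidence
    inside the bucket, which the report does not provide.
    Empty buckets contribute nothing because their weight is zero.
    """
    total = sum(n for _, _, n, _ in buckets)
    return sum(n * abs(acc - (lo + hi) / 2.0) for lo, hi, n, acc in buckets) / total


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(BUCKETS):.4f}")
    print(f"midpoint ECE:     {midpoint_ece(BUCKETS):.4f}")
```

On these figures the weighted accuracy comes out near 0.51, well below the reported mean token confidence of 0.8063, and the midpoint ECE is about 0.29, which points to overconfidence concentrated in the upper buckets.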