
v0.1.0 PyTorch checkpoint × golden v0.1.2 (apples-to-apples baseline)

  • entries evaluated: 4535
  • full-parse exact match: 0.0088
  • mean token confidence: 0.8587

Per-component F1

tag                 precision  recall  f1      support
country             0.0135     0.1388  0.0246  245
region              0.0333     0.0409  0.0367  3205
locality            0.0677     0.0909  0.0776  3357
dependent_locality  0.0000     0.0000  0.0000  40
postcode            1.0000     0.0003  0.0007  2980
subregion           0.0000     0.0000  0.0000  0
cedex               0.0000     0.0000  0.0000  1
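
The f1 column can be sanity-checked against the precision and recall columns. The sketch below assumes f1 is the standard harmonic mean of precision and recall; `f1_score` and `reported` are illustrative names, not part of the evaluation tooling behind this report.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the per-component table.
reported = {
    "country": (0.0135, 0.1388),
    "region": (0.0333, 0.0409),
    "locality": (0.0677, 0.0909),
    "postcode": (1.0000, 0.0003),
}

# Recompute f1 from the rounded inputs and round to the table's precision.
recomputed = {tag: round(f1_score(p, r), 4) for tag, (p, r) in reported.items()}

for tag, f1 in recomputed.items():
    print(f"{tag:10s} f1={f1:.4f}")
```

Because the inputs are already rounded to four decimals, the last digit of a recomputed value can differ from the reported one. That is most likely for postcode, where recall is close to zero.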

Calibration (confidence bucket → accuracy)

bucket   n      accuracy
0.0–0.1  0      0.0000
0.1–0.2  1      1.0000
0.2–0.3  208    0.0913
0.3–0.4  1660   0.1084
0.4–0.5  3207   0.1366
0.5–0.6  4030   0.1846
0.6–0.7  3981   0.2042
0.7–0.8  4206   0.2183
0.8–0.9  5627   0.2472
0.9–1.0  38328  0.3366
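
One way to condense the calibration table into a few numbers is sketched below: the total token count, the count-weighted overall accuracy, and an approximate expected calibration error (ECE). The report does not give the mean confidence inside each bucket, so the sketch substitutes the bucket midpoint; that substitution is an assumption, and `summarize` is a hypothetical helper rather than part of the original evaluation code.

```python
# (lower edge, upper edge, n, accuracy) copied from the calibration table.
buckets = [
    (0.0, 0.1, 0, 0.0000),
    (0.1, 0.2, 1, 1.0000),
    (0.2, 0.3, 208, 0.0913),
    (0.3, 0.4, 1660, 0.1084),
    (0.4, 0.5, 3207, 0.1366),
    (0.5, 0.6, 4030, 0.1846),
    (0.6, 0.7, 3981, 0.2042),
    (0.7, 0.8, 4206, 0.2183),
    (0.8, 0.9, 5627, 0.2472),
    (0.9, 1.0, 38328, 0.3366),
]


def summarize(rows):
    """Return (total n, count-weighted accuracy, midpoint-based ECE)."""
    total = sum(n for _, _, n, _ in rows)
    accuracy = sum(n * acc for _, _, n, acc in rows) / total
    # Assumption: the bucket midpoint stands in for the bucket's mean
    # confidence, which the report does not list.
    ece = sum(n * abs((lo + hi) / 2 - acc) for lo, hi, n, acc in rows) / total
    return total, accuracy, ece


total, accuracy, ece = summarize(buckets)
print(f"tokens={total} accuracy={accuracy:.4f} approx_ece={ece:.4f}")
```

Comparing the weighted accuracy with the reported mean token confidence of 0.8587 gives a quick read on how overconfident the checkpoint is. The midpoint-based ECE is only a rough estimate, since most tokens fall in the 0.9–1.0 bucket and their true mean confidence may sit well away from 0.95.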