HIPE-2026 releases new benchmark results for historical person-place extraction

A new HIPE-2026 overview paper on arXiv reports shared-task results for person-place relation extraction from multilingual historical texts, with 17 teams and more than 40 runs across French, German, and English.

The HIPE-2026 shared task has published new benchmark results for person-place relation extraction from multilingual historical texts, giving researchers a fresh look at how systems cope with noisy archives, language variation and cross-domain transfer.

The results overview paper, posted on arXiv on June 24, says 17 participating teams submitted more than 40 runs across French, German and English. The paper is the latest public milestone in HIPE-2026, the third edition of the HIPE evaluation series.

What HIPE-2026 tested

HIPE-2026 focuses on two relation types, both linking a person to a place. The "at" label indicates that the person was at the place at some point before a document's publication date. The "isAt" label indicates that the person was at the place around the time of publication.
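As a rough illustration of the two labels described above, the sketch below models a person-place relation as a small Python record. The class names, field names and placeholder values are hypothetical and chosen for readability; they are not taken from the HIPE-2026 data format, which the overview paper defines.

```python
from dataclasses import dataclass
from enum import Enum


class RelationLabel(Enum):
    # Person was at the place at some point before the publication date.
    AT = "at"
    # Person was at the place around the time of publication.
    IS_AT = "isAt"


@dataclass(frozen=True)
class PersonPlaceRelation:
    # Hypothetical fields; the real task data may carry more context.
    person: str
    place: str
    label: RelationLabel


def relations_by_label(relations, label):
    """Return only the relations carrying the given label."""
    return [r for r in relations if r.label is label]


# Placeholder data for illustration only, not drawn from the benchmark.
example = [
    PersonPlaceRelation("Person A", "Place X", RelationLabel.AT),
    PersonPlaceRelation("Person A", "Place Y", RelationLabel.IS_AT),
]

current = relations_by_label(example, RelationLabel.IS_AT)
print([(r.person, r.place) for r in current])
```

Keeping the label as an enumeration rather than a free string makes the temporal distinction explicit: a single person-place pair can be tagged as a past presence, a presence at publication time, or neither.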

The benchmark is aimed at historical texts, where OCR noise, spelling variation, older language forms and indirect references make person-place extraction much harder than it is in clean modern text.

The overview paper says the evaluation framework measured three dimensions: predictive accuracy, computational efficiency and cross-domain generalization. It also included a surprise-domain set drawn from early modern French literary texts.

From February setup to June results

The June paper follows an earlier HIPE-2026 lab description posted in February, which laid out the task, relation labels and evaluation framing. That earlier paper established the protocol; the new overview paper reports what happened when teams actually submitted systems.

The results paper shows a broad spread of approaches. Systems ranged from large language models to lighter task-specific classifiers, which suggests the benchmark tested not only extraction quality but also the trade-off between performance and efficiency.

Why the benchmark matters

Historical document processing has direct relevance for cultural heritage and digital humanities work. Better person-place extraction can help build knowledge graphs, reconstruct biographies and support spatial analysis across archival collections.

HIPE-2026 follows earlier editions of the series in 2020 and 2022, which focused on named entity recognition and linking in historical texts. The new edition extends that line of work by targeting relations between people and places in multiple languages.

Participation was also sizable. With 17 teams and more than 40 runs, the results give a fairly broad public snapshot of current approaches to the task.

What comes next

The June arXiv posting appears to be the first public release of the results overview. It is not yet clear whether it is the final proceedings version or a preprint ahead of publication.

The next items to watch are the final conference or proceedings release, any team-specific system papers, and whether the benchmark data, rankings or evaluation code are published separately.
