Reading the ransom: Methodological advancements in extracting the Swedish Wealth Tax of 1571

Explorations in Economic History · 2022

Cited by 3

ABS 3

Christopher Blomqvist
Kerstin Enflo (corresponding author)
Andreas Jakobsson
Kalle Åström

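Summary

The paper proposes a deep learning method that combines a segmentation module with a handwritten text recognition module to read handwritten records from the 16th century. Using the Swedish Wealth Tax of 1571 as a case study, it demonstrates how loosely structured handwritten information can be extracted automatically and organized into tables, which makes it a useful reference for economic historians digitizing quantitative material from pre-industrial periods.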

Abstract

We describe a deep learning method to read handwritten records from the 16th century. The method consists of a combination of a segmentation module and a Handwritten Text Recognition (HTR) module. The transformer-based HTR module exploits both language and image features in reading, classifying and extracting the position of each word on the page. The method is demonstrated on a unique historical document: the Swedish Wealth Tax of 1571. Results suggest that the segmentation module performs significantly better than the layout analysis implemented in state-of-the-art programs, enabling us to trace many more text blocks correctly on each page. The HTR module has a low character error rate (CER), in addition to being able to classify words and help organize them into tabular formats. By demonstrating an automated process to transform loosely structured handwritten information from the 16th century into organized tables, our method should interest economic historians seeking to digitize and organize quantitative material from pre-industrial periods.
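
The abstract reports a low character error rate (CER) without defining the metric. CER is conventionally the character-level edit distance between a reference transcription and the model output, divided by the length of the reference. The following is a minimal sketch under that conventional definition; the paper's own evaluation protocol (normalization, handling of whitespace or abbreviations) is not stated here, and the function names `levenshtein` and `cer` are illustrative, not taken from the authors' code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length
    (conventional definition, assumed here)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Generic example strings, not data from the paper:
# three edits against a six-character reference.
print(cer("kitten", "sitting"))  # 0.5
```

Because the denominator is the reference length, CER can exceed 1.0 when the output contains many spurious characters, so it is an error rate rather than a bounded percentage.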

Economic history · Natural language processing · Deep learning · Handwritten text recognition
