Links and Legibility: Making Sense of Historical U.S. Census Automated Linking Methods

Journal of Business & Economic Statistics · 2023
Cited by: 6
Renmin University of China (RUC) AABS: 4

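Summary

This study examines how handwriting legibility affects the performance of algorithms that link individuals across census rounds. It finds that legibility varies widely across enumeration districts in the 1940 U.S. Census, that low legibility reduces linkage quality, and that perfect legibility would raise the linkage rate by 5–10 percentage points.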

Abstract

How does handwriting legibility affect the performance of algorithms that link individuals across census rounds? We propose a measure of legibility, which we implement at scale for the 1940 U.S. Census, and find strikingly wide variation in enumeration-district-level legibility. Using boundary discontinuities in enumeration districts, we estimate the causal effect of low legibility on the quality of linked samples, measured by linkage rates and share of validated links. Our estimates imply that, across eight linking algorithms, perfect legibility would increase the linkage rate by 5–10 percentage points. Improvements in transcription could substantially increase the quality of linked samples.

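Keywords: handwriting legibility; census; linking algorithms; enumeration districts; boundary discontinuities; linkage quality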