Natural language processing to identify the creation and impact of new technologies in patent text: Code, data, and new measures
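Develops natural language processing techniques that identify the creation and impact of new technologies in U.S. patents, shows that they outperform traditional metrics using Nobel Prize-linked patents and patents examined by multiple patent offices, and provides open code and data.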
We develop natural language processing techniques to identify the creation and impact of new technologies in the population of U.S. patents. In two case-control studies, we validate the new techniques and show their improvement over traditional metrics based on patent classification and citations. First, we collect patents linked to awards such as the Nobel Prize and the National Inventors Hall of Fame. These patents likely cover radically new technologies with a major impact on technological progress and patenting. Second, we identify patents granted by the United States Patent and Trademark Office (USPTO) but rejected by both the European and the Japanese patent offices. Such patents arguably lack novelty or cover only small, incremental advances over prior art, and should therefore have little impact on technological progress. We provide open access to code, data, and new measures for all utility patents granted by the USPTO up to May 2018 (see https://zenodo.org/record/3515985, DOI: 10.5281/zenodo.3515985).