A Discussion of “Text Selection”
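Discusses the economic model of word choice in text proposed by Kelly et al. The model handles large numbers of words better and is applied to backcasting bank capital ratios and forecasting U.S. macroeconomic variables. It is suited to scholars studying how to quantify information in text.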
Kelly, Manela, and Moreira provide an economic model of word choice in text. A writer is modeled as first choosing whether to use a word at all (the selection problem) and then deciding how often a selected word should be used (the positive counts problem). The resulting model leads to a better sufficient reduction for a large number of words and phrases in the text, as demonstrated in several diverse applications that use information captured from the text of the front page of the Wall Street Journal, such as backcasting the regulatory capital ratios of banks and forecasting and nowcasting U.S. macroeconomic variables. Researchers interested in quantifying information in text will benefit from reading the article and thinking about some of the issues it raises. In my discussion, I provide background, context from other foundational papers, and a very short summary of the article, and I make some broad observations.
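The two-step word choice described above (whether to use a word, then how often) has the shape of a hurdle count model. The sketch below is only an illustration of that generic structure, not the authors' specification: the function name `hurdle_poisson_pmf`, the parameters `p_use` and `lam`, and the choice of a zero-truncated Poisson for the positive counts are all assumptions made here for concreteness.

```python
import math


def hurdle_poisson_pmf(k: int, p_use: float, lam: float) -> float:
    """Probability that a word appears exactly k times under a generic hurdle model.

    Assumptions (illustrative, not taken from the article):
      p_use -- probability that the writer uses the word at all (selection step)
      lam   -- rate of the Poisson that governs counts once the word is selected
    """
    if k < 0:
        raise ValueError("count must be non-negative")
    if not 0.0 <= p_use <= 1.0 or lam <= 0.0:
        raise ValueError("need 0 <= p_use <= 1 and lam > 0")

    # Selection step: the word is not used, so the count is zero.
    if k == 0:
        return 1.0 - p_use

    # Positive counts step: Poisson conditioned on being at least one.
    log_poisson = -lam + k * math.log(lam) - math.lgamma(k + 1)
    zero_truncated = math.exp(log_poisson) / (1.0 - math.exp(-lam))
    return p_use * zero_truncated


if __name__ == "__main__":
    # Hypothetical values: a word used in 30% of documents, rate 2 when used.
    for count in range(4):
        print(count, hurdle_poisson_pmf(count, p_use=0.3, lam=2.0))
```

Separating the two steps lets the probability of a zero count be set independently of the distribution of positive counts, which is the feature the selection and positive counts framing emphasizes.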