Textual Analysis in Real Estate

Journal of Applied Econometrics · 2016
Cited by 84
Renmin University AABS 3

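Summary

The paper incorporates the comment text from MLS listings into a hedonic pricing model, finds that textual information reduces pricing error by more than 25%, and uses penalized regression methods such as the LASSO to estimate the implicit prices of words.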

Abstract

This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves the performance of both in-sample and out-of-sample pricing estimates. Text is found to decrease pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach. By doing so, the implicit prices for various words and phrases are estimated. The estimation focuses on simultaneous variable selection and estimation for linear models in the presence of a large number of variables using a penalized regression. The LASSO procedure and variants are shown to outperform least-squares in out-of-sample testing. Copyright © 2016 John Wiley & Sons, Ltd.
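
As a rough illustration of the tokenization approach and penalized regression described in the abstract, the following is a minimal sketch, not the authors' implementation. The listings, remarks, vocabulary, log prices, and penalty value are all invented for illustration, and the LASSO is solved here with a plain coordinate-descent routine, which is one common way to fit it.

```python
import re

def tokenize(comment):
    """Lowercase a listing remark and split it into single-word tokens."""
    return re.findall(r"[a-z]+", comment.lower())

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent.

    Columns are centred internally so the intercept b0 is left unpenalised.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residuals when every coefficient is 0
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(m * b for m, b in zip(means, beta))
    return intercept, beta

# Hypothetical listings: (square feet in thousands, agent remarks)
listings = [
    (1.2, "Cozy starter home, fixer upper"),
    (1.5, "Granite counters and new roof"),
    (1.8, "Spacious yard, granite kitchen, garage"),
    (2.0, "Fixer with great bones and garage"),
    (2.2, "Move-in ready, quiet street"),
    (2.5, "Granite island, three-car garage"),
    (1.6, "Investor special fixer"),
    (1.9, "Updated baths, quiet cul-de-sac"),
]
vocab = ["granite", "fixer", "garage"]  # assumed vocabulary, chosen by hand

# Design matrix: one structural attribute plus one 0/1 indicator per token
X = []
for sqft, remarks in listings:
    tokens = set(tokenize(remarks))
    X.append([sqft] + [1.0 if w in tokens else 0.0 for w in vocab])

# Synthetic log prices generated from made-up "true" implicit prices
y = [12.0 + 0.3 * row[0] + 0.10 * row[1] - 0.15 * row[2] for row in X]

intercept, beta = lasso_cd(X, y, lam=0.001)
implicit = dict(zip(["sqft"] + vocab, beta))
for name, value in implicit.items():
    print(f"{name:>8}: {value:+.4f}")
```

Each token coefficient can be read as the implicit price of that word in log-price terms, holding the structural attribute fixed. Raising `lam` drives more coefficients to exactly zero, which is how the penalty performs variable selection and estimation in one step.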

Keywords: textual analysis; hedonic pricing model; MLS listing data; LASSO regression