Textual Analysis in Real Estate
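Incorporates the comment text from MLS listings into a hedonic pricing model, finds that textual information reduces pricing error by more than 25%, and uses penalized regression methods such as the LASSO to estimate the implicit prices of words.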
This paper incorporates text data from MLS listings into a hedonic pricing model. We show that the comments section of the MLS, which is populated by real estate agents who arguably have the most local market knowledge and know what homebuyers value, provides information that improves both in-sample and out-of-sample pricing estimates. Including text decreases pricing error by more than 25%. Information from text is incorporated into a linear model using a tokenization approach, which allows the implicit prices of individual words and phrases to be estimated. Because tokenization produces a large number of candidate variables, estimation relies on penalized regression, which performs variable selection and coefficient estimation simultaneously. The LASSO procedure and its variants are shown to outperform least squares in out-of-sample testing. Copyright © 2016 John Wiley & Sons, Ltd.
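The tokenization and penalized-regression steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the remarks, the vocabulary, the `TRUE_EFFECT` rule used to generate log prices, and the helper names (`tokenize`, `indicator_matrix`, `lasso_cd`) are all hypothetical, and the LASSO is fitted with a plain coordinate-descent loop in pure Python so that the example has no dependencies.

```python
import re

def tokenize(remark):
    """Lower-case a remark and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", remark.lower()))

def indicator_matrix(remarks, vocab):
    """One row per listing, one 0/1 column per vocabulary word."""
    rows = []
    for remark in remarks:
        tokens = tokenize(remark)
        rows.append([1.0 if word in tokens else 0.0 for word in vocab])
    return rows

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1.

    The intercept b0 is left unpenalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    for _ in range(n_iter):
        # Unpenalized intercept: absorb the mean of the current residual.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # word never appears; coefficient stays at 0
            # Correlation of column j with the residual that excludes word j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_beta
    return intercept, beta

# Invented agent remarks (not real MLS data).
remarks = [
    "Granite counters and a pool",
    "Granite kitchen, cozy den",
    "Pool with cozy patio",
    "Cozy starter home",
    "Fixer with pool",
    "Fixer, granite already installed",
    "Fixer, priced to sell",
    "Move-in ready",
]
vocab = ["granite", "pool", "cozy", "fixer"]

# Hypothetical data-generating rule: "cozy" carries no price information.
TRUE_EFFECT = {"granite": 0.10, "pool": 0.08, "fixer": -0.15}

X = indicator_matrix(remarks, vocab)
y = [12.0 + sum(TRUE_EFFECT.get(w, 0.0) * x for w, x in zip(vocab, row))
     for row in X]

intercept, beta = lasso_cd(X, y, lam=0.001)
for word, coef in zip(vocab, beta):
    print(f"{word:8s} implicit price (log points): {coef:+.4f}")
```

Each fitted coefficient plays the role of an implicit price for its word. Raising `lam` shrinks the coefficients and eventually sets them all to zero, which is the variable-selection behavior the abstract refers to. In practice the penalty would be chosen by out-of-sample validation, and structural attributes such as size and location would sit alongside the word indicators in the design matrix.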