Bayesian Consumer Profiling: How to Estimate Consumer Characteristics from Aggregate Data
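The simple count method that firms commonly use to infer consumer characteristics from aggregate data rests on a conditional independence assumption that often fails. To address this, the authors propose a Bayesian profiling method that introduces different conditional independence assumptions and additional covariates, substantially improving predictive accuracy. They validate it with case studies such as inferring age from first names and income from IP addresses.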
Firms use aggregate data from data brokers (e.g., Acxiom, Experian) and external data sources (e.g., Census) to infer the likely characteristics of consumers in a target list and thus better predict consumers’ profiles and needs unobtrusively. The authors demonstrate that the simple count method most commonly used in this effort relies implicitly on an assumption of conditional independence that fails to hold in many settings of managerial interest. They develop a Bayesian profiling method that introduces different conditional independence assumptions, and they show how to incorporate additional observed covariates into the model. Using simulations, they demonstrate that in managerially relevant settings the Bayesian method outperforms the simple count method, often by an order of magnitude. The authors then compare different conditional independence assumptions in two case studies. The first estimates customers’ ages on the basis of their first names; prediction errors decrease substantially. In the second, the authors infer the income, occupation, and education of online visitors to a marketing analytics software company based exclusively on their IP addresses. The face validity of the predictions improves dramatically, and the results reveal a more interesting and more complex endogenous list-selection mechanism than the one suggested by the simple count method.
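To make the contrast concrete, the following is a minimal sketch rather than the authors' model. It rests on two assumptions the abstract does not spell out. First, the simple count method is read as averaging the aggregate distributions P(x | z) over the units z observed on the list, which is valid only if list membership is independent of the characteristic x given z. Second, the alternative assumes that z is independent of list membership given x and estimates the list profile P(x | list) as mixture weights; a plain maximum-likelihood EM loop stands in here for full Bayesian inference. All numbers and function names are hypothetical.

```python
# Hypothetical population joint distribution P(z, x).
# Rows: aggregate unit z (e.g., a ZIP code or a first name).
# Columns: consumer characteristic x (e.g., "young", "old").
JOINT = [
    [0.30, 0.10],
    [0.15, 0.15],
    [0.05, 0.25],
]


def p_x_given_z(joint):
    """Row-normalise the joint: result[z][x] = P(x | z)."""
    return [[v / sum(row) for v in row] for row in joint]


def p_z_given_x(joint):
    """Column-normalise the joint: result[z][x] = P(z | x)."""
    k = len(joint[0])
    col_totals = [sum(row[x] for row in joint) for x in range(k)]
    return [[row[x] / col_totals[x] for x in range(k)] for row in joint]


def simple_count(counts, joint):
    """Average P(x | z) over the list, weighted by how often each z occurs."""
    n = sum(counts)
    cond = p_x_given_z(joint)
    k = len(joint[0])
    return [
        sum(counts[z] / n * cond[z][x] for z in range(len(counts)))
        for x in range(k)
    ]


def mixture_profile(counts, joint, iterations=500):
    """Estimate P(x | list) as mixture weights over P(z | x) using EM.

    Assumption (not from the source): z is independent of list
    membership given x, so P(z | list) = sum_x P(z | x) * P(x | list).
    """
    n = sum(counts)
    lik = p_z_given_x(joint)
    k = len(joint[0])
    pi = [1.0 / k] * k  # start from a uniform profile
    for _ in range(iterations):
        new_pi = [0.0] * k
        for z, c in enumerate(counts):
            weights = [pi[x] * lik[z][x] for x in range(k)]
            total = sum(weights)
            for x in range(k):
                # E-step responsibility, accumulated for the M-step average.
                new_pi[x] += (c / n) * weights[x] / total
        pi = new_pi
    return pi


# A hypothetical list that selects on x: 90% of its members have x = 0.
TRUE_PROFILE = [0.9, 0.1]
LIST_SIZE = 1000
LIK = p_z_given_x(JOINT)
# Expected (noise-free) counts of each z on the list.
COUNTS = [
    LIST_SIZE * sum(TRUE_PROFILE[x] * LIK[z][x] for x in range(2))
    for z in range(3)
]

print("simple count:", simple_count(COUNTS, JOINT))
print("mixture (EM):", mixture_profile(COUNTS, JOINT))
```

In this toy setup the simple count estimate of the first characteristic is pulled toward the population share (roughly 0.59 against a true 0.9), while the mixture estimate recovers the true profile because the data were generated under its own assumption. With real, noisy counts and a different selection mechanism, neither result is guaranteed.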