Misreporting and econometric modelling of zeros in survey data on social bads: An application to cannabis consumption
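To address the large number of zeros in survey data on social bads, this paper proposes a double-inflated modelling framework that distinguishes nonparticipants, misreporters, and infrequent users. Applied to cannabis consumption data, it finds that 17% of the zeros come from misreporters.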
When modelling "social bads," such as illegal drug consumption, researchers often face a dependent variable with a large number of zero observations. Building on the recent literature on hurdle and double-hurdle models, we propose a double-inflated modelling framework in which the zero observations can come from three sources: nonparticipants; participant misreporters (who have larger loss functions associated with a truthful response); and infrequent consumers. Motivated by our empirical application, we derive the model for the case of an ordered discrete dependent variable. However, other zero-inflated models (e.g., zero-inflated count models and double-hurdle models for continuous variables) can be augmented in the same way. The model is then applied to a consumer choice problem of cannabis consumption. We estimate that 17% of the reported zeros in the cannabis survey come from individuals who misreport their participation, 11% from infrequent users, and only 72% from true nonparticipants.
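To make the three sources of zeros concrete, the sketch below splits the probability of a reported zero into nonparticipant, misreporter, and infrequent-consumer components. It is an illustration only, not the paper's specification. The probit links, the independence of the three error terms, the conditioning of misreporting on participation, and every name in the code (`zero_shares`, `xb_participate`, `xb_misreport`, `xb_consume`, `first_cutpoint`) are assumptions made for this example.

```python
from math import erf, sqrt


def norm_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def zero_shares(xb_participate: float,
                xb_misreport: float,
                xb_consume: float,
                first_cutpoint: float = 0.0) -> dict:
    """Split P(reported zero) into three sources for one individual.

    Assumptions (hypothetical, for illustration only):
      - participation and misreporting each follow a probit equation;
      - misreporting is only possible for participants;
      - consumption follows an ordered probit whose lowest category is zero;
      - the three error terms are independent.
    Each xb_* argument is a linear index for the corresponding equation.
    """
    p_participate = norm_cdf(xb_participate)
    p_misreport = norm_cdf(xb_misreport)              # P(misreport | participant)
    p_lowest = norm_cdf(first_cutpoint - xb_consume)  # P(zero category | truthful participant)

    nonparticipant = 1.0 - p_participate
    misreporter = p_participate * p_misreport
    infrequent = p_participate * (1.0 - p_misreport) * p_lowest
    p_zero = nonparticipant + misreporter + infrequent

    return {
        "p_zero": p_zero,
        "share_nonparticipant": nonparticipant / p_zero,
        "share_misreporter": misreporter / p_zero,
        "share_infrequent": infrequent / p_zero,
    }


if __name__ == "__main__":
    # Arbitrary index values chosen for demonstration, not estimates from the paper.
    for name, value in zero_shares(0.3, -0.8, 0.5).items():
        print(f"{name}: {value:.3f}")
```

Averaging shares of this kind over a sample is one way a decomposition like the 17% / 11% / 72% split reported above could be summarised. The paper's own estimator may differ, for example by allowing correlated errors.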