Consequences of arbitrarily binning the midpoint category in survey data: an illustration with student satisfaction in the National Student Survey
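Examines how arbitrarily grouping the midpoint category with "satisfied", or excluding it, affects rankings in survey data. Although the overall rankings are stable, individual courses and institutions shift markedly, which is a reminder to researchers to treat arbitrary grouping with caution.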
Arbitrarily placing cut-offs in data, i.e. binning, is recognised as poor statistical practice. We explore the consequences of using arbitrary cut-offs in two large datasets, the National Student Survey (2019 and 2022). These are nationwide surveys aimed at capturing student satisfaction amongst UK undergraduates. For these survey data, it is common to group the responses to the question on student satisfaction, measured on a five-point Likert scale, into '% satisfied' based on two categories. These '% satisfied' figures are then used in further metrics. We examine the consequences for the rankings of courses and institutions of using three rather than two categories, as well as the consequences of excluding the midpoint from the calculations. Across all courses, grouping the midpoint with satisfied leads to a median shift of 8.40% and 11.41% in satisfaction for 2019 and 2022, respectively. Excluding the midpoint from the calculations leads to a median shift of 4.20% and 5.70% in satisfaction for 2019 and 2022, respectively. While the overall stability of the rankings is largely preserved, individual courses or institutions exhibit sizeable shifts. Depending on the analysis, the most extreme shifts in rankings are between 13 and 79 ranks for courses, and between 24 and 416 ranks for institutions. Our analysis thus illustrates the potentially profound consequences of arbitrarily grouping categories for individual institutions and courses. We offer some recommendations on how this issue can be addressed, but primarily we caution against the reliance on arbitrary grouping of response categories in survey data such as the NSS.
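
The three ways of handling the midpoint described above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the scheme labels (`top2`, `mid_as_satisfied`, `exclude_mid`) and the response counts for the three courses are all hypothetical, and responses are assumed to be ordered from least to most satisfied with the midpoint in third position.

```python
from typing import Dict, Sequence


def pct_satisfied(counts: Sequence[int], scheme: str = "top2") -> float:
    """Return '% satisfied' for one course under a given midpoint scheme.

    counts: five response counts ordered from least to most satisfied
            (assumed ordering; the midpoint is the third entry).
    scheme: hypothetical label for how the midpoint is treated.
    """
    if len(counts) != 5:
        raise ValueError("expected exactly five response counts")
    _, _, mid, agree, strongly_agree = counts
    satisfied = agree + strongly_agree
    total = sum(counts)

    if scheme == "top2":
        # Conventional grouping: only the top two categories are 'satisfied'.
        numerator, denominator = satisfied, total
    elif scheme == "mid_as_satisfied":
        # Three categories: the midpoint is grouped with 'satisfied'.
        numerator, denominator = satisfied + mid, total
    elif scheme == "exclude_mid":
        # The midpoint is dropped from numerator and denominator alike.
        numerator, denominator = satisfied, total - mid
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    if denominator == 0:
        raise ValueError("no responses left after applying the scheme")
    return 100.0 * numerator / denominator


def rank_courses(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank courses from highest to lowest score (1 = best).

    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


# Invented response counts for three hypothetical courses.
courses = {
    "A": [2, 3, 5, 40, 50],
    "B": [4, 4, 0, 42, 50],
    "C": [1, 1, 20, 38, 40],
}

rankings = {
    scheme: rank_courses(
        {name: pct_satisfied(counts, scheme) for name, counts in courses.items()}
    )
    for scheme in ("top2", "mid_as_satisfied", "exclude_mid")
}

for scheme, ranks in rankings.items():
    print(scheme, ranks)
```

In this invented example, course C has many midpoint responses and moves from last place under the conventional grouping to first place under either alternative, while course B, with no midpoint responses, moves in the opposite direction. The numbers are chosen only to show the mechanism and say nothing about the size of the shifts in the NSS data.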