Using machine learning to select variables in data envelopment analysis: Simulations and application using electricity distribution data

Energy Economics · 2023
Cited by 28
Renmin University of China A-ABS 3

Summary

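Electricity regulators face difficulty in selecting variables for data envelopment analysis. To address this, the paper proposes a two-step approach based on the adaptive LASSO. Simulations show that it outperforms other methods when collinearity is low or moderate, whereas the LASSO performs better when collinearity is high. The approach is validated with Swedish electricity distribution data.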

Abstract

Agencies that regulate electricity providers often apply nonparametric data envelopment analysis (DEA) to assess the relative efficiency of each firm. The reliability and validity of DEA are contingent upon selecting relevant input variables. In the era of big (wide) data, the assumptions of traditional variable selection techniques are often violated due to challenges related to high-dimensional data and their standard empirical properties. Currently, regulators have access to a large number of potential input variables. Therefore, our aim is to introduce new machine learning methods for regulators of the energy market. We also propose a new two-step analytical approach where, in the first step, the machine learning-based adaptive least absolute shrinkage and selection operator (ALASSO) is used to select variables and, in the second step, selected variables are used in a DEA model. In contrast to previous research, we find, by using a more realistic data-generating process common for production functions (i.e., Cobb–Douglas and Translog), that the performance of different machine learning techniques differs substantially in different empirically relevant situations. Simulations also reveal that the ALASSO is superior to other machine learning and regression-based methods when the collinearity is low or moderate. However, in situations of multicollinearity, the LASSO approach exhibits the best performance. We also use real data from the Swedish electricity distribution market to illustrate the empirical relevance of selecting the most appropriate variable selection method.
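The two-step approach described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the log-linear (Cobb–Douglas style) regression of output on candidate inputs, OLS-based adaptive weights with `gamma=1`, the fixed penalty `lam=0.02` (in practice it would be tuned, for example by cross-validation), the input-oriented constant-returns DEA model, and the simulated data. The helper names `lasso_cd`, `adaptive_lasso`, and `dea_input_efficiency` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lasso_cd(X, y, lam, n_iter=500):
    """Plain LASSO by coordinate descent: min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n + 1e-12     # tiny constant guards against zero columns
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]      # partial residual excluding variable j
            rho = X[:, j] @ r_j / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO with weights w_j = 1 / |b_ols_j|**gamma (assumed OLS initial fit)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    b_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    scale = np.abs(b_ols) ** gamma                # equals 1 / w_j
    b_scaled = lasso_cd(Xc * scale, yc, lam)      # ordinary LASSO on rescaled columns
    return b_scaled * scale                       # map back to the original scale


def dea_input_efficiency(X, Y):
    """Input-oriented, constant-returns DEA scores. X: n x m inputs, Y: n x s outputs."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]               # variables: [theta, lambda_1..lambda_n]
        A_in = np.c_[-X[o], X.T]                  # sum_j lambda_j x_ij - theta x_io <= 0
        A_out = np.c_[np.zeros(s), -Y.T]          # -sum_j lambda_j y_rj <= -y_ro
        res = linprog(c, A_ub=np.r_[A_in, A_out],
                      b_ub=np.r_[np.zeros(m), -Y[o]],
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.fun                       # minimised theta is the efficiency score
    return scores


# Simulated data (assumption): only the first two of six candidate inputs matter.
rng = np.random.default_rng(0)
n, p = 200, 6
logX = rng.normal(size=(n, p))
ineff = rng.exponential(0.1, size=n)              # one-sided inefficiency term
logy = 0.5 * logX[:, 0] + 0.4 * logX[:, 1] + rng.normal(scale=0.05, size=n) - ineff

# Step 1: select variables; Step 2: run DEA on the selected inputs only.
beta = adaptive_lasso(logX, logy, lam=0.02)
selected = np.flatnonzero(beta)
eff = dea_input_efficiency(np.exp(logX[:, selected]), np.exp(logy)[:, None])
print("selected inputs:", selected.tolist())
print("mean DEA efficiency:", round(float(eff.mean()), 3))
```

Rescaling each column by the absolute value of its initial coefficient turns the weighted penalty into an ordinary LASSO problem, which is why a plain coordinate-descent solver is enough for this sketch. Replacing `adaptive_lasso` with `lasso_cd` on the centred data gives the unweighted LASSO variant that the abstract reports as preferable under multicollinearity.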

Data envelopment analysis · Variable selection · Machine learning · Adaptive LASSO