Visual econometrics: teaching and practising econometrics using ViSta

Journal of Applied Econometrics · 2002

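Summary

Introduces the use of the dynamic graphics software ViSta in teaching and practising econometrics: interactive graphical operations display data, regression results, and diagnostics intuitively, helping users understand concepts and improve their models.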

Abstract

Recent advances in computer hardware and software with the concomitant reductions in costs have offered many exciting new possibilities in the field of data analysis. Dynamic graphical methods are one of the most interesting and potentially most influential new developments. The essential feature of a dynamic graphical method is ‘the direct manipulation of elements of a graph on a computer screen; the manipulation is carried out through the use of an input device such as a mouse, and in high-performance implementations, the elements change virtually instantaneously on the screen’ (Becker et al., 1987). This approach is rapidly gaining acceptance in statistics, and there are many software packages that implement it to some degree (e.g. JMP, S-Plus). However, the potential of these methods in teaching and practising econometrics has not yet been fully exploited. The main reason for this probably lies in the lack of software that implements these methods in a way that suits econometricians, rather than in a failure to recognize their importance.
The usefulness of plotting the data in econometrics has long been recognized. Looking at economic data encourages knowledge of economic theory and history. Errors, unusual events, structural breaks, and so on are all features that can be best appreciated through plots and that should be taken into consideration when developing an econometric model. Dynamic graphics can accomplish these tasks much more immediately and effectively than standard graphing methods. Moreover, dynamic graphics can make econometric concepts and assumptions ‘visible’. This new approach has been made possible by the development, in recent years, of programs such as ViSta, a powerful and easy-to-use graphically oriented interactive computing environment for statistical analysis.
After briefly defining the concept of visual econometrics (Section 2) we illustrate with a few examples relevant to economics and econometrics how ViSta can be useful not only for general data analysis but also for teaching and practising econometrics.
Visual econometrics can be defined as a way of teaching and practising econometrics by manipulating visual information on a computer screen using pointing devices, without the need to write code or issue commands. With the availability of operating systems that utilize graphical user interfaces and increased access to the Internet, it is safe to assume that pointing and clicking should be quite ubiquitous skills. The visual approach perfectly complements the algebraic and numerical perspectives of doing econometrics. In many cases, this approach allows one to introduce in an immediate and simple way important econometric concepts, such as the impact on the Ordinary Least Squares (OLS) estimator of influential observations and outliers, that are usually not only considered too difficult to teach in introductory econometric courses but are also still not part of standard econometric practice for lack of available software (see Davidson and MacKinnon, 1993, p. 32). In order to be interesting to econometricians, a visual econometric environment should not only allow users to ‘see’ data in a meaningful way, but also to visualize regression results and diagnostics. More importantly, the user's actions, such as the elimination of a few observations, should have immediate consequences that are reflected in a visual manner on the regression results and diagnostics. This has obvious implications for both the understanding of econometrics and econometric modelling. The ability to experiment easily and systematically using tools of this kind should promote a deeper understanding of important econometric concepts.
Furthermore, these powerful exploration techniques could improve and speed up the refinement process that starts from a basic econometric model, as directly suggested by economic theory, and lead to an econometric specification that better represents the data at hand. Incidentally, we would like to note that the visual approach is all the more important given the growing popularity of non-parametric methods in applied econometrics. When the object of interest becomes a whole distribution, rather than just a few of its moments, only a graphical display can represent such a large amount of information effectively.
An environment for visual econometrics should provide the dynamic graphical methods described in Becker et al. (1987) that allow the identification of labelled data (the user should be able to select points on a graph and obtain the observations they refer to and, conversely, should be able to locate the observations on the graph from the label), deletion (it should be possible to delete observations from the graphs or from an ordered list of labels) and linking (observations should be visually linked on different plots). Graphical analysis of regression results has been made popular by Cook and Weisberg (1994). Their approach focuses on static methods. Various types of residual, influential, and added-variable plots can be obtained within several modern statistical packages (JMP and S-Plus to mention a few). What these programs specifically lack to qualify as visual econometric environments is the systematic application of dynamic methods to residual plots.
ViSta is a ‘free and open’ software package written by Forrest W. Young. It is ‘free’ in the sense that the executable, along with plugins, help, examples, and test data can be freely copied and redistributed, and it is ‘open’ in the sense that all the code can be obtained, with permission from the author, in order to extend ViSta's capabilities.
ViSta can be downloaded from the web at http://www.visualstats.org/, where different versions are available, according to the platform (Microsoft Windows 2000, 98, 95, 3.x/NT/OS-2, Macintosh PPC, 68 K or Unix with X11) or the language (English, French and Spanish) used. Only the Windows version is regularly updated, and other versions tend to be much less up to date. The reason why ViSta is particularly well suited to implement visual econometrics is the extensive use of graphical analysis which allows multiple views of the data and regression results simultaneously in an interactive fashion. To the best of our knowledge, in this respect, ViSta is unique. Also, ViSta is designed for an audience of users having a very wide range of levels of expertise in data analysis, ranging from novices to experts. ViSta has been specifically designed to match different requirements in terms of data analysis sophistication, and therefore different environments are available at the choice of the user:
WorkMaps that visually summarize the data analysis session. WorkMaps have an ongoing structure and expand as the analysis proceeds. Therefore, they provide the user with a history of the analysis carried out and can easily be used to rerun previous steps.
Textual reports that present data statistics and analysis results in the classical way are also available in a different window.
DataSheets for inputting and editing data.
Visual GuideMap Authoring for expert users (e.g. lecturers) who wish to create GuideMaps to be followed by less experienced ones (e.g. students).
Context-Sensitive Help, Web-based Help, and Notes.
The statistical visualization capabilities of ViSta constitute a useful tool for understanding data and provide intuitive insights into issues of econometric modelling.
ViSta's design rests on the assumption that the synthesis between classical approaches in data analysis and modelling and visualization techniques gives the user the most complete understanding of the data. Different combinations of empirically linked plots (i.e. plots which are linked through the data) let the researcher perform a complete investigation of the structure and the nature of data. Of particular interest to econometricians are SpreadPlots (Young et al., 1992), a spreadsheet-like arrangement of linked, dynamic plots that can help the researcher in assessing the goodness of fit of a specified econometric model, especially when dealing with multi-dimensional data. We will illustrate the difference between SpreadPlots (also referred to as algebraically linked), where the link is provided by the equation of the underlying model, and empirically linked plots, where links involve only variables and observations, by means of examples. In the next section, empirically linked plots will be used to investigate income distribution dynamics. In Section 5, algebraically linked plots will be used to explore the estimation results of a simple demand function.
In this section, we illustrate a few basic features of ViSta by means of an exploratory data analysis (EDA) applied to the evolution of income distribution using a 121-country sample of GDP per worker taken from the well-known Penn World Table of Summers and Heston (1991). The data can be downloaded from the following web page: http://pwt.econ.upenn.edu/. To import the data set into ViSta, the File menu's Import Data item can be used. Several ASCII file formats are supported. A dialog box allows the user to specify whether the first line contains variable names, whether the first column contains observation labels, whether there are missing values, and so on, by checking the appropriate boxes. For our analysis, we first transform the data into logs using the Transform menu's logarithm option.
Although many standard transformations are readily available, user-defined transformations (such as computing GDP per worker using GDP and the number of workers) have to be done by inputting Lisp code in an appropriate window. Once the data set is loaded and transformed, the Data menu offers several alternative ways to proceed. Options to visualize, summarize, edit, browse, create, and merge are available. For our purposes, we will concentrate on the Visualize Data option, which can be used to obtain a collection of empirically linked plots that visually summarizes the data. Figure 1 shows how, using ViSta, we can not only investigate the evolution of the income distribution effectively but also obtain a clear view of the intradistributional dynamics.
Figure 1. Evolution of income distribution and intradistribution dynamics
The scatterplot allows us to identify changes in the world distribution of income. Large deviations from the line point out growth miracles and growth disasters. Individual observations or groups of observations can be highlighted and labelled. For example, it is clear from the figure that Botswana and Lesotho have experienced a large increase in relative income, whereas Chad and Madagascar have lost ground. The boxplot gives us a simple schematic of income distribution particularly well suited for inequality comparisons. The centre horizontal line inside each box locates the median, the bottom and top edges of the box are at the first and third quartiles, and the bottom and top lines are at the 10th and 90th percentiles. Therefore, 10% of the countries are above the top margin and another 10% are below the bottom margin. If we simply compare the two boxplots for the two years, as in Figure 1, we can clearly see how the middle box has grown in extent between 1960 and 1988, thus providing evidence that the middle 50% of the distribution has been spreading out.
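The markers of the boxplot described above (median, quartiles, and 10th and 90th percentiles) can be computed directly. The following is a minimal Python sketch for illustration only: the function name `box_summary` is hypothetical, and the interpolation rule used by the standard library may differ from the one ViSta applies.

```python
import statistics


def box_summary(values):
    """Return the boxplot markers described in the text.

    Hypothetical helper: 10th/90th percentiles for the outer lines,
    quartiles for the box edges, and the median for the centre line.
    """
    data = list(values)
    # Nine cut points splitting the data into ten equal-probability groups.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    # Three cut points: first quartile, median, third quartile.
    quartiles = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "p10": deciles[0],
        "q1": quartiles[0],
        "median": quartiles[1],
        "q3": quartiles[2],
        "p90": deciles[8],
        # Height of the box: spread of the middle 50% of the distribution.
        "iqr": quartiles[2] - quartiles[0],
    }
```

Applying such a function to log GDP per worker in each of the two years and comparing the `iqr` entries would be the numerical counterpart of comparing the heights of the two boxes.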
Also, from the fact that the extremes of the box are further apart, it is evident that the distribution has become somewhat more unequal. The latter fact is confirmed by the quantile–quantile plot. Connected boxplots can be used to explore the intradistribution dynamics and usually summarize a rich variety of dynamic behaviour. Most countries display persistence in their dynamic behaviour, and few display considerable mobility. Another interesting dynamic feature is evident in the picture: there is some evidence of within-group convergence. This is confirmed by the Histogram windows, where histograms, the normal reference curve, and nonparametric kernel density estimates for the two years are shown. There is evidence that the overall income distribution has become twin-peaked.
In this section, we will illustrate the usefulness of ViSta in teaching and practising econometrics. For this purpose, we will use the data on gasoline consumption available in Greene (2000). The data set (Table 6.1) can be downloaded from the following web page: http://www.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm. Once the data have been read by the program, running a regression can be done very easily by clicking on Analyze ▸ Regression Analysis, after which an easy-to-use multiple regression dialog box opens (see Figure 2).
Figure 2. Regression Analysis dialog box
After running the regression of US per capita gasoline consumption (GPOP) on its price (PG) and national income (Y), the Model menu offers four choices: Visualize Model, Report Model, Delete Model, and Create Data. The Report Model choice simply prints the numerical results from the regression analysis (printouts of residuals and leverages are optional).
Here is part of the output of our regression:
Since the regression output rounds the regression coefficients and standard errors to two significant digits, it is difficult to judge the accuracy of the regression routines. The Create Data option creates a new data set that includes fitted values, residuals, influences, and leverages. The option is very useful to prepare for testing the linear regression model and to implement other regression techniques such as instrumental variables. The Visualize Model option is particularly useful. Figure 3 presents an example of a layout of the five algebraically linked plots from the visualization of the regression model. The plots displayed are the Added Variables Plot, Residuals Plot, Leverage Plot, Influence Plot, and the Observation window. We refer the reader to Cook and Weisberg (1994) for background on these residual-based diagnostic plots.
Figure 3. Visualization of regression model
When teaching econometrics, the best way to illustrate the usefulness of regression analysis is to relate it to the concept of ceteris paribus in economics. With ViSta, we can easily illustrate such conceptual experiments. We can compare the ScatterPlot window with the Regression window, which implements the added variable plot (see Figure 4). The scatterplot, with a linear regression line superimposed, shows only the marginal relationship between demand and price. The Regression window shows the relationship that would exist between the variables if we kept income constant; it is the visual equivalent of the partial derivative of the mean function conventionally used in the interpretation of regression coefficients. Also, since the added variable plot is obtained by plotting per capita gasoline consumption netted out of the effects of income and the constant, versus prices netted out of the effect of income and the constant, it is also a visual illustration of the Frisch–Waugh–Lovell theorem.
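The construction of the added variable plot can be checked numerically. The sketch below is an illustration in Python, not ViSta code: all function names are hypothetical, and the three short series are made-up numbers standing in for consumption, price, and income rather than the Greene (2000) data. It nets both consumption and price of income and a constant, and compares the slope of the resulting residual-on-residual regression with the price coefficient from the full regression.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def residuals(y, z):
    """Residuals from an OLS regression of y on a constant and z."""
    slope = cross(z, y) / cross(z, z)
    intercept = mean(y) - slope * mean(z)
    return [yi - intercept - slope * zi for yi, zi in zip(y, z)]


def coef_on_x(y, x, z):
    """Coefficient on x in the OLS regression of y on a constant, x and z."""
    sxx, szz, sxz = cross(x, x), cross(z, z), cross(x, z)
    sxy, szy = cross(x, y), cross(z, y)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def added_variable_slope(y, x, z):
    """Slope of the added variable plot for x, netting out z and a constant."""
    ry = residuals(y, z)  # y netted of z and the constant
    rx = residuals(x, z)  # x netted of z and the constant
    # Both residual series have mean zero, so no intercept is needed.
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


# Made-up illustrative numbers (NOT the gasoline data set).
price = [1.0, 1.2, 1.1, 1.5, 1.7, 1.6, 2.0, 2.2]
income = [10.0, 10.5, 11.2, 11.8, 12.1, 13.0, 13.4, 14.1]
consumption = [5.0, 5.1, 5.4, 5.3, 5.2, 5.6, 5.5, 5.6]

full = coef_on_x(consumption, price, income)
partial = added_variable_slope(consumption, price, income)
print(math.isclose(full, partial, rel_tol=1e-9))
```

The two numbers agree up to floating-point error, which is the content of the theorem in this two-regressor case; the marginal slope from the simple scatterplot, `cross(price, consumption) / cross(price, price)`, will generally differ.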
The slope of the regression line in the added variable plot can be easily read from the plot, since both variables are centred around zero, and is numerically identical to the value of the coefficient for price in the original regression.
Figure 4. Scatterplot versus added variable plot
It is well known that the method of least squares is very sensitive to even a single extreme observation, whether erroneous or not. The assimilation of techniques to identify unusual observations has been slow in econometric practice and, especially in teaching econometrics, almost absent. It has been pointed out (see Davidson and MacKinnon, 1993) that this is mostly due to the lack of available software packages that implement these techniques. ViSta represents a considerable step forward in making the detection of unusual observations accessible to both the novice and the expert econometrician. ViSta's ability to connect several plots, so that any action taken in one of the linked plots (e.g. selecting or removing points) is simultaneously applied to all the others, allows the user to examine where unusual observations in one plot are located in other plots. This visualization can, among other things, be effectively used to illustrate the differences among influential, high leverage, and outlying observations. For example, in Figure 3, a few interesting observations with unusually large residuals (outliers), leverage, or influence were selected. Such examples can be immediately found with ViSta using a method of selecting points called brushing. In this visualization mode, the mouse pointer is transformed into a rectangle. As this rectangle moves across the various windows, all the points within its margins and the corresponding points in other windows are highlighted and labelled. The fact that high leverage points need not be influential is clearly illustrated by the pictures.
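The three quantities distinguished above can also be computed by hand for a simple regression. The following Python sketch is an illustration under stated assumptions: the text does not say which influence measure ViSta plots, so Cook's distance is used here as a common choice; the function name `regression_diagnostics` is hypothetical; and the data are made-up numbers in which the last observation lies far from the others in x.

```python
def regression_diagnostics(x, y):
    """Leverage, residual and Cook's distance for y regressed on a constant and x.

    Cook's distance is an assumed influence measure, not necessarily ViSta's.
    """
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Diagonal of the hat matrix for a regression with one regressor.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines the squared residual with the leverage.
    cooks = [e * e * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, leverage)]
    return leverage, resid, cooks


# Made-up illustrative data: the last point is far out in x.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 40.0]

leverage, resid, cooks = regression_diagnostics(x, y)
for xi, h, e, d in zip(x, leverage, resid, cooks):
    print(f"x={xi:>2}  leverage={h:.3f}  residual={e:+.3f}  cook={d:.3f}")
```

The formula for `cooks` makes the point of the paragraph explicit: leverage depends only on the regressor values, and it translates into influence only in combination with the size of the residual.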
The observation for 1960, even though of high leverage, has virtually no influence on the regression coefficients because it is associated with a relatively small residual. The pictures also clearly show how the size of the residual associated with an observation is a poor indicator of its influence. The observation for 1979 is such an example. For the 1981 observation, the combination of ‘outlyingness’ and leverage produces a substantial influence on the regression coefficients. One of the most powerful features of ViSta is the ability to visually remove or include selected observations for subsequent analysis. This feature is tantamount to the visual creation of dummies. From the Obs menu it is possible to remove, focus on, or cancel the selection. When we removed the three most influential observations (1981, 1982, and 1995) and reran the previous regression model, the results showed an improved fit. Also, the economic and statistical significance of the price coefficient increased slightly (the value of the estimated coefficient is −0.11 and the t-statistic is −7.36).
ViSta is a powerful and easy-to-use graphically oriented interactive computing environment for statistical analysis. At present, the program still has many limitations. Thus, in its current version, it can effectively complement, but not substitute for, standard econometric packages. ViSta in fact does not directly implement many data analysis techniques that we would expect in a fully fledged econometric package, such as specific methods for time series analysis, systems of equations, panel data, and limited dependent variables. However, the extensible nature of the package will ensure that, as ViSta becomes more popular among econometricians, new data analysis and visualization capabilities useful for econometric applications will be added.
The authors would like to thank Karim Abadir, John Hutton, James MacKinnon, Forrest Young, and the participants at the Sixth Society for Computational Economics Conference, Barcelona, for helpful comments and suggestions. Financial support from the ESRC (UK) grant R000239538 is gratefully acknowledged.

Keywords: dynamic graphical methods; econometrics teaching; ViSta software; economic data visualization