Unobserved variables in archival research: Achieving both theoretical and statistical identification

JOURNAL OF BUSINESS LOGISTICS · 2023
Cited by 14
Renmin University of China list: A- · ABS 3

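Reader's guide

This editorial clarifies for logistics and supply chain management researchers how to achieve theoretical identification through mechanism-based explanations when using archival data, and it stresses that theoretical identification is as important as statistical identification for improving the credibility of research.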

Abstract

A welcomed addition to the logistics and supply chain management (L&SCM) research landscape has been growth in the use of archival data, defined as data collected by an entity outside the research team (Miller et al., 2021). Expansion in the use of archival data is stimulated by discussions concerning the generally accepted limitations of primary data research and related debate about when and how the use of primary data is appropriate (e.g., Montabon et al., 2018; Schoenherr et al., 2015). Concurrently, there has been recognition that research utilizing archival data opens new avenues of emphasis for L&SCM research in answering a wide array of questions. When archival data is generated from industry and operations (e.g., DeHoratius et al., 2022), direct application may be enhanced. Additionally, archival data is highly accessible, which aids in both replication and extension (Pagell, 2021). As with every research design, archival data poses multiple limitations, and there are unique challenges as researchers employing the data are detached from the original collection process. Recent articles have tackled issues concerning how to establish strong validity claims for measures derived from archival sources (Miller et al., 2021) and how to formulate statistical models that ensure theoretical hypotheses map to estimated parameters (Ketokivi et al., 2021). We refer to this as statistical identification, which focuses on the confidence that an estimated statistical parameter (e.g., regression coefficient) is reasonably unbiased and not overly sensitive to changes in the structure of the statistical model. Possibly the greatest threat to statistical identification is the existence of one or more unmeasured variables that reside theoretically upstream or parallel to the independent variables. If included as predictors in a statistical model, these variables could be significantly related to the outcome variable. 
This is highly problematic if the estimates of the independent variables could shift substantially (Miller & Kulpa, 2022). We see a significant amount of emphasis on this aspect of the research, and the general remedy involves utilizing a combination of control variables and performing robustness tests (sometimes in excessive numbers) to rule out alternative explanations. One fundamental issue that has not been adequately addressed in L&SCM research is related to theorizing. Theorizing involves devising hypotheses that will be tested with this archival data. It is not enough to develop statistical models where there is a reasonable degree of confidence that focal parameters are statistically identified. It is just as important to provide evidence of theoretical identification, defined here as the existence of strong and convincing rationale(s) that the theorized mechanisms bring about the reported results. Because archival data do not capture the unobserved processes that are postulated to bring about hypothesized relationships, theorizing in archival research can sometimes be more challenging than when using primary methods (Godfrey & Hill, 1995; Ketokivi & Mantere, 2021). Lack of guidance on this issue is problematic as it can (a) encourage research that presents “novel” findings without solid theoretical reasoning, (b) discourage L&SCM researchers from answering important questions, and (c) cause confusion between authors and reviewers during the peer review process. In this editorial, we hope to provide clarification to authors and reviewers about theorizing with archival data. We asked Dr. Jason Miller and Dr. Andy Balthrop to co-author this manuscript given their exceptional methodological expertise in this area. To address the need for theoretical identification, we stress the importance of developing mechanism-based explanations when hypothesizing relationships between variables.
We contend that it is critical in the research process to explicitly explain the “underlying entities, processes, or structures which operate in particular contexts to generate outcomes of interest” that are theorized to bring about the hypothesized effects (Astbury & Leeuw, 2010, p. 368). The focus of offering mechanism-based explanations should be on developing logical arguments that explain unobserved underlying processes. It is these unobservable mechanisms, grounded in a study's context, that provide the rationale for hypotheses development concerning the directional relationships between selected variables (Bunge, 1997). In the sections that follow, we further clarify the role that theoretical identification should play in the research process. We also offer a 2 × 2 matrix to distinguish theoretical identification from statistical identification and we underscore why all research should feature both. Finally, we highlight the outstanding articles that are published in this issue. Almost 30 years ago, Sutton and Staw (1995) provided an excellent overview of theory. They expressed it as a story that: (1) offers why and how behaviors, events, and structures occur in a causal nexus and (2) explains both order and timing. Sutton and Staw explain that strong theory delves into underlying processes by detailing the systematic rationale for what does (or does not) occur. This explanation of theory clearly aligns with the concept of mechanisms – i.e., how, and by what steps, dependent variables or outcomes follow from a set of independent variables or initial conditions (Mayntz, 2004). It is common for researchers who use archival data to hesitate in offering mechanism-based explanations in hypotheses development. This has been a long-standing issue. 
Sutton and Staw (1995) stated that, “Providing a deep theory, in which intervening mechanisms or processes are spelled out in graphic detail, may likewise lead to objections that only the antecedents and consequences of the model are measured … the author is careful to avoid mentioning any variables or processes that might tip off the reviewers and editors that something is missing in the article. Peripheral and intervening processes are left out of the theory” (p. 380). Editors and reviewers often object to authors providing a rationale for hypotheses that include unobserved processes, but leading editors and top reviewers “always” object to authors providing insufficient rationale in hypotheses development. There is frequently good reason for both, so how does one strike a balance between motivating hypotheses with unobserved mechanisms and not stretching a mechanism beyond what is believable? Our review of the literature suggests a few common features of theoretical identification using mechanism-based explanations, and we use a recent Journal of Business Logistics article from Darby et al. (2022) to provide context for these features. The Darby study sources archival data to examine how information from financial institutions has both symmetric and asymmetric effects on exchanges between farmers and customers in agricultural supply chains. The authors hypothesize that changes in the futures market price of a commodity positively affect the commodity's spot price, which in this context is the cash price that rice buyers pay farmers. The mechanism-based explanation is that the real-time price information from the futures market conveys macro-level supply and demand conditions, and thus it influences how farmers and customers negotiate the spot price. For example, if the futures market price goes up, farmers will feel that they have more negotiating leverage, while a futures market price decrease gives the advantage to the customers.
The authors further theorize that information about local market conditions can create divergences between the futures market price and the spot price. We want to highlight four elements from the Darby et al. (2022) article that add to the credibility of the unobserved processes that they claim are operating and explain the hypothesized relationships. We offer a summary of these four characteristics in Table 1. First, the mechanism-based explanation has external support by integrating logic from established theories (Ylikoski & Kuorikoski, 2010). The authors use Penrose's (1959) early version of the Resource Based View of the Firm to infer that farmers and customers will acquire and use information from the futures market, and they use information economics literature to explain how both parties leverage the information (Stiglitz, 2000). Second, the mechanism they offer operates at a level below the unit of analysis. It is often the case that these ‘higher level’ theories – such as this one – involve complex social processes that occur at the individual level (Stinchcombe, 1991). In this case, the prediction was at the market level, but the mechanism proposed (i.e., acquiring market information and leveraging it) operates at the individual level (i.e., negotiations) between farmers and customers. The third element involves the authors detailing how the study's context activates or suppresses the theorized mechanism (Astbury & Leeuw, 2010) by explaining how price adjustments occur over time. This is consistent with the context + mechanism → outcome (C + M → O) logic from Pawson and Manzano-Santaella (2012) and further explained by Stank et al. (2017). These studies highlight the importance of including contextual factors when devising hypotheses. Being context explicit can make mechanism-based explanations more precise, reducing alternate explanations (Ylikoski & Kuorikoski, 2010) and identifying theoretical boundary conditions (Goldsby et al., 2013). 
In this study, leveraging information about local market conditions can create divergences that have asymmetric effects on the relationship between the futures market price and the spot price. The fourth aspect of a strong mechanism-based explanation is that it has strong backing (Ketokivi & Mantere, 2021), or in other words, there are independent reasons for accepting the plausibility of the postulated processes (Sætre & Van de Ven, 2021). One approach that can be especially impactful is collecting primary data from individuals who are embedded in the context and actively engage in the unobserved processes. Darby et al. (2022) demonstrate this through extensive interviews with farmers and customers who were engaged in the exchange process. As is common in mixed approaches, their research strategy created a much deeper understanding and allowed the authors to offer a clear and compelling rationale for the hypothesized relationships. Querying industry experts and participants is an increasingly common practice as it provides a stronger backing for underlying mechanisms in archival studies. We highly recommend this approach to reduce concerns in the review process that stem from utilizing archival data. Another approach to demonstrate strong backing is to highlight how the proposed mechanism is analogous to other mechanisms that are perceived to have a high degree of credibility (Ketokivi et al., 2021). Additionally, manuscripts can detail how the proposed mechanism explains a wide variety of effects in the current study (Thagard, 1978) and how it can reconcile inconsistent findings from prior studies (Laudan, 1977; Miller et al., 2018). Because any study should support both statistical and theoretical identification, we offer a 2 × 2 matrix in Figure 1 that illustrates each source of identification as weak or strong.
The top right quadrant is labeled as reflective of the research previously discussed in this editorial, and it is most likely to proceed past a desk rejection in the review process because these studies can be statistically identified by rigorous analysis and theoretically identified by strong mechanism-based explanations. Alternately, the lower left quadrant is reflective of studies that are weak in both areas. Such studies are unlikely to ever be published in top journals. Consequently, we focus our attention on the remaining two quadrants. The lower right quadrant illustrates studies with strong theoretical identification, yet the statistical identification is weak. The hurdle to successful publication here is generally from weak design. While the theoretical arguments are well-crafted, there are severe concerns about statistical identification. Weak design in this context can stem from several sources. First, the data sources available may not capture measures necessary to operationalize critical control variables. For example, given the central role that firm size plays in shaping firms' actions and outcomes, a study that examines how firms' R&D investment affects firm-level innovation (measured using patent counts) without controlling for firm size would diminish statistical identification. This is because the coefficient for R&D investment could also be a proxy for forces associated with firm size yet distinct from an underlying innovation mechanism. Another source of weak design with archival data occurs when researchers assign an interpretation to an observed measure for which there is little logical justification (Ketchen Jr et al., 2013). For example, it could be difficult to justify that a measure of advertising investments in an archival data set is a proxy for asset specificity as presented in transaction cost economics (Williamson, 2005). 
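The firm-size example can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from the editorial or any cited study: the function name, the effect sizes (0.5, 0.8, 1.0), and the treatment of patent output as a continuous score rather than a count are all assumptions chosen for clarity. It generates data in which firm size drives both R&D investment and innovation, then compares the R&D coefficient from an ordinary least squares fit with and without the size control.

```python
import numpy as np


def simulate_omitted_size(n=5000, true_rd_effect=0.5, seed=0):
    """Return the OLS coefficient on R&D without and with a firm-size control.

    All data are synthetic. The coefficients below are hypothetical values
    picked for illustration, not estimates from any real study.
    """
    rng = np.random.default_rng(seed)

    # Firm size is the variable that may be missing from the archival source.
    firm_size = rng.normal(size=n)
    # Larger firms invest more in R&D (assumed loading of 0.8).
    rd = 0.8 * firm_size + rng.normal(size=n)
    # Innovation depends on R&D and, separately, on firm size.
    # Treated as a continuous score here for simplicity (assumption).
    innovation = true_rd_effect * rd + 1.0 * firm_size + rng.normal(size=n)

    intercept = np.ones(n)
    x_without_size = np.column_stack([intercept, rd])
    x_with_size = np.column_stack([intercept, rd, firm_size])

    # Least-squares fits; index 1 is the R&D coefficient in both designs.
    coef_without = np.linalg.lstsq(x_without_size, innovation, rcond=None)[0]
    coef_with = np.linalg.lstsq(x_with_size, innovation, rcond=None)[0]
    return coef_without[1], coef_with[1]


naive, controlled = simulate_omitted_size()
print(f"R&D coefficient without size control: {naive:.3f}")
print(f"R&D coefficient with size control:    {controlled:.3f}")
```

Under these assumptions the uncontrolled coefficient absorbs part of the firm-size effect and lands well above the simulated value of 0.5, while the controlled coefficient stays close to it. The simulation speaks only to statistical identification; it says nothing about whether the mechanism linking R&D to innovation is convincingly theorized.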
A third cause of weak research design is that statistical results can be explained by common method effects, which stem from the response process that underlies the archival data (Miller et al., 2021). If researchers find themselves in the first two instances, the best course of action is to modify the research questions so that statistical concerns are addressed. If researchers find themselves in the third scenario, the best course of action is to augment the existing data with other data sources, eliminating concerns about common method variance. Moving to the upper left quadrant, these studies have strong statistical identification, yet the theoretical identification is weak. The issue that creates concern here is whether the statistical identification is a result of “data dredging.” A study can offer table after table of regression results showing that the effect of X on Y is robust to numerous alternative statistical specifications, yet the logic for why X should impact Y is inadequately developed. This quadrant represents a recurring problem for researchers who submit to the Journal of Business Logistics. Desk rejections occur when authors poorly articulate mechanisms (Ketokivi & Mantere, 2021). We offer three primary reasons that papers fall into this category. First is the mistaken pursuit of trying to produce “interesting” research (Tsang, 2022) by assembling an archival dataset to regress some dependent variable on a focal predictor for which there is little intuitive understanding as to why the relationship might exist in the first place. Second, researchers attempt to utilize predictors that reside at a much higher unit of analysis to explain variance in a dependent variable at a far lower unit of analysis. As a hypothetical example, a study could propose that a country's logistics performance, measured using the World Bank's Logistics Performance Index, affects the frequency of leveraged buyouts.
The issue with proposing this relationship relates to a multitude of more micro-level explanations that tend to shape leveraged buyout behavior. Third, researchers may try to link a supply chain related predictor at the firm level to an outcome that is highly distal from that predictor. For example, far more factors affect company-level return on assets (ROA) than affect a variable like inventories, so isolating a theorized mechanism tends to be much harder if the former measure is the dependent variable, a concern also noted by Richey et al. (2022). This scenario requires a stronger mechanism-based explanation for the hypothesized relationships. We encourage (i) returning to literature to draw on existing theory, (ii) providing evidence through the collection of primary data (e.g., industry experts), or (iii) altering the focus of the research such that theoretical isolation is facilitated. In summary, while archival data can allow L&SCM researchers to push the boundaries of existing knowledge, archival data come with their own set of challenges and, importantly, there are research questions that can only be addressed with primary research, so dismissing the use of primary data methods does not serve the discipline. The overarching point of this editorial is twofold: to encourage the use of archival data to address a wide range of L&SCM phenomena and to underscore that a limitation is its inability to capture underlying mechanisms that explain relationships between observable variables. Overcoming this limitation requires that studies offer statistical identification by having strong research design to rule out alternate explanations. It also requires theoretical identification by providing convincing and plausible mechanism-based explanations, and when possible, it is worthwhile to collect feedback from industry participants to provide backing (Ketokivi & Mantere, 2021) for these proposed explanations.
It should also lead to insightful implications for practice, which we continually stress (Richey & Davis-Sramek, 2022). This issue includes ten exceptional papers providing support to a multitude of concepts. The first three papers cover some of the hottest topics in our field today. The second three delve into the human element of L&SCM. The final four consider a broad set of important aspects central to supply chain strategy in the 21st century. We will briefly discuss each as a prelude to reading the manuscripts. The hottest topics in L&SCM today center on transforming the way we manage and theorize supply chain strategy and logistics implementation. Issues like resilience, responsiveness, risk, internet of things/technology, and distant markets have come to light as increasingly important. The first paper by Essuman et al. (2023) tackles the topic of improvisation in L&SCM. Improvisation is easily the most understudied of the View (Richey et al., 2022) by The authors that is critical to by detailing a link from supply chain to supply chain of theory, the authors explain how and improvisation during supply chain primary data, improvisation is to have a relationship to improvisation is research should consider of improvisation and their given This is one of the first papers from that has been published in We are about from these authors and other from this important and of the In paper et al. (2023) examine an important but understudied – the of the in and in of the a set of case the authors find that participants at the often to the and their to development L&SCM is often in this context, our to a more in the and social aspects of et al. (2023) examine understudied in and of An the of to the authors detail how have recent in the et al., et al., the authors examine the level of this theorizing that higher of price, and of The results support the use of as a to including for and address issues related to the human in L&SCM. 
We are highly in in this of study as can be in a published First, in et al. (2023) consider the importance of in on the A for supply chain As a point of for most are to understanding social theory, the authors examine how firm-level to identification. These are more and support impact to and in the the impact of identification. et al. (2023) in are the and in the Logistics to new L&SCM human questions that to be on the of so of the industry our is when L&SCM are and often are and The result is and The authors of these by focus theory, the authors examine between and the and growth are as at reducing for to the and is as for motivating high in The findings from this and archival data study are and can reduce and associated The final context paper is and An by et al. We often the link between and a study four the authors how to The findings a of and should consider all three as they to their approach to to a need for The final four manuscripts are in but all have important L&SCM strategy First, et al. (2023) and on the A of management are more in so this is a research on recent studies et al., 2022) further and the authors find that L&SCM on is important relative to frequency of the of and the of in between Performance and in a The of on and et al. (2023) strategy and we that variety to to frequency and We also that variety is associated with data provided by a over a the authors develop a model that and and need to that variety strategy consider the or of the when In the second to et al. (2023) the to The Performance of and the of The of our research occurs at the level, but we examine the size that occurs in We do that are The authors examine and implications in to their study is the should with This will more important as are and to new and theory, the authors how influences the of They also consider how a these relationships. that with financial but it affects performance in the In a can performance the We need more in this area. 
In the but highly important study, et al. (2023) or to the from the the severe are their supply and have the two of are problematic but of these adjustments are grounded in data from of the study that and buyers the impact of supply support that buyers The authors this by for to research and

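Logistics and supply chain management · Research methods · Archival data · Theoretical identification · Statistical identification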