Rethinking Six Sigma: Learning from practice in a digital age

JOURNAL OF OPERATIONS MANAGEMENT · 2023

Cited by 1

Renmin University of China journal list: A · FT50 · UTD24 · ABS 4*

Suzanne de Treville · University of Lausanne (corresponding author)
Tyson R. Browning · Texas Christian University
Matthias Holweg · University of Oxford
Rachna Shah · University of Minnesota

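Reader's summary

Argues that operations management lacked due diligence in moving from statistical process control to Six Sigma and accepted statistically flawed heuristics. The authors call for a return to first principles and a reexamination of the theoretical foundations of Six Sigma, a warning relevant to scholars and practitioners alike.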

Abstract

As scholars in the field of operations management (OM), we would like to suggest that our field fell short in terms of due diligence when transitioning from statistical process control (SPC) to Six Sigma. We accepted without scrutiny, built theory around, and taught heuristics and algorithms without recognizing their underlying statistical inaccuracies. It is our view that these incorrect heuristics and algorithms have introduced bias and inefficiencies in process improvement throughout the OM field, contributing to a disconnect between OM and knowledge development in data science more generally. We call for a return to first principles and the establishment of formal conceptual definitions for the theory and methods underlying Six Sigma. We urge the OM academic community to embrace the lessons from SPC and Six Sigma so that we prioritize our due-diligence role. This should begin with a requirement that all algorithms and tools be vetted before entering our curricula and case-study repertoires, especially as we move into an age of big data and potentially more opaque algorithms and tools. We propose that our top journals be open to research that scrutinizes methods developed in practice, so that OM will continue to be the focal field for quality assurance, even when the “product” of a process is data.

The application of statistical methods to quality management has been a central theme in OM since Shewhart's seminal work at Western Electric's Hawthorne Works nearly a century ago (Shewhart, 1925; Shewhart, 1926). He studied process variation to determine “how and under what conditions observations may contribute to a rational decision to change or not a process to accomplish improvements” (W. Edwards Deming, p. i, in the foreword to the 1986 edition of Shewhart, 1931). His work fostered methods and tools to monitor, diagnose, measure, reduce, and control variation in the output of a process to increase its consistency and capability.
The capability of a process with respect to a process parameter can hence be defined as the number of standard deviations (“sigmas”) of the parameter that fit between the mean for that parameter and its specification limits. If the process is not centered between the specification limits, then the process capability is set using whichever specification limit is closer to the process center.

Six Sigma moves the concept of process capability from descriptive to prescriptive. Bill Smith proposed Motorola's process-improvement ideas in 1986 (Harry, 1994). At that time, process capability was defined in terms of specification limits set three standard deviations from the process mean (a “3σ process”). Following Shewhart's logic, a centered and normally distributed 3σ process is expected to produce 2700 parts per million (ppm) pieces that are more than three standard deviations from the process mean. Motorola quantified the “zero defect” philosophy at the heart of the “quality is free” argument, proposed by popular and influential authors like Crosby (1979). It did so by translating the number of standard deviations that fit between the process center and the nearest specification limit into ppm defects.

Historically, it was assumed that a 3σ process capability was good enough. The rate of 2700 ppm out-of-specification pieces produced by a 3σ process, however, is too high in many contexts. Harry described Bill Smith as demanding a higher standard: “Bill's proposition was eloquently simple. He suggested that Motorola should require a 50% design margin for all its key product performance characteristics” (Harry, 2003, p. 1). This buffer adds another three standard deviations between the mean and the specification limit on either side (as illustrated in Figure 1). It marked the inception of the “Six Sigma” concept.

[Figure 1. Source: Harry (2003, p. 27).]

Up to this point, Motorola's approach can be seen as a protocol that adds intuition and goal setting to a simple application of the theory of process capability. Had the Motorola project stopped there, a refined and stricter protocol might have seen the light of day. The result would have been an incremental but eminently sensible extension to SPC as applied to high-volume, repetitive-manufacturing contexts. Unfortunately, this is not what happened. Several approaches to applying these core concepts are difficult to justify today, because they are artifacts of the practical limitations of the time. In the following subsections, we address a few of the more notable of these legacy issues and how they might be re-evaluated and addressed in current practice, teaching, and research.

Harry and his team, tasked with implementing Smith's vision, noted two challenges to resolve. First, the sample mean is only an estimate of the population mean, so confidence intervals need to be considered. Second, the distribution of the process output could change over time when something changed in the process (such as loss of calibration, a tool losing its edge, or a setup error), causing a change in the process mean or standard deviation. Harry wanted to ensure that these factors would be taken into consideration when defining what was to be promised to the customer. The team proposed adjustments, but management deemed these proposals too “technical” and “complicated” and disregarded them (Harry, 2003, see p. 6ff). As Harry recounts:

“In short, the idea of an expanding and contracting standard deviation was found to be outside the realm of ‘common sense reasoning’ without the provision of statistical instruction. However, the idea of a ‘shift correction’ carried high appeal and inevitably promoted lively and meaningful discussion – without the prerequisite education. Therefore, those of us at Motorola involved in the initial formation of Six Sigma (1984-1985) decided to adopt and support the idea of a ‘1.5 Sigma equivalent mean shift’ as a simplistic (but effective) way to account for the underlying influence of long-term, random sampling error.”

Harry and colleagues thus proposed adjusting the estimated defect level for a given process and set of specification limits. The adjustment allows for the possibility that the process mean could be 1.5σ closer to one or the other specification limit, as illustrated in Figure 2. The distribution of the process parameter remains normal, but it is no longer centered between specification limits 6σ from the mean. Instead, it now sits 4.5σ from one specification limit (implying 3.4 ppm defects on that side) and 7.5σ from the other. Setting a process-capability goal of 3.4 ppm defects (a 4.5σ process-capability level) therefore required reducing the standard deviation until six standard deviations fit between the true process center and the specification limits on both sides. This leaves room for the expected shift in the process center.

Harry seems to have misinterpreted the factor of 1.5 as the allowance for shifts in the mean of a single component due to its being manufactured in different lots. As suggested by Bender and Gilson, it might make sense to inflate the estimator of the standard deviation of the assembly to allow for shifts in the means of individual components. It does not make sense, however, to allow for a 1.5-sigma shift in the process mean of individual components. Harry's proposal of the 1.5σ mean shift may have come from confidence intervals around the population mean, from Bender's and Gilson's 1.5σ allowance for tolerancing, or from both. Either way, it had important consequences: most writings on Six Sigma interpreted the shift literally. Six Sigma was therefore built around the expectation that the process mean would shift by 1.5σ.
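The ppm figures in this passage follow from normal tail areas: about 2700 ppm for a centered 3σ process, and 3.4 ppm on the near side once a 6σ process has its mean 1.5σ off center. The minimal Python sketch below reproduces them and adds a centered 6σ process for comparison. It assumes a normally distributed parameter and symmetric specification limits. The helper name `ppm_out_of_spec` is hypothetical, and the code uses only the standard library's `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def ppm_out_of_spec(sigmas_to_spec, mean_shift=0.0):
    """Return (ppm beyond the nearer limit, ppm beyond the farther limit).

    Hypothetical helper. Assumes a normal process parameter whose specification
    limits sit `sigmas_to_spec` standard deviations either side of the target,
    with the process mean displaced by `mean_shift` standard deviations toward
    one of the limits.
    """
    # Tail area beyond the nearer limit (the mean has moved toward it).
    near = STD_NORMAL.cdf(-(sigmas_to_spec - mean_shift))
    # Tail area beyond the farther limit (the mean has moved away from it).
    far = STD_NORMAL.cdf(-(sigmas_to_spec + mean_shift))
    return near * 1e6, far * 1e6


scenarios = {
    "3 sigma, centered": (3.0, 0.0),
    "6 sigma, centered": (6.0, 0.0),
    "6 sigma, mean shifted by 1.5 sigma": (6.0, 1.5),
}
for label, (k, shift) in scenarios.items():
    near, far = ppm_out_of_spec(k, shift)
    print(f"{label}: near {near:.4g} ppm, far {far:.4g} ppm, total {near + far:.4g} ppm")
```

The results come out as expected:

- The first scenario totals about 2700 ppm.
- The shifted 6σ scenario gives about 3.4 ppm on the near side and a negligible tail beyond 7.5σ on the far side.
- The centered 6σ process gives roughly 0.001 ppm (about one part per billion) on each side.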
Standard Six Sigma reference guides like the GOAL/QPC Black Belt Memory Jogger state that “on average, short-term process means tend to shift and drift by 1.5 sigmas” (GOAL/QPC, 2002, p. 45). From an SPC point of view, such a 1.5σ shift induces a fundamental contradiction: it means that the process is no longer in statistical control. A process is defined as running under statistical control with respect to a given parameter when its mean and standard deviation are not changing. If the process is operating under statistical control, then all variation is random (“common-cause variation”) rather than emerging from an assignable cause. When a process is no longer running under statistical control because its mean has shifted, the appropriate corrective action is to eliminate the factor that caused the shift so that the process is again centered. Six Sigma, however, led people to think that running the process down the center was no longer a top priority. It placed a higher value on reducing the process standard deviation, so that a small number of defects could be delivered even when the process was not centered. This change in thinking contradicted the generally accepted principles of SPC, yet Six Sigma gained acceptance with minimal challenge from the academic community.

“(…) the shift factor (1.5 sigma) does not constitute a ‘literal’ shift in the mean of the performance distribution – as many quality practitioners and process engineers falsely believe or try to postulate through uniformed speculation and conjecture.”

It is only here that Harry fully explained the source of the idea that the true process mean could lie 1.5σ from the estimated mean in either direction. The process mean does not shift. Rather, decision makers should take confidence intervals into consideration when estimating it from process data. Harry suggested that a reasonable confidence interval would be around 1.5σ.
Suppose the estimated process mean is centered between the specification limits with six standard deviations on each side. The true process mean could then differ from that estimate by as much as 1.5σ. The supplier could promise the customer 3.4 ppm defects on one side, rather than 1 part per billion on either side. That promise would hold even if the true process mean differed from the estimate.

Harry and his colleagues at Motorola clearly set in motion a framework that led users to conclude that process centrality no longer mattered. The damage would nevertheless have been largely mitigated had the academic OM community scrutinized the Six Sigma methodology more closely. There were notable exceptions. Some work attempted to define the underlying theory of Six Sigma (de Treville et al., 2008; Schroeder et al., 2008), and other work identified its underlying structures and mechanisms (Anand et al., 2010; Braunscheidel et al., 2011; Linderman et al., 2003; Linderman et al., 2006). As a field, however, we failed to question the core assumptions of Six Sigma and instead adopted it at face value in our syllabi.

We would like to suggest that an archaic understanding of SPC created an environment of confusion. That confusion kept formal conceptual definitions from being fully used to maintain order and discipline in thinking. Impenetrable SPC algorithms, largely developed in the 1950s, continue to be used and taught without question. This establishes acceptance of concepts that are not understood. Acceptance without understanding, in turn, gives rise to an environment in which mystique triumphs over the first principles that peer review is designed to tease out. Next, we explore initial ideas about how to overcome this archaic understanding of process capability and SPC.

The ability to infer whether a process is running under statistical control with respect to a specific parameter remains as useful today as it was when Walter Shewhart developed the statistical control chart (Shewhart, 1926) nearly a century ago.
SPC charts help a decision maker decide whether to intervene in an ongoing process. Intervening in a process that is running correctly tends to increase variability, but failing to intervene in a process that is running incorrectly is costly. Shewhart proposed collecting samples of measurements of a process parameter from the process. These samples are used to judge which is more likely: that the process is running normally, or that its mean or standard deviation has changed and intervention is warranted. Each sample mean ($\bar{X}$) is checked to see whether it lies within three standard deviations of the estimated process mean. The relevant standard deviation here is that of the sample mean: the estimated population standard deviation divided by the square root of the sample size ($n$). The population mean and standard deviation are estimated from process data (typically at least 20 samples) collected while the process appears to be running under control. The population mean (μ) is estimated as the mean of the sample means ($\bar{\bar{X}}$).

The generally prescribed method for estimating the population standard deviation (σ) was developed before calculators were readily available, when the calculation had to be done with a slide rule. Patnaik (1950) extended the work of Tippett (1925). He showed that the average range ($\bar{R}$) of a set of samples could be converted into an unbiased (although noisy) estimator of the standard deviation, which made estimating σ feasible under these conditions. σ was then estimated from $\bar{R}$ over the set of samples taken while the process was perceived to be running under statistical control. Patnaik (1950) calculated the factor, denoted $d_2$, that transforms $\bar{R}$ into an unbiased estimate of σ for a given $n$: $\hat{\sigma} = \bar{R}/d_2$. For $n = 5$, for example, this entails dividing $\bar{R}$ by 2.326. This value is then divided by $\sqrt{n}$ to estimate the standard deviation of the distribution of sample means, $\hat{\sigma}_{\bar{X}} = \bar{R}/(d_2\sqrt{n})$.
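The range-based estimate of σ described above fits in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of any textbook's procedure. The $d_2$ values are the standard tabulated constants for subgroup sizes 2 to 10. The function name is hypothetical, and so are the three subgroups of five measurements. They are kept small for readability, whereas the text calls for at least 20 samples.

```python
from math import sqrt
from statistics import mean

# Standard tabulated d2 constants, keyed by subgroup size n.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def range_based_estimates(subgroups):
    """Hypothetical helper: X-double-bar, R-bar, sigma-hat and sigma of X-bar."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size n")
    grand_mean = mean(mean(g) for g in subgroups)     # X-double-bar, estimates mu
    r_bar = mean(max(g) - min(g) for g in subgroups)  # average subgroup range, R-bar
    sigma_hat = r_bar / D2[n]                          # sigma estimated as R-bar / d2
    sigma_xbar = sigma_hat / sqrt(n)                   # standard deviation of subgroup means
    return grand_mean, r_bar, sigma_hat, sigma_xbar


# Hypothetical measurements: three subgroups of n = 5.
subgroups = [
    [10.1, 9.8, 10.0, 10.2, 9.9],
    [10.0, 10.3, 9.7, 10.1, 10.0],
    [9.9, 10.0, 10.2, 9.8, 10.1],
]
grand_mean, r_bar, sigma_hat, sigma_xbar = range_based_estimates(subgroups)
print(f"X-double-bar={grand_mean:.4f}  R-bar={r_bar:.4f}  "
      f"sigma-hat={sigma_hat:.4f}  sigma_xbar={sigma_xbar:.4f}")
```

The only step that cannot be derived on the spot is the lookup in `D2`. That table is the one reproduced in textbooks, usually without explanation.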
$\bar{X}$ SPC charts are drawn with the center line at $\bar{\bar{X}}$ and control limits three standard deviations above and below it. It is common to denote $3/(d_2\sqrt{n})$ as $A_2$, so that the control limits of an SPC chart are written as $\bar{\bar{X}} \pm A_2\bar{R}$. If a given $\bar{X}$ falls outside these control limits, it is highly likely that the process is no longer running under statistical control.

More than seven decades later, we in the field of OM continue to teach SPC using $\bar{R}$ with Patnaik's table of factors as the approach of choice for estimating σ. The SPC tables providing Patnaik's factors were first published by the American Society for Testing and Materials in 1951 (ASTM, 1951). It was none other than Walter Shewhart who chaired the ASTM committee in question in the 1930s. W. Edwards Deming was a member for many years during the 1950s, when the SPC standards were set and published. These SPC standards remain in common use. They are typically provided in OM textbooks (e.g., Cachon & Terwiesch, 2013, p. 204; Jacobs & Chase, 2024, p. 381; Krajewski & Ritzman, 2002, p. 294; Nahmias & Olsen, 2015, p. 842), as well as in textbooks on statistical quality control (e.g., Montgomery, 2019). Some adjustments and derivatives have been proposed over the years (e.g., Nelson, 1984; Western Electric Company, 1956; Westgard, 2009). Even so, the logic of how we define the boundaries of our control charts has remained unquestioned and unchanged. The essential step of estimating σ is thus carried out using factors that are presented without explanation and typically without reference. None of the standard OM textbooks that we considered cites Patnaik or Tippett as the source of these factors. Without these mysterious factors, the obvious way to estimate σ would be to apply a spreadsheet function to the data collected from the process. These data, recall, were taken from a process assumed to be functioning normally, with no assignable causes changing its μ or σ.
The standard deviation of the data making up those samples is essentially unbiased, like the estimate based on $\bar{R}$, but considerably less noisy. Such an exercise would be well understood by average business students. The number of standard deviations used to set the control limits could then be chosen according to a tradeoff. One side is failing to intervene when something has changed (a Type II error); the other is intervening in a stable process (a Type I error). In some cases, an $\bar{X}$ more than 1.5σ from $\bar{\bar{X}}$ would be of concern. In other cases, one or more $\bar{X}$s would need to be more than 3σ from $\bar{\bar{X}}$ to elicit concern.

Thus, a heuristic for estimating σ, designed long before computers were widely available, continues to be taught unquestioningly as the correct way to carry out this estimation. Students who rightfully ask for an explanation of the mysterious factors are told to take them on faith. Students, professors, and professionals lack a clear understanding of how σ was calculated. The result is an environment in which people can no longer rely on statistical understanding and instead apply the factors on blind faith. A student with a strong grasp of standard deviations and normal distributions will be perplexed. Such a student will assume that the factors must yield some mysterious extra insight beyond a regular estimate of σ from the sample data. Other students, who have not yet mastered the idea of the standard deviation in this context, will be completely lost. If σ is calculated directly, however, learning about SPC charts can contribute to a general understanding of the normal distribution and the use of the standard deviation. It is time to relegate this mystical artifact to history so that prospective users of SPC can do the obvious calculations.
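The direct alternative described here estimates σ from the pooled in-control data and picks the multiple of the standard deviation ($k$) according to the error tradeoff. It can be set side by side with the classical $\bar{\bar{X}} \pm A_2\bar{R}$ limits. The sketch below rests on several assumptions that are ours, not the article's:

- The in-control data are simulated as normal with hypothetical parameters.
- All helper names are hypothetical.
- The per-subgroup Type I and Type II probabilities assume a normal process with known σ.
- The 1.5σ mean shift is used only as an example magnitude.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}  # tabulated d2, n = 2..6
STD_NORMAL = NormalDist()


def simulate_in_control(m=25, n=5, mu=10.0, sigma=0.2, seed=1):
    """Hypothetical in-control data: m subgroups of n independent normal draws."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for _ in range(n)] for _ in range(m)]


def limits_a2(subgroups):
    """Classical X-bar limits: X-double-bar +/- A2 * R-bar, A2 = 3 / (d2 * sqrt(n))."""
    n = len(subgroups[0])
    center = mean(mean(g) for g in subgroups)
    r_bar = mean(max(g) - min(g) for g in subgroups)
    a2 = 3 / (D2[n] * sqrt(n))
    return center - a2 * r_bar, center + a2 * r_bar


def limits_direct(subgroups, k=3.0):
    """Limits from the ordinary sample standard deviation of all pooled observations."""
    n = len(subgroups[0])
    pooled = [x for g in subgroups for x in g]
    sigma_hat = stdev(pooled)             # sample SD with n - 1 denominator, like a spreadsheet STDEV.S
    half_width = k * sigma_hat / sqrt(n)  # k standard deviations of the subgroup mean
    center = mean(pooled)
    return center - half_width, center + half_width


def type_i_rate(k):
    """P(a stable process's subgroup mean falls outside +/- k sigma_xbar): a false alarm."""
    return 2 * STD_NORMAL.cdf(-k)


def type_ii_rate(k, shift, n):
    """P(no signal on one subgroup) after the mean moves by `shift` process sigmas."""
    d = shift * sqrt(n)  # the shift expressed in units of sigma_xbar
    return 1 - (STD_NORMAL.cdf(-k + d) + STD_NORMAL.cdf(-k - d))


data = simulate_in_control()
print("A2 limits:    ", limits_a2(data))
print("direct limits:", limits_direct(data, k=3.0))
for k in (1.5, 3.0):
    print(f"k={k}: Type I {type_i_rate(k):.4f}, "
          f"Type II for a 1.5-sigma shift (n=5) {type_ii_rate(k, 1.5, 5):.4f}")
```

Both sets of limits land close to $\mu \pm 3\sigma/\sqrt{n}$ for stable data. The direct version requires nothing beyond a standard deviation. The probability lines make the tradeoff explicit:

- With $k = 3$, about 0.27% of stable subgroup means raise a false alarm, and a 1.5σ shift goes unflagged on a given subgroup of five about 36% of the time.
- Lowering $k$ to 1.5 cuts that miss rate to about 3%, at the cost of a false-alarm rate near 13%.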
We suggest that this blind-faith approach to estimating σ created an environment in which the 1.5σ mean shift of Six Sigma went easily unquestioned.

Thus far, our discussion has focused on the use of descriptive tactics to inform prescription. What of prediction? A process is defined as running outside of statistical control when one or more sample means fall outside control limits drawn while the process was perceived to be running normally (i.e., appeared to be running under statistical control). The same holds when the sample means otherwise show non-random patterns suggestive of a special cause. But, what if the lack of special was instead a of As we these with the standard tools of data science that are now estimation of variation possible an of in sample data. tactics would to such non-random and more approaches are that we clearly what we are for rather than following that we do not on blind faith. As we these to the side, we make for SPC to from statistical is not to that implementing Six Sigma has had a in many over the do we to the heuristics that SPC.
We to to and the of our field to use the process to Six statistical and to process capability and SPC to allow to use these tools with research is to eliminate this of and the of ideas emerging from the of into a increase in peer review can it is for top journals to be open to is this of due in the when Six Sigma to top OM journals were not perceived as being open to this of and Treville conceptual published in an idea of the of research that could open the way to the that we are In the the idea of and The was that as was from a the resulting line would and Treville in what between two as is can when one or but such and can in a to This academic to a more understanding of the between and we suggest that should be open to conceptual that an of a that has from to these SPC and methods to and has caused confusion and kept OM theory and tools from to be to address than small samples of collected data from a statistical quality produce high data that is in By SPC and process capability to statistical we to like of and and using OM thinking and tools. use of tools like learning a statistical in a framework that has peer not opaque algorithms and We call for the OM community to that process capability and SPC be to statistical and formal conceptual recognizing Six Sigma as a protocol from the of that important but underlying assumptions are This return is essential if we are to remain to what is now in the of quality and data science more generally. We further call for to be a that the that ideas from to peer thus blind in with in and

Operations management · Quality management · Six Sigma · Statistical process control · Data science
