March 28, 2009

Exploratory Data Analysis: Stratification

The primary purpose of Exploratory Data Analysis (EDA) is to identify the key variables that affect the quality measures.  Two principles, mentioned by De Mast and Trip (2007), are helpful in identifying these variables.  They are:

  • Display the distribution of the data
  • Display the distribution within individual strata

Chang and Lu (1995) provide an example illustrating these principles.   A steel sheet metal manufacturer had customers complaining about uneven thickness.  The specification was 4.5 ± .5 mm.   The production manager had data collected from 120 sheets giving the thickness measurements on the left, middle and right sides of the sheets.   Employees selected five sheets at shift times of 0900, 1100, 1400 and 1700 over a period of five days.   The histogram appearing below shows 13% of the sheet thickness measurements below the lower specification limit of 4.0 mm.   Also, the mean is lower than 4.5 mm. 

After discussions with shop-floor personnel, they stratified the data by position on the sheet and by time.  Histograms for the two stratifications appear below.  The stratification by position did not show distributions much different from the aggregate distribution.  However, the stratification by time showed higher frequencies of thin measurements at 1100 and 1700.  Of the 26 values below 4 mm in the histograms, 24 were at 1100 and 1700.
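The stratification step can be sketched in a few lines of Python. This is only an illustration: the records below are invented (they are not the Chang and Lu data), and the helper name fraction_below is hypothetical. The sketch counts, for each stratum, the fraction of measurements below the lower specification limit of 4.0 mm.

```python
from collections import defaultdict

LSL = 4.0  # lower specification limit in mm (4.5 - 0.5, from the text)

# Hypothetical records: (shift time, position on sheet, thickness in mm).
# These values are invented for illustration only.
records = [
    ("0900", "left", 4.4), ("0900", "middle", 4.5), ("0900", "right", 4.3),
    ("1100", "left", 3.9), ("1100", "middle", 3.8), ("1100", "right", 3.9),
    ("1400", "left", 4.5), ("1400", "middle", 4.6), ("1400", "right", 4.4),
    ("1700", "left", 3.9), ("1700", "middle", 3.8), ("1700", "right", 3.7),
]


def fraction_below(records, key_index, limit=LSL):
    """Return {stratum: fraction of measurements below limit}."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [below, total]
    for rec in records:
        stratum, thickness = rec[key_index], rec[2]
        counts[stratum][0] += thickness < limit  # True counts as 1
        counts[stratum][1] += 1
    return {s: below / total for s, (below, total) in counts.items()}


by_time = fraction_below(records, key_index=0)      # stratify by shift time
by_position = fraction_below(records, key_index=1)  # stratify by position

for name, result in (("time", by_time), ("position", by_position)):
    print(f"Fraction below {LSL} mm by {name}:")
    for stratum, frac in sorted(result.items()):
        print(f"  {stratum:>6}: {frac:.2f}")
```

With these made-up values, the position strata all look alike while the time strata differ sharply, which is the kind of pattern the histograms in the example revealed.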

Discussions with shop-floor personnel identified mold wear, buildup of chips in a work-holding device, and operator fatigue as possible causes.  The corrective action was to take a 10-minute break at 1030 and 1630 each day and to have maintenance performed during the breaks.  The corrective action produced a substantial reduction in thin sheets.

References

  1. Chang, P.-L. and K.-H. Lu (1995). "The Construction of the Stratification Procedure for Quality Improvement." Quality Engineering 8(2): 237-247.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

February 15, 2009

Exploratory Data Analysis: Limitations

De Mast and Trip (2007) specify that the purpose of Exploratory Data Analysis (EDA) is to identify the dependent (Y) and independent (X) variables that may help understand or solve a quality problem.  However, they point out that EDA can only identify variables that vary in the collected data set.  If the EDA cannot identify the key variables affecting system performance, available options include:

  1. Collecting additional data and revising the variables recorded
  2. Analyzing the available information, designing experiments, and conducting those experiments

Option 1
The Pease Industries example, described in the posting on 3/4/2008, illustrates the first option.   A team wanted to reduce an 11% defect rate in glass inserts for a wooden entry door.  They thought that humidity and temperature variations were the cause.   They collected data and did a regression analysis where the dependent variable was the number of defects and the independent variables were temperature and humidity.   They found no correlation.   Then the team collected additional data, and they examined defect occurrence as related to part type, monthly occurrence and day of the week.   They found that the defect rate varied with the day of the week.  After investigating why the day of the week was important, they determined that dirty molds caused the elevated defect rate.

Option 2
The posting on 2/28/2008 describes a case study illustrating the second option above.   A company was experiencing excessive variation in its grinding operation.   A team conducted a brainstorming session to identify key factors causing the variation in the grinding operation.   The brainstorming session produced a Cause & Effect diagram.   The posting on 9/15/2008 describes an experimental design conducted to determine which factors were most significant.  The posting on 10/16/2008 describes the analysis of the experimental results.  The company improved the grinding process performance index from .49 to 1.25.

References

  1. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

February 10, 2009

Exploratory Data Analysis: Key Steps

De Mast and Trip (2007) list the following three steps in performing Exploratory Data Analysis.

  1. Display the Data
  2. Identify salient features
  3. Interpret salient features

The resin output variation example (3/25/2008 posting) illustrates these steps.  The Ricoh team constructed a histogram of the output quantity (Display the data), noticed the bimodal nature of the output quantity (Identify salient features), and concluded that this bimodal distribution suggested that the output distributions from lines A and B were different (Interpret salient features).  Histograms of line A and line B output confirmed this conclusion.  Another salient feature of the histograms was the excessive variation in output quantity.  This feature motivated the establishment of lower and upper limits and a target value.
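As a rough illustration of the "display the data" step, the Python sketch below bins pooled output quantities and prints a text histogram. The numbers are invented for illustration and are not Ricoh's data; the function name binned_counts is hypothetical.

```python
from collections import Counter


def binned_counts(values, bin_width):
    """Return {bin lower edge: count}, with bins of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical output quantities (invented), pooled from two lines.
line_a = [101, 102, 103, 103, 104, 104, 105, 106]
line_b = [91, 92, 93, 93, 94, 94, 95, 96]
pooled = line_a + line_b

hist = binned_counts(pooled, bin_width=2)
for edge, count in hist.items():
    print(f"{edge:>4}-{edge + 2:<4} {'#' * count}")
```

The printed bars form two separate clusters with an empty gap between them. A two-peaked display like this is the salient feature that would prompt separate histograms for each line.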

References

  1. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

December 22, 2008

Exploratory Data Analysis: Resin Output Variation Example

The postings on 3/21/2008, 3/25/2008, 3/28/2008 and 4/1/2008 present the Resin Output Variation Example to illustrate Statistical Thinking and the Hoerl-Snee Process Improvement Strategy.   This example also makes extensive use of Exploratory Data Analysis.

Ricoh’s Numazu plant made raw material used as ingredients for copy machine toner.   The company had a team which monitored the process in order to achieve continual quality improvement.  

  • The team examined a Resin Yield Run Chart and noted that the yield ratio sometimes exceeded 1.0, which was theoretically impossible.  The 3/21/2008 posting includes this run chart.  The run chart indicated the presence of an assignable cause, which they eliminated by preventing a drop in air pressure.  However, yields still exceeded 1.0.  They decided that this result was due to variation and suspected that this variation would degrade finished product quality.  The team started an effort to discover the source of variation and to make changes to eliminate it.
  • The team collected output quantity data and constructed histograms.   The posting on 3/25/2008 shows the histograms.   The overall output quantity histogram clearly shows two peaks indicating a combination of two component distributions.   After the second batch processing step, the process splits a batch into two parts which are processed on two separate lines, i.e., line A and line B.   The posting on 3/25/2008 shows histograms for the output quantities of the two lines.    Comparing the line A and line B histograms shows that the two lines have different distributions for their output quantities.
  • Next the team constructed a Cause & Effect Diagram to show potential causes of the output quantity variation and the differences between the two lines.  The 3/25/2008 posting presents this Cause & Effect Diagram.  The posting on 3/28/2008 describes the identification and elimination of two variation causes.  They investigated the procedure for dividing the resin after the second processing step.  They discovered that some material remained in the reaction tank after sending material to the two lines.  This meant that line B had less input and thus less output than line A.  They changed the dividing procedure and found no significant difference between the output quantities of the two lines.
  • The second potential cause described in the 3/28/2008 posting involved the solvent feed ratio.  They constructed a scatter plot showing that increasing solvent feed ratio was correlated with increasing output.  This correlation was inconsistent with the team’s understanding of the physical process.  They found that the ratio measurement was affected by the time the solvent was in the tank.  They changed the procedure to ensure the solvent had stabilized prior to measurement.  Examination of a control chart showed that the variation in output quantity was still excessive.
  • The posting on 4/1/2008 describes the elimination of a third cause, and the posting shows a control chart.   This control chart clearly shows a significant reduction in output variation.  

The exploratory data analysis included examination of four different kinds of graphical display: a run chart, histograms, a scatter plot, and several control charts.  De Mast and Trip (2007) point out that Good (1983); Hoaglin, Mosteller et al. (2000); and Bisgaard (1996) note that graphical presentations are preferred in Exploratory Data Analysis.  They are more effective in showing an individual what he did not expect to see.

References

  1. Bisgaard, S. (1996). "Quality Quandaries: The Importance of Graphics in Problem Solving and Detective Work." Quality Engineering 9(1): 157-162.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.
  3. Good, I. J. (1983). "The Philosophy of Exploratory Data Analysis." Philosophy of Science 50(2): 283-295.
  4. Hoaglin, D. C., F. Mosteller, et al. (2000). Understanding Robust and Exploratory Data Analysis. New York, John Wiley & Sons, Inc.

December 15, 2008

Exploratory Data Analysis: Defect Reduction Example

Bisgaard (1996) gives an example where Exploratory Data Analysis leads us to narrow the scope of the quality improvement investigation.  The example involves the production of small outboard motors by an assembly line.  Monthly quality reports showed an unacceptable number of defective motors that caused costly rework.  The Vice President of Manufacturing formed a team for the purpose of reducing the number of defects.

The first task performed by the team was to flowchart the assembly line.  This is consistent with the first step in the Hoerl-Snee Process Improvement Strategy (see the posting on 4/8/2008).  After the base motors were painted and dried, the motors traveled on a ten-station line for the purpose of installing accessory components.  These accessory components included the carburetor, brackets, the propeller, and electrical systems.  Next, the team examined tables specifying defects and their type.  The team found the tables difficult to analyze.  To assist the analysis, the team constructed Pareto charts of the defects by type of defect, for example, missing fasteners, loose fasteners, and missing operations.  These Pareto charts did not suggest principal causes.  The team then decided to categorize the defects by the station on the line where the defect originated.  For example, a loose fastener on the carburetor originated at station 3, and an incorrectly mounted spark plug wire would have occurred at station 9.  The Pareto chart categorizing defects by station appears below.
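The tabulation behind a Pareto chart by station can be sketched as follows. The defect log is invented for illustration (it is not Bisgaard's data), and pareto_table is a hypothetical helper that sorts categories by count and accumulates the percentage of all defects.

```python
from collections import Counter


def pareto_table(categories):
    """Return rows (category, count, cumulative percent), largest count first."""
    counts = Counter(categories).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for category, n in counts:
        running += n
        rows.append((category, n, 100.0 * running / total))
    return rows


# Hypothetical defect log: station where each defect originated (invented).
defect_stations = [9, 9, 3, 9, 5, 9, 9, 3, 9, 7, 9, 9, 3, 9, 1, 9, 5, 9, 9, 9]

rows = pareto_table(defect_stations)
for station, count, cum_pct in rows:
    print(f"station {station}: {count:>2} defects, cumulative {cum_pct:5.1f}%")
```

Re-running the same tabulation with a different categorization (defect type versus originating station) is what allowed the team to find a view of the data in which one category dominated.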

The team focused on station 9.  The workers on the line revealed that design of the motors had changed leaving station 9 with more work than the other stations.   The team redesigned the assembly line reducing the work load at station 9.   A number of other changes were made such as improved lighting.   The result was a dramatic reduction in the occurrence of defects.

De Mast and Trip (2007) claim that this example illustrates the use of exploratory data analysis to change the focus of the problem from “too many defects” to “too many defects from station 9”.  They state that the example illustrates the use of exploratory data analysis to identify a KPOV.

My viewpoint is that this example illustrates the identification of a KPIV, namely the assembly line station.  Admittedly, the next phase of the improvement effort was clearly more focused on station 9.  We must remember that quality improvement is often an iterative process, that is, a succession of Plan-Do-Check-Act (PDCA) cycles.  Identifying a KPIV on one cycle may result in that KPIV being a KPOV on the next cycle.

References
  1. Bisgaard, S. (1996). "Quality Quandaries: The Importance of Graphics in Problem Solving and Detective Work." Quality Engineering 9(1): 157-162.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

November 26, 2008

Exploratory Data Analysis: Molding Operation Example

The purpose of Exploratory Data Analysis (EDA) is to generate hypotheses or clues that guide us in improving quality or process performance.  Breyfogle (2003, pp. 10-11) views Six Sigma as a murder mystery where we use a structured approach to uncover clues that lead us to improve process outputs.  These clues are Key Process Input Variables (KPIVs) and process improvement strategies.  As an example, he considers the process of traveling to work where the Key Process Output Variable (KPOV) is the arrival time.  Examples of KPIVs are the setting of our alarm clock and our departure time.  An alternative process improvement strategy might be a different travel route that is less subject to variation during congested time periods.  Then, the route selected is another KPIV, and the travel time along that route is a function of both the route and departure time.  Exploratory Data Analysis helps us identify these KPIVs.

De Mast and Trip (2007) state that the purpose of EDA from a quality improvement project viewpoint is to identify the dependent (Y) and independent (X) variables that may help understand or solve the quality problem.   The dependent Y variables are KPOVs, and the independent X variables are KPIVs.  Leitnaker (2000) gives an example of EDA to identify KPIVs.  The example is a molding operation where:

  • Yields are erratic
  • Parts are produced that do not meet specifications
  • Shipment schedules are not consistently met

A team studied a molding operation supplying plastic switches to industrial customers for use in assembled control pads.   The operation has eight machines, each machine has two molds, and each mold has four cavities.  To investigate the process capability, the team took a sample of size 5 from the output of one machine every 4 hours.   The following control chart displays the results for a critical dimension.

The process is in control, and the range chart supported this conclusion, but the variation is large.  Next the team investigated the effect of the cavities and molds on the measured dimension.  To do this, they sampled one part from each of the four cavities of the two molds on one machine.  Breaking down the data by cavity and mold is an example of stratification.  Control charts for the individual cavities and molds showed that all cavities and molds appear to be in control.  However, mold 2 cavities have larger averages than mold 1 cavities, and the averages for the cavities increase with cavity number.  The following figure clearly shows this pattern.

The figure leads us to identify mold and cavity numbers as KPIVs.  The exploratory data analysis produced a clue which generated a search for the reasons that molds and cavities produced different average dimensions.  The team can proceed to reduce the variability in the measured dimension by reducing the differences in averages for the molds and cavities.
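For readers who want to reproduce the kind of control chart used in the capability study, here is a minimal Python sketch of X-bar and R chart limits for subgroups of size 5. The subgroup data are invented for illustration (they are not Leitnaker's measurements), the function name xbar_r_limits is hypothetical, and the constants A2, D3 and D4 are the standard tabulated Shewhart values for a subgroup size of 5.

```python
# Standard Shewhart control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """Return (LCL, center line, UCL) for the X-bar chart and the R chart."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean,
                 grand_mean + A2 * mean_range),
        "range": (D3 * mean_range, mean_range, D4 * mean_range),
    }


# Hypothetical subgroups of 5 parts each (invented measurements).
subgroups = [
    [10.1, 10.3, 9.9, 10.0, 10.2],
    [10.0, 10.2, 10.1, 9.8, 10.4],
    [9.9, 10.1, 10.3, 10.2, 10.0],
    [10.2, 10.0, 9.8, 10.1, 10.4],
]

limits = xbar_r_limits(subgroups)
lcl, center, ucl = limits["xbar"]
subgroup_means = [sum(g) / len(g) for g in subgroups]
in_control = all(lcl <= m <= ucl for m in subgroup_means)
print(f"X-bar chart: LCL={lcl:.4f}, CL={center:.4f}, UCL={ucl:.4f}")
print(f"R chart:     LCL={limits['range'][0]:.4f}, "
      f"CL={limits['range'][1]:.4f}, UCL={limits['range'][2]:.4f}")
print("All subgroup means within limits:", in_control)
```

A chart like this can show a process in control while still hiding systematic differences between molds and cavities, which is why the stratified view was needed.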

References

  1. Breyfogle, F. W. (2003). Implementing Six Sigma. Hoboken, New Jersey, John Wiley & Sons, Inc.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.
  3. Leitnaker, M. G. (2000). Using the Power of Statistical Thinking, Special Publication of the ASQ Statistics Division, Summer 2000.

November 19, 2008

Exploratory and Confirmatory Data Analyses

This posting describes the difference between Exploratory Data Analysis (EDA) and Confirmatory Data Analysis (CDA).  Tukey (1977) distinguished between EDA and CDA.   Confirmatory Data Analysis tests hypotheses and produces estimates with a specified precision.   Regression analysis, Analysis of Variance, and Hypothesis Tests are examples of Confirmatory Data Analysis.  Confirmatory Data Analysis requires hypotheses or assumptions to consider and evaluate.

Exploratory Data Analysis makes few assumptions, and its purpose is to suggest hypotheses and assumptions.  Consider the OEM manufacturer described in the posting on 1/30/2008.  The company was experiencing customer complaints.  A team wanted to identify and remove causes of these complaints.  They asked customers for usage data so the team could calculate defect rates.  This started an Exploratory Data Analysis.  The team plotted control charts, and these charts identified a high defect rate in October 1991.  The investigation established that a supplier used the wrong raw material.  Discussions with the supplier and team members motivated further analysis of the raw material and its composition.  This decision to analyze raw material completed the Exploratory Data Analysis.  The Exploratory Data Analysis used both data analysis and process knowledge possessed by team members.  The supplier and company conducted a series of designed experiments which identified an improved raw material composition.  Using this composition, the defect rate improved from .023% to .004%.  The experimental design and its analysis were Confirmatory Data Analysis.  Note that the experimental design required a hypothesis generated by the Exploratory Data Analysis.

Tukey states that EDA is detective work.   He uses the criminal justice process as an analogue to illustrate the roles of EDA and CDA.   A detective investigating a crime needs both tools and understanding.   The detectives and other investigative units search for and produce evidence.  The juries and judges evaluate the evidence’s strength.   Exploratory Data Analysis uncovers statements or hypotheses for Confirmatory Data Analysis to consider.   Experimental design and regression modeling are more effective if Exploratory Data Analysis uncovers precise statements or hypotheses.   Admittedly, one can conduct experiments searching for hypotheses; however, our viewpoint is that preliminary Exploratory Data Analyses may reduce the costs of these experiments.

Exploratory and Confirmatory Data Analyses can be thought of as part of statistical thinking.   De Mast and Trip (2007) present principles for more effective EDA in quality improvement projects.  We will examine results from their paper in future postings.   Their paper won the Nelson award for the paper having the greatest immediate impact for practitioners published during 2007 in the Journal of Quality Technology.

References

  1. John W. Tukey (1977). Exploratory Data Analysis, Addison-Wesley Publishing Co.
  2. de Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

October 16, 2008

Design of Experiments: Grinding Process Example (Part 4)

This posting continues the grinding process case study (Gigo, 2008) that illustrates the use of design and analysis of experiments to reduce common-cause variation.  We present the results of the analysis of the experiments specified in the 9/18/2008 posting (Part 2).  

The following figures display graphically the relative significance of the six factors, i.e., A, B, C, D, AB and AC.   The figures show the average response at the factor low (-1) and high (+1) values.    Factors B and C are not nearly as significant as factors A and D since the average responses of B and C are nearly the same at their low and high values.  That is, a change in the factor levels for factors B and C has little effect on the response.  Also, the interaction factor AC is more significant than the interaction factor AB.  
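The averages described above can be computed directly from the design matrix. The Python sketch below is illustrative only: the design (a full factorial in A, B and C with D = A*B*C) and the responses are assumptions made for this example, not the design or data of the case study. The responses are generated from an invented model so that the true effects are known in advance.

```python
from itertools import product

# Assumed design for illustration: full factorial in A, B, C with D = A*B*C.
# This is NOT taken from the case study.
runs = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

# Hypothetical responses generated from an invented model, so the true
# effects are known: A = -6, D = -3, AC = -2, all others 0.
responses = [50 - 3 * a - 1.5 * d - 1 * a * c for a, b, c, d in runs]

columns = {
    "A": [a for a, b, c, d in runs],
    "B": [b for a, b, c, d in runs],
    "C": [c for a, b, c, d in runs],
    "D": [d for a, b, c, d in runs],
    "AB": [a * b for a, b, c, d in runs],
    "AC": [a * c for a, b, c, d in runs],
}


def effect(column, responses):
    """Average response at the high level minus average at the low level."""
    high = [y for x, y in zip(column, responses) if x == 1]
    low = [y for x, y in zip(column, responses) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {name: effect(col, responses) for name, col in columns.items()}

# List the factors from largest to smallest absolute effect.
for name, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>2}: {value:+.3f}")
```

A factor whose averages at the low and high levels are nearly equal has an effect near zero, which is how factors B and C show up in the figures.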

We can test the significance of the factors using an Analysis of Variance (ANOVA).  Refer to Montgomery, Peck and Vining (2006).  Let SST be the total sum of squares.  That is:

$$\mathrm{SST} = \sum_{i=1}^{8} (Y_i - \bar{y})^2$$

where Y_i is the response on experiment i and ybar (written \bar{y} in the equation) is the average response over the 8 experiments.  That is, SST is the sum of the 8 squared deviations between the experiment responses and the average response.  The value of ybar is 49.582, and the value of SST is 118.151.

Then we partition SST into a sum of squares due to the estimated effects (SSR) and a sum of squared deviations from the estimated effects (SSRES).  That is, SST = SSR + SSRES.  The value of SSR is the same as a sum of squares due to an estimated regression function when we have a two-level experiment.  Consider the contribution of factor A to SSR.  The posting on 9/18/2008 gives the estimated effect of factor A to be -6.067.  That is the average of the responses at the high values of factor A minus the average of the responses at the low values of factor A.  Thus the estimated average response at the high values of factor A is ybar - 6.067/2 = 46.5485.  Similarly, the estimated average response at the low values of factor A is ybar + 6.067/2 = 52.6155.  In either case, the estimated response deviates from the mean response by 6.067/2 in magnitude.  Since we have 8 experiments, the contribution of factor A to SSR is 8*(6.067/2)^2 = 73.60788.  For factor D and the interaction effect AC, the corresponding contributions to SSR are 18.67308 and 11.38575.  Thus, SSR is 103.6667.  The value of SSRES is SST - SSR = 14.48432.

We can test whether these three factors are statistically significant using the F statistic.  The F statistic assumes that the individual responses have a normal distribution.  The F statistic is:

$$F = \frac{\mathrm{SSR} / df_R}{\mathrm{SSRES} / df_{RES}}$$

where df_R = degrees of freedom for SSR = 3 (the number of factors), and
df_RES = degrees of freedom for SSRES = 8-1-3 = 4 (we lose one degree of freedom due to estimating the mean and 3 due to estimating the 3 factor effects).
We can tell whether this value of F is statistically significant by calculating its p-value.  The p-value is the probability of obtaining a value of F at least as large as the observed value, i.e., 9.543, by chance when the factor effects have a true value of zero.  The p-value for this F is .027.  Usually, we regard a result as statistically significant when its p-value is less than .05.  Thus the factors A, D and AC are statistically significant.  If we attempt to add a fourth factor, i.e., AB, the p-value becomes .0625; thus, we do not include AB.
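The arithmetic above can be checked with a short Python sketch that uses only the sums of squares quoted in this posting. The helper f_upper_tail is a hypothetical name; it computes the upper-tail probability of the F distribution from the regularized incomplete beta function with simple midpoint-rule integration, which is adequate for the degrees of freedom used here but is not a general-purpose implementation.

```python
from math import gamma

SST = 118.151                                             # total sum of squares
ss_terms = {"A": 73.60788, "D": 18.67308, "AC": 11.38575}  # contributions to SSR
n_runs = 8                                                # number of experiments


def f_upper_tail(f_value, df_num, df_den, steps=20000):
    """P(F >= f_value) for an F(df_num, df_den) distribution.

    Uses P(F >= f) = I_x(df_den/2, df_num/2) with
    x = df_den / (df_den + df_num * f), where I_x is the regularized
    incomplete beta function, integrated here with the midpoint rule.
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_value)
    beta_ab = gamma(a) * gamma(b) / gamma(a + b)
    h = x / steps
    area = h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(steps))
    return area / beta_ab


ssr = sum(ss_terms.values())   # sum of squares due to the estimated effects
ssres = SST - ssr              # residual sum of squares
df_r = len(ss_terms)           # one degree of freedom per factor
df_res = n_runs - 1 - df_r     # 8 - 1 - 3 = 4
f_stat = (ssr / df_r) / (ssres / df_res)
p_value = f_upper_tail(f_stat, df_r, df_res)

print(f"SSR = {ssr:.4f}, SSRES = {ssres:.4f}")
print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df_r, df_res)` should give the same tail probability without the hand-written integration.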

Higher values of the response S/N are desirable.  Thus, the low value of factor A (feed rate of .0008 mm/revolution) and the low value of factor D (wheel grade of A54) are preferred.  Since the low value (-1) of the interaction effect AC is preferred and factor A is at its low value, we select the high value of factor C, which is a work speed of 360 RPM.  For the insignificant factor B, the team chose its low value (a wheel speed of 2200 RPM).

The posting on 2/28/2008 reports that the preferred factor levels specified above improved the process performance index (Ppk) from .49 to 1.25.   This is based on a sample of 40 parts.   The posting on 5/1/2008 defines the process capability index Cpk.   Process capability indices assume the process is stable.   When we have insufficient evidence the process is stable, we call the capability index a performance index and use the same equation.   
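As a rough sketch of the performance index calculation, the Python below assumes the commonly used definition Ppk = min((USL - mean)/(3s), (mean - LSL)/(3s)), where s is the overall sample standard deviation. The 5/1/2008 posting is not reproduced here, so treat this formula as an assumption. The measurements and specification limits are invented for illustration and are not the grinding data.

```python
from statistics import mean, stdev


def ppk(values, lsl, usl):
    """Performance index, assuming the common min-of-two-sides definition."""
    m = mean(values)
    s = stdev(values)  # overall sample standard deviation
    return min((usl - m) / (3 * s), (m - lsl) / (3 * s))


# Hypothetical measurements and specification limits (invented).
measurements = [9.8, 10.0, 10.2, 10.0, 10.0]
index = ppk(measurements, lsl=9.4, usl=10.9)
print(f"Ppk = {index:.3f}")
```

The index is driven by the specification limit nearer to the mean, so both centering the process and reducing its variation raise the value.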

References

  1. Montgomery, Douglas C., Elizabeth Peck and Geoffrey Vining (2006). Introduction to Linear Regression Analysis, John Wiley & Sons, p. 26.

Hosted by
American Society for Quality

Powered by
Movable Type 3.2

The blog authors are ASQ members. Their opinions are their own and may not reflect the opinions of ASQ or its membership as a whole. Members of the public are invited to comment.