
March 28, 2009

Exploratory Data Analysis: Stratification

The primary purpose of Exploratory Data Analysis (EDA) is to identify the key variables that affect the quality measures. Two principles, mentioned by De Mast and Trip (2007), are helpful in identifying these variables. They are:

  • Display the distribution of the data
  • Display the distribution within individual strata

Chang and Lu (1995) provide an example illustrating these principles. A steel sheet metal manufacturer had customers complaining about uneven thickness. The specification was 4.5 ± 0.5 mm. The production manager had data collected from 120 sheets giving the thickness measurements on the left, middle and right sides of the sheets. Employees selected five sheets at shift times of 0900, 1100, 1400 and 1700 over a period of five days. The histogram appearing below shows 13% of the sheet thickness measurements below the lower specification limit of 4.0 mm. Also, the mean is lower than 4.5 mm.

After discussions with shop-floor personnel, they stratified by position on the sheet and by time. Histograms for the two stratifications appear below. The stratification by position did not show distributions much different from the aggregate distribution. However, the stratification by time showed higher frequencies of thin measurements at 1100 and 1700. Of the 26 values below 4 mm in the histograms, 24 were at 1100 and 1700.
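
The two stratifications described above can be sketched in a few lines of Python. This is only an illustration: the records below are made-up values, not the Chang and Lu data, and the helper name fraction_below is our own. The sketch groups the thickness measurements by a stratification factor and reports the fraction below the lower specification limit in each stratum.

```python
from collections import defaultdict

LSL = 4.0  # lower specification limit in mm (4.5 - 0.5, from the text)

# Hypothetical (time, position, thickness_mm) records; NOT the Chang and Lu data.
records = [
    ("0900", "left", 4.4), ("0900", "middle", 4.5), ("0900", "right", 4.3),
    ("1100", "left", 3.9), ("1100", "middle", 3.8), ("1100", "right", 3.9),
    ("1400", "left", 4.5), ("1400", "middle", 4.6), ("1400", "right", 4.4),
    ("1700", "left", 3.7), ("1700", "middle", 3.9), ("1700", "right", 3.8),
]

def fraction_below(records, key_index, limit=LSL):
    """Group thickness values by the field at key_index and return
    {stratum: fraction of measurements below limit}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {k: sum(v < limit for v in vals) / len(vals)
            for k, vals in groups.items()}

by_time = fraction_below(records, key_index=0)      # stratify by shift time
by_position = fraction_below(records, key_index=1)  # stratify by sheet position

for name, result in (("time", by_time), ("position", by_position)):
    print(name, result)
```

With these invented values, the strata by position look alike while the strata by time do not, which is the kind of contrast the histograms in the example revealed.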

Discussions with shop-floor personnel identified mold wear, buildup of chips in a work-holding device, and operator fatigue as possible causes. The corrective action was to take a 10-minute break at 1030 and 1630 each day and have maintenance performed during the breaks. The corrective action produced a substantial reduction in thin sheets.

References

  1. Chang, P.-L. and K.-H. Lu (1995). "The Construction of the Stratification Procedure for Quality Improvement." Quality Engineering 8(2): 237-247.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.

November 26, 2008

Exploratory Data Analysis: Molding Operation Example

The purpose of Exploratory Data Analysis (EDA) is to generate hypotheses or clues that guide us in improving quality or process performance. Breyfogle (2003, pp. 10-11) views Six Sigma as a murder mystery where we use a structured approach to uncover clues that lead us to improve process outputs. These clues are Key Process Input Variables (KPIVs) and process improvement strategies. As an example, he considers the process of traveling to work where the Key Process Output Variable (KPOV) is the arrival time. Examples of KPIVs are the setting of our alarm clock and our departure time. An alternative process improvement strategy might be a different travel route that is less subject to variation during congested time periods. Then, the route selected is another KPIV, and the travel time along that route is a function of both the route and departure time. Exploratory Data Analysis helps us identify these KPIVs.

De Mast and Trip (2007) state that the purpose of EDA from a quality improvement project viewpoint is to identify the dependent (Y) and independent (X) variables that may help understand or solve the quality problem.   The dependent Y variables are KPOVs, and the independent X variables are KPIVs.  Leitnaker (2000) gives an example of EDA to identify KPIVs.  The example is a molding operation where:

  • Yields are erratic
  • Parts are produced that do not meet specifications
  • Shipment schedules are not consistently met

A team studied a molding operation supplying plastic switches to industrial customers for use in assembled control pads.   The operation has eight machines, each machine has two molds, and each mold has four cavities.  To investigate the process capability, the team took a sample of size 5 from the output of one machine every 4 hours.   The following control chart displays the results for a critical dimension.

The process is in control, and the range chart supported this conclusion. But the variation is large. Next, the team investigated the effect of the cavities and molds on the measured dimension. To do this, they sampled one part from each of the four cavities of the two molds on one machine. Breaking down the data by cavity and mold is an example of stratification. Control charts for the individual cavities and molds showed that all cavities and molds appear to be in control. However, mold 2 cavities have larger averages than mold 1 cavities, and the averages for the cavities increase with cavity number. The following figure clearly shows this pattern.

The figure leads us to identify mold and cavity numbers as KPIVs. The exploratory data analysis produced a clue which generated a search for the reasons that molds and cavities produced different average dimensions. The team can proceed to reduce the variability in the measured dimension by reducing the differences in averages for the molds and cavities.
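
As a rough sketch of the stratification by mold and cavity, the Python fragment below computes the average dimension for each (mold, cavity) stratum and for each mold. The measurements are invented for illustration and are not Leitnaker's data; the names samples, cavity_means and mold_means are our own.

```python
from statistics import mean

# Hypothetical dimension measurements keyed by (mold, cavity).
# Illustrative values only; NOT the data from Leitnaker (2000).
samples = {
    (1, 1): [10.0, 10.2], (1, 2): [10.2, 10.4],
    (1, 3): [10.4, 10.6], (1, 4): [10.6, 10.8],
    (2, 1): [10.5, 10.7], (2, 2): [10.7, 10.9],
    (2, 3): [10.9, 11.1], (2, 4): [11.1, 11.3],
}

# Average for each (mold, cavity) stratum.
cavity_means = {key: mean(vals) for key, vals in samples.items()}

# Average for each mold, pooling its four cavities.
mold_means = {
    m: mean(v for (mold, _), vals in samples.items() if mold == m for v in vals)
    for m in (1, 2)
}

for (mold, cavity), avg in sorted(cavity_means.items()):
    print(f"mold {mold} cavity {cavity}: {avg:.2f}")
print("mold averages:", {m: round(a, 2) for m, a in mold_means.items()})
```

Comparing the stratum averages side by side is a numerical counterpart to the figure: a consistent shift between molds, or a trend across cavity numbers, points to mold and cavity as candidate KPIVs.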

References

  1. Breyfogle, F. W. (2003). Implementing Six Sigma. Hoboken, New Jersey, John Wiley & Sons, Inc.
  2. De Mast, Jeroen and Albert Trip (2007). “Exploratory Data Analysis in Quality-Improvement Projects”, Journal of Quality Technology, 39(4): 301-311.
  3. Leitnaker, M. G. (2000). Using the Power of Statistical Thinking, Special Publication of the ASQ Statistics Division, Summer 2000.

May 29, 2008

Analyze Common-Cause Variation A

An additional example appears below illustrating the Analyze Common-Cause Variation step, step 6, in the Hoerl-Snee process improvement strategy.   Refer to the posting on 5/18/2008 for a description of this step.   Following the example, the posting summarizes some suggestions by Breyfogle (2003) to assist in stratification and disaggregation.

Histogram – Stratification.  The posting on 3/25/2008 describes statistical thinking by a team at Ricoh’s Numazu plant. The plant makes raw material used as ingredients for copy machine toner. The team wanted to reduce variation in output quantity, which indicated a lack of control of the underlying process. After removing a special cause, the team constructed a histogram of the output quantity. The histogram clearly displayed excessive variation and two peaks. The process flow chart showed a split after phase 2 into two separate lines, i.e., line A and line B. Separate histograms for the two lines showed the output from line B was consistently lower than that from line A. Constructing separate histograms for the two lines illustrates stratification by line. Next, the team conducted a brainstorming session to formulate their collective thinking about the causes of excessive variation and the differences between the two lines. They documented the results with a cause and effect diagram. The brainstorming session and the construction of a cause and effect diagram illustrate step 7, Study Cause & Effect.

Stratification requires identifying a stratification factor, such as time of day, and partitioning this factor into logical categories. What tools may we use to aid in selecting a stratification factor? The team in the example above noticed two peaks in a histogram. Breyfogle (2003) provides further guidance on this question.

  1. On page 220, Breyfogle states that patterns on a control chart may suggest the need for stratification. A sequence of points with small up and down variation relative to the control limits may suggest that the sequence of points comes from a single stratum. The opposite situation, a sequence of points with no values near the center line, may indicate the combination of two strata.
  2. On page 385, Breyfogle suggests dividing the data into categories based on posing basic questions such as who, what, when and where.
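
The two control chart patterns in item 1 can be checked mechanically. The sketch below is a minimal illustration, not Breyfogle's procedure: it borrows the run lengths of the commonly cited Nelson rules 7 and 8 (15 consecutive points within one sigma of the center line, and 8 consecutive points with none within one sigma) as assumed thresholds. It also simplifies rule 8 by not requiring points on both sides of the center line. Here sigma means the standard deviation of the plotted statistic, so the control limits sit at three sigma.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def stratification_signals(points, center, sigma, hug_run=15, avoid_run=8):
    """Flag two patterns on a control chart.

    hug_run and avoid_run are assumed thresholds taken from Nelson
    rules 7 and 8; they are not values given in the text.
    """
    near = [abs(x - center) <= sigma for x in points]  # within one sigma
    far = [not n for n in near]                        # outside one sigma
    return {
        "hugging_center_line": longest_run(near) >= hug_run,
        "avoiding_center_line": longest_run(far) >= avoid_run,
    }

# Hypothetical plotted values for illustration.
quiet = [10 + 0.1 * (-1) ** i for i in range(20)]  # small up-and-down variation
split = [10 + 2.0 * (-1) ** i for i in range(10)]  # no values near center line

print(stratification_signals(quiet, center=10, sigma=1))
print(stratification_signals(split, center=10, sigma=1))
```

Either flag is only a prompt to look for a stratification factor; it does not say which factor is responsible.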

Disaggregation may be aided by constructing a process map such as the one used in the posting on 2/21/08.    The process map (Breyfogle, 2003, p. 103) is a flowchart with key process input variables listed for each step in the process.

References

  1. Breyfogle, F. W. (2003). Implementing Six Sigma. Hoboken, New Jersey, John Wiley & Sons, Inc.


May 26, 2008

Analyze Common-Cause Variation Examples (Disaggregation)

This posting gives two additional examples illustrating the Analyze Common-Cause Variation step, step 6, in the Hoerl-Snee process improvement strategy.   Refer to the posting on 5/18/2008 for a description of this step.   Both examples include disaggregation as a tool.

  • Disaggregation – Stratification.  The posting on 2/18/2008 describes statistical thinking by a Midwest manufacturing firm to reduce waiting times by customers. The company’s goal was to have 95% of incoming customer calls answered by a customer service representative in less than 2 minutes. Based on a process flowchart, the team collected service time data for each step in the process. That is disaggregation. The team also collected data for estimating the distribution of incoming calls by time of day. That is stratification by time of day. They used these data as inputs to a simulation of the call answering process. They used the simulation to construct staffing levels by hour of the day. The construction and use of the simulation illustrates step 7, Study Cause & Effect.
  • Disaggregation – Regression Analysis.  The posting on 2/21/2008 describes statistical thinking by a manufacturer of automotive door frames. The purpose was to eliminate a problem meeting dimensional specifications of the finished product. Shop-floor personnel thought that variations in the incoming raw material characteristics caused the problem. The team defined important quality characteristics for each step in the process, including quality characteristics of the incoming material. The manufacturer collected data on these quality characteristics as well as the final part dimensions. A regression analysis showed no effect from the incoming material characteristics. However, it identified several quality characteristics having a significant effect on finished product dimensions. The regression analysis also showed that the left and right door frames had significantly different variation for two quality characteristics. These results motivated corrective action and eliminated the need for rework. In this example, the team did not need to employ step 7, Study Cause & Effect.

May 21, 2008

Analyze Common-Cause Variation Examples (Stratification)

This posting gives two examples illustrating the Analyze Common-Cause Variation step, step 6, in the Hoerl-Snee process improvement strategy.   Refer to the previous posting for a description of this step.

  • Stratification – Pareto Chart.  The posting on 2/25/2008 describes statistical thinking by a company experiencing a high rejection rate in one of its machine shops. To determine the root cause of these rejections, they stratified the rejections by the machine type causing them. Then they created a Pareto Chart ranking the frequency of rejections by machine type. They found that 60% of the rejections were due to grinding problems. This finding did not give them the root cause of the rejections, but it allowed them to focus on grinding operations. Their next step was to construct a cause and effect diagram and then to design experiments to determine improved grinding procedures. This next step illustrates the implementation of the Study Cause & Effect step, step 7 in the Hoerl-Snee process improvement strategy.
  • Regression Analysis – Stratification.  The posting on 3/4/2008 describes statistical thinking by Pease Industries to reduce the defect rate of decorative glass inserts for a wooden entry door. The prevailing opinion was that humidity and temperature variations in the mold department were the root cause. The team collected data and did a regression analysis using temperature and humidity as independent variables and the number of defects as the dependent variable. The result was no correlation between the independent variables and the number of defects. They collected more data and stratified the data by part type, month of occurrence and day of week. They were surprised by the result showing day of the week strongly affecting the defect rate. A Chi-Square test showed the day of the week was statistically significant. The next step was to construct a Cause-and-Effect diagram and do an Is-Is Not analysis. This step illustrates the Study Cause and Effect step, step 7.

In both of the above examples, the use of Cause-and-Effect diagrams, designed experiments and the Is-Is Not analysis required the previous results from the Analyze Common-Cause Variation step. One needs to identify the sources of variation before studying their causes.
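
To make the day-of-week test in the Pease Industries example concrete, here is a minimal sketch of a Chi-Square goodness-of-fit calculation. The defect counts are invented, not the Pease data, and the sketch assumes equal production volume on each day, so that equal defect counts are expected if the day has no effect. The critical value 9.488 is the upper 5% point of the Chi-Square distribution with 4 degrees of freedom.

```python
# Hypothetical defect counts by weekday; NOT the Pease Industries data.
defects = {"Mon": 30, "Tue": 12, "Wed": 10, "Thu": 11, "Fri": 12}

def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts per category.

    Assumes the same production volume in every category.
    """
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())

CRITICAL_5PCT_DF4 = 9.488  # chi-square upper 5% point, 4 degrees of freedom

stat = chi_square_uniform(defects)
significant = stat > CRITICAL_5PCT_DF4
print(f"chi-square = {stat:.2f}, significant at 5%: {significant}")
```

If production volume differs by day, the expected counts should be made proportional to volume rather than equal.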

May 18, 2008

Analyze Common-Cause Variation

This posting discusses the sixth step, Analyze Common-Cause Variation, of the Hoerl-Snee Process Improvement Strategy. Refer to the figure in the April 4 posting for an overview of the process. Use Britz et al. (2000) and Hoerl and Snee (2002) as references.

Common-cause variation affects all of the data, which distinguishes this step from the Address-Special-Causes step. The purpose of the Analyze-Common-Cause-Variation step is to identify sources of variation. Locating a source of variation might also reveal its root cause without significant additional analysis. On other occasions, knowing a source of common-cause variation might require further analysis to determine its root cause. This additional analysis is performed in the next step, Study Cause and Effect.

Some of the tools we might use in this step are:

  • Stratification.  Define a stratification factor such as the day of the week or machine.   Partition the factor into logical categories.  Compare the data for each category to highlight differences.
  • Disaggregation.  Define quality measures for sub-processes or individual process steps.  Study the variation in the individual sub-processes.  How does it contribute to the overall process variation?
  • Pareto Chart.  Classify defects into categories.  Highlight the categories having the most frequent occurrences.    
  • Histogram.  Plot the distribution of quality measures.  Two or more peaks might indicate the presence of categories that could be examined by stratification.
  • Regression Analysis.   Existing opinion might suggest one or more input variables that influence the output quality measure.   A regression analysis might verify this opinion or indicate that these variables have negligible effect.
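
The Pareto Chart tool in the list above reduces to counting and ranking. The sketch below builds the table behind a Pareto chart: categories sorted by frequency with a cumulative percentage. The category labels and counts are hypothetical, and the helper name pareto_table is our own.

```python
from collections import Counter

# Hypothetical rejection records, one label per rejected part.
rejections = (["grinding"] * 6 + ["milling"] * 2
              + ["drilling"] * 1 + ["turning"] * 1)

def pareto_table(items):
    """Return (category, count, cumulative percent) rows, most frequent first."""
    counts = Counter(items).most_common()
    total = sum(n for _, n in counts)
    table, cumulative = [], 0
    for category, n in counts:
        cumulative += n
        table.append((category, n, 100 * cumulative / total))
    return table

table = pareto_table(rejections)
for category, n, cum_pct in table:
    print(f"{category:10s} {n:3d} {cum_pct:6.1f}%")
```

The first few rows show where to focus; the remaining categories can often be set aside until the dominant ones are addressed.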

References

  1. Britz, G. C., D. W. Emerling, et al. (2000). Improving Performance Through Statistical Thinking. Milwaukee, WI, ASQ Quality Press.
  2. Hoerl, R. and R. D. Snee (2002). Statistical Thinking - Improving Business Performance. Pacific Grove, CA, Duxbury.

March 25, 2008

Resin Example of Hoerl-Snee Strategy (Part B)

This posting continues the resin output variation example described to illustrate the Hoerl-Snee process improvement strategy. We take this example from Britz et al. (2000). It also appears in Hoerl and Snee (2002).

Having removed the special cause, the Ricoh team focused on output quantity variability.   A histogram displays this variability, and the following figure shows recent output data.  This histogram displays an unexpected pattern indicating a combination of two underlying distributions for the output quantity.   Notice the peaks at 4284 and 4308 kg.

The process flowchart appearing in the previous posting suggested that these two component distributions were due to the split after phase 2 into two separate lines, i.e., lines A and B. The histograms shown below confirmed this difference. The output from line B was consistently lower than that from line A. Based on the needs of their customers, the team established the limits shown in the histograms, i.e., 4300 kg ± 5 kg.
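
A small sketch shows how stratifying by line separates the two peaks and how each line compares with the 4300 kg ± 5 kg limits. The output values are invented to sit near the peaks mentioned in the text (about 4284 kg and 4308 kg); they are not the Ricoh data, and the function name summarize is our own.

```python
from statistics import mean

TARGET_KG, TOLERANCE_KG = 4300, 5  # limits from the text: 4300 kg ± 5 kg

# Hypothetical output quantities (kg) by line; NOT the Ricoh data.
output = {
    "A": [4305, 4309, 4308, 4310, 4308],
    "B": [4283, 4285, 4284, 4286, 4282],
}

def summarize(values):
    """Mean output and fraction of batches within the customer limits."""
    within = [TARGET_KG - TOLERANCE_KG <= v <= TARGET_KG + TOLERANCE_KG
              for v in values]
    return {"mean": mean(values),
            "fraction_within_limits": sum(within) / len(values)}

summary = {line: summarize(values) for line, values in output.items()}
for line, stats in summary.items():
    print(line, stats)
```

In this made-up data neither line is centered on the target, so pooling the two lines would hide both the gap between them and how far each sits from 4300 kg.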

Clearly, the variation in output quantity is excessive.   Next the team conducted a brainstorming session to document their collective thinking on potential causes of excessive variation and differences between the two lines.   The following cause and effect diagram shows the result of this session.


The next posting will describe the investigation based on the potential causes shown above.  
Note that the improvement process is iterative: gather data, identify a special cause, gather more data, notice differences, and then conduct a brainstorming session. This improvement strategy looks more like Shewhart’s Plan-Do-Check-Act (PDCA) than the DMAIC steps recommended for Six-Sigma projects. Also, the team didn’t adopt a specified target until after two data analysis steps. That is, their Define step occurred in their second PDCA cycle.

References

  1. Britz, G. C., D. W. Emerling, et al. (2000). Improving Performance Through Statistical Thinking. Milwaukee, WI, ASQ Quality Press.
  2. Hoerl, R. and R. D. Snee (2002). Statistical Thinking - Improving Business Performance. Pacific Grove, CA, Duxbury.

February 18, 2008

Service Time Flowchart

This post starts a series of posts to present the use of Statistical Thinking Tools in applying Statistical Thinking. The Statistical Thinking Tool illustrated by this example is a flowchart. We can have flowcharts for processes having service time objectives as well as for processes producing a physical product. Jeffries and Sells (2004) present this example and describe the use of “statistical tools” to meet company service time objectives. We regard their use of statistical tools as an application of Statistical Thinking.

A Midwest manufacturing firm processes orders for its 6 manufacturing plants and 12 warehouses.   Originally, each plant and warehouse had its own order processing service staffed by a total of 36 customer service representatives.  To improve customer service and reduce costs, the company president directed a team to develop a centralized customer service center located at corporate headquarters.   The president made this decision after the team surveyed customers and found that they were adamant that they did not want to wait for a customer service representative to answer a phone call and they were not very interested in personalized service provided by a plant or warehouse representative.

The team established a goal where 95% of incoming calls would wait less than 2 minutes for a customer service representative.   The team acquired an Automatic Call Distribution (ACD) system to route customer calls to customer service representatives.  The call center would operate from 7:00 am to 7:30 pm Central Time.   The following figure gives a flowchart specifying the process of answering incoming customer calls.

The team collected data giving the distributions of incoming calls by time of day and the service times of the customer service representatives to answer the calls.  Recording and analyzing data for individual steps in the process flow chart is an example of disaggregation.   Classifying and analyzing data by a factor such as time of the day is an example of stratification.

The customer service center staffing levels by hour of the day are a crucial system design parameter. Wait times will be long without adequate staff. On two occasions in the past two months, I have had to wait more than an hour for technical service support personnel to answer my calls. I know that this happens because the companies involved have allocated inadequate staffing to handle the incoming calls.

The team developed staffing levels throughout the day using a simulation of the process represented by the figure above.   Constructing a simulation requires a flowchart.  Refer to Jeffries and Sells (2004) for additional details.
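
To give a feel for how such a simulation can set staffing levels, here is a minimal sketch for a single hour-of-day stratum. It is not the model used by Jeffries and Sells: it assumes exponentially distributed times between calls and service times, calls answered in order of arrival by the first free representative, and made-up arrival and service rates. The function names and parameters are our own.

```python
import heapq
import random

def fraction_answered_within(n_reps, calls_per_hour, mean_service_min,
                             limit_min=2.0, hours=1.0, seed=0):
    """Simulate a first-come, first-served call queue with n_reps servers and
    return the fraction of calls that wait less than limit_min minutes."""
    rng = random.Random(seed)
    free_at = [0.0] * n_reps  # min-heap: time (min) each rep next becomes free
    heapq.heapify(free_at)
    now, waits = 0.0, []
    while True:
        now += rng.expovariate(calls_per_hour / 60.0)  # next arrival (minutes)
        if now > hours * 60:
            break
        start = max(now, heapq.heappop(free_at))  # earliest available rep
        waits.append(start - now)
        service = rng.expovariate(1.0 / mean_service_min)
        heapq.heappush(free_at, start + service)
    return sum(w < limit_min for w in waits) / len(waits) if waits else 1.0

def min_staff(calls_per_hour, mean_service_min, goal=0.95, max_reps=50):
    """Smallest staffing level whose simulated fraction meets the goal."""
    for n in range(1, max_reps + 1):
        frac = fraction_answered_within(n, calls_per_hour, mean_service_min,
                                        hours=8.0)
        if frac >= goal:
            return n
    return None

# Hypothetical load: 60 calls per hour, 3-minute average service time.
print("reps needed:", min_staff(calls_per_hour=60, mean_service_min=3.0))
```

Running min_staff once per hour of the day, with the arrival rate estimated for that hour, would give a staffing profile of the kind the team developed. A real study would use the measured distributions and several replications rather than a single seeded run.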

The next post will illustrate the use of a flowchart for a process producing a physical product.

References

  1. Jeffries, R. D. and P. R. Sells (2004). Managing Customer Service Using Statistical Tools: A Case Study. Annual Quality Congress Proceedings.