Using Microsoft Excel for Six Sigma
   Setting Expectations
   Calculating Measures of central tendency and variation
   Skewness and kurtosis
   Calculating area under normal curve
   Sorting data
   Histogram
   Pareto Chart
   Scatter diagrams
   Bar and Pie charts
   Using the Analysis ToolPak for advanced functions



   This is not a training on Six Sigma!
    The presentation assumes that you already know Six Sigma
    concepts and are looking for ways to apply them in MS Excel.
   The presentation also assumes that you know the basics of
    MS Excel, so it focuses on more advanced analytical features.
   The Excel tips and tools in this presentation can be used in
    several phases of DMAIC, so the presentation does not follow
    the DMAIC sequence.
   The training is based on MS Excel 2007. Menus and dialog boxes
    are laid out differently in MS Excel 2003, so expect to adapt
    the steps there.

In mathematics, the central tendency of a data set is a measure of the
"middle" or "expected" value of the data set. Several descriptive
statistics can serve as a measure of central tendency, including the
mean, the median, and the mode.
Other statistical measures, such as the standard deviation and the range,
are called measures of spread and describe how spread out the data are.

The arithmetic mean (average) of a list of numbers is the sum of all the
items in the list divided by the number of items in the list.
To obtain the arithmetic mean of a dataset, use the Excel function
AVERAGE.

    Syntax
    =AVERAGE(number1,number2,...)

The median is the number separating the higher half of a sample, a
population, or a probability distribution from the lower half.
If there is an even number of observations, the median is not unique, so
one usually takes the mean of the two middle values. To obtain the median
of a dataset, use the Excel function MEDIAN.

    Syntax
    =MEDIAN(number1,number2,...)

The mode is the value that occurs most frequently in a data set or a
probability distribution. The mode is not necessarily unique, since the
same maximum frequency may be attained at different values. To obtain the
mode of a dataset, use the Excel function MODE.

    Syntax
    =MODE(number1,number2,...)

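As a cross-check outside Excel, the three measures of central tendency can be sketched in Python with the standard `statistics` module (Python 3.8 or later is assumed for `multimode`). The sample values are the ten observations from the ordered-array slide later in this deck; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Ten observations reused from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

average = mean(data)      # counterpart of =AVERAGE(range)
middle = median(data)     # counterpart of =MEDIAN(range); averages the two middle values
modes = multimode(data)   # every value tied for the highest frequency

print("mean:", average)
print("median:", middle)
print("modes:", modes)
```

This sample has two values tied for the highest frequency, which illustrates why the mode is not necessarily unique.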
In statistics, variance is the expected squared deviation of a variable or
distribution from its expected value (mean). To obtain the variance of a
dataset, use the Excel function VAR. VAR treats its arguments as a sample
and divides by n − 1; use VARP when the data are the entire population.

    Syntax
    =VAR(number1,number2,...)

Standard deviation is a measure of the variability or dispersion of a
statistical population, a data set, or a probability distribution. It is
the square root of the variance. To calculate the standard deviation in an
Excel worksheet, use the function STDEV (sample) or STDEVP (population).

    Syntax
    =STDEV(number1,number2,...)

In descriptive statistics, the range is the length of the smallest interval
that contains all the data. It is calculated in Excel by subtracting the
minimum from the maximum value of the sample.

    Syntax
    =MAX(A2:A16)-MIN(A2:A16)

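The measures of spread above can be sketched the same way in Python. This is a minimal illustration, assuming the ten observations from the ordered-array slide of this deck; it contrasts the sample formulas (dividing by n − 1, as VAR and STDEV do) with the population formula (dividing by n).

```python
from statistics import pvariance, stdev, variance

# Ten observations reused from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

sample_var = variance(data)          # divides by n - 1, like =VAR(range)
population_var = pvariance(data)     # divides by n, like =VARP(range)
sample_sd = stdev(data)              # square root of sample_var, like =STDEV(range)
data_range = max(data) - min(data)   # like =MAX(range)-MIN(range)

print("sample variance:", round(sample_var, 4))
print("population variance:", round(population_var, 4))
print("sample standard deviation:", round(sample_sd, 4))
print("range:", data_range)
```

The sample variance is always the larger of the two variances, and the gap shrinks as the number of observations grows.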
In probability theory and statistics, skewness is a measure of the
asymmetry of the probability distribution of a real-valued random
variable. It is measured in Six Sigma because real data are rarely
perfectly symmetric. To obtain the skewness of a dataset, use the Excel
function SKEW.

    Syntax
    =SKEW(A2:A16)

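In probability theory and statistics, kurtosis is a measure of the
"peakedness" of the probability distribution of a real-valued random
variable. To obtain the kurtosis of a dataset, use the Excel function
KURT. KURT reports kurtosis relative to the normal distribution: positive
values indicate a relatively peaked distribution, negative values a
relatively flat one.

    Syntax
    =KURT(A2:A16)
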
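To make the two shape measures concrete, here is a small Python sketch. It assumes the bias-adjusted sample formulas that Excel documents for SKEW and KURT; the function names and the sample values are illustrative and do not come from the deck.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (the formula assumed for =SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (the formula assumed for =KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Illustrative sample, not taken from the deck.
data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]

print("skewness:", round(sample_skewness(data), 6))
print("excess kurtosis:", round(sample_excess_kurtosis(data), 6))
```

Skewness needs at least three observations and kurtosis at least four, and both are undefined when the standard deviation is zero.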
If the mean is 85 days and the standard deviation is 5 days,
what is the yield if the USL is 90 days?

   Z = (90 − 85) / 5 = 1
   Y = Pr(x ≤ 90) = Pr(z ≤ 1)
   Pr(z ≤ 1) = Pr(z ≥ −1) = 1 − 0.15865 = 0.8413
   Yield ≅ 84.1%

The area under the curve to the right of the USL would be considered
% defective.

[Figure: normal curve with the USL marked at 90 days and the yield shown
to its left; horizontal axes: Days and Z-Scale]

    Syntax
    =NORMDIST(x,mean,standarddeviation,cumulative)

With cumulative set to TRUE, the function returns the area under the
normal curve to the left of x. For the example above,
=NORMDIST(90,85,5,TRUE) returns about 0.8413.

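The yield calculation can also be reproduced outside Excel. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) with the figures from the example: mean 85 days, standard deviation 5 days, USL 90 days. The variable names are illustrative.

```python
from statistics import NormalDist

# Process figures from the worked example in this deck.
cycle_time = NormalDist(mu=85, sigma=5)
usl = 90

z = (usl - cycle_time.mean) / cycle_time.stdev   # standardized USL
yield_fraction = cycle_time.cdf(usl)             # area to the left of the USL
defective_fraction = 1 - yield_fraction          # area to the right of the USL

print("z:", z)
print("yield:", round(yield_fraction, 4))
print("defective:", round(defective_fraction, 4))
```

The `cdf` call plays the role of NORMDIST with the cumulative argument set to TRUE.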
For a pizza delivery center, the mean delivery time is 20 minutes and
the standard deviation is 3.5 minutes. What is their target, if the
probability of achieving the target is 99.78%?

[Figure: normal curve of delivery time with the USL marked; the area to
the left of the USL is the yield]

    Syntax
    =NORMINV(probability,mean,standarddeviation)

NORMINV returns the value of x below which the given probability lies.
For the pizza delivery example, =NORMINV(0.9978,20,3.5) returns about 30,
so the target is roughly 30 minutes.

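The inverse lookup can be sketched in Python as well, using `NormalDist.inv_cdf` from the standard `statistics` module (Python 3.8 or later). The figures are those of the pizza delivery example; the variable names are illustrative.

```python
from statistics import NormalDist

# Figures from the pizza delivery example in this deck.
delivery_time = NormalDist(mu=20, sigma=3.5)
required_probability = 0.9978

# Delivery time (in minutes) that is met with the required probability.
target_minutes = delivery_time.inv_cdf(required_probability)

print("target (minutes):", round(target_minutes, 2))
```

Feeding the result back into `delivery_time.cdf` returns the original probability, which is a quick way to confirm that the two functions are inverses of each other.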
   Data in raw form are usually not easy to use
    for decision making
     Some type of organization is needed
       ▪ Table
       ▪ Graph
   Techniques reviewed here:
       Ordered Array
       Histograms
       Bar charts and pie charts
       Contingency tables
A sorted list of data:
 Shows range (min to max)

 Provides some signals about variability
    within the range
 May help identify outliers (unusual observations)

 If the data set is large, the ordered array is
    less useful


 Data in raw form (as collected):

     24, 26, 24, 21, 27, 27, 30, 41, 32, 38

 Data in ordered array from smallest to largest:

     21, 24, 24, 26, 27, 27, 30, 32, 38, 41

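For comparison, the same ordered array can be produced in a couple of lines of Python. This is only an illustration of the idea; in Excel the equivalent step is the Sort command.

```python
# Raw observations from the slide, in the order collected.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

ordered = sorted(raw)                          # smallest to largest; raw is left unchanged
smallest, largest = ordered[0], ordered[-1]    # the two ends of the ordered array

print("ordered array:", ordered)
print("min to max:", smallest, "to", largest)
```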
 A graph of the data in a frequency distribution is
  called a histogram
 The class boundaries (or class midpoints) are
  shown on the horizontal axis
 The vertical axis is either frequency, relative
  frequency, or percentage
 Bars of the appropriate heights are used to
  represent the number of observations within
  each class

Class                  Midpoint   Frequency
10 but less than 20       15          3
20 but less than 30       25          6
30 but less than 40       35          5
40 but less than 50       45          4
50 but less than 60       55          2

[Figure: Histogram: Daily High Temperature. Horizontal axis: class
midpoints (5, 15, 25, 35, 45, 55, More); vertical axis: Frequency.
Bar heights: 0, 3, 6, 5, 4, 2, 0. No gaps between bars.]

2

Choose Histogram

3

Input data range and bin range (bin range is a cell range containing
  the upper class boundaries for each class grouping)
Select Chart Output and click “OK”

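The grouping behind a histogram can be sketched in Python as follows. This is a minimal illustration that follows the "but less than" class rule from the frequency table and reuses the ten observations from the ordered-array slide; the function name and its parameters are hypothetical. Note that Excel's Histogram tool treats each bin value as an inclusive upper limit, so an observation that falls exactly on a boundary is counted one class lower there.

```python
from collections import Counter


def class_frequencies(values, lower, width, n_classes):
    """Count values in classes [lower, lower + width), [lower + width, ...), and so on."""
    counts = Counter()
    for x in values:
        index = int((x - lower) // width)
        if 0 <= index < n_classes:      # values outside all classes are ignored
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Ten observations reused from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = class_frequencies(data, lower=20, width=10, n_classes=3)

for low, high, frequency in table:
    print(f"{low} but less than {high}: {frequency}")
```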
   Scatter diagrams are used for bivariate
    numerical data
     Bivariate data consist of paired observations
     taken from two numerical variables

   The scatter diagram:
     One variable is measured on the vertical axis and
     the other variable is measured on the horizontal
     axis

1

Select the Insert tab

2

Select the Scatter dropdown and click on any of the options. If in
  doubt, select the first option (Scatter with only Markers).

Volume per day   Cost per day
      23             125
      26             140
      29             146
      33             160
      38             167
      42             170
      50             188
      55             195
      60             200

[Figure: scatter plot "Cost per Day vs. Production Volume". Horizontal
axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250).]

Microsoft Excel descriptive statistics output, using the house price data:

    House Prices:

    $2,000,000
       500,000
       300,000
       100,000
       100,000

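Since the output itself is not reproduced here, the main figures can be recomputed from the five house prices with a short Python sketch. It is only an approximation of what the tool reports: the selection of statistics and the dictionary keys are assumptions, and the standard deviation uses the sample formula (dividing by n − 1).

```python
from statistics import mean, median, mode, stdev

# House prices from the slide.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": mean(prices),
    "median": median(prices),
    "mode": mode(prices),
    "standard deviation": stdev(prices),   # sample formula, divides by n - 1
    "range": max(prices) - min(prices),
    "count": len(prices),
}

for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

The single $2,000,000 house pulls the mean well above the median, which is why both measures are worth reading together.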
For correlation:
   Select Data Analysis
   Choose Correlation from the selection menu
   Click OK
   Input the data range and select the appropriate options
   Click OK to get the output

For regression:
   Select the input ranges from the data
   Select the residual output options. If you are not sure,
    just select Line Fit Plots.

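What the correlation and regression tools compute can be sketched by hand in Python. The example below is an illustration only: it reuses the volume and cost table from the scatter diagram slide, and the function name is hypothetical. It returns the Pearson correlation coefficient together with the slope and intercept of the least-squares line.

```python
from math import sqrt

# Paired observations from the scatter diagram slide of this deck.
volume_per_day = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost_per_day = [125, 140, 146, 160, 167, 170, 188, 195, 200]


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_line(volume_per_day, cost_per_day)
print("correlation:", round(r, 4))
print("fitted line: cost =", round(intercept, 2), "+", round(slope, 4), "* volume")
```

A coefficient close to 1 matches the upward pattern visible in the scatter plot.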
Regression Statistics
Multiple R                 0.76211
R Square                   0.58082
Adjusted R Square          0.52842
Standard Error            41.33032
Observations                    10

The regression equation is:

house price = 98.24833 + 0.10977 (square feet)


ANOVA
                            df             SS              MS          F       Significance F
Regression                        1       18934.9348    18934.9348   11.0848         0.01039
Residual                          8       13665.5652     1708.1957
Total                             9       32600.5000


                       Coefficients   Standard Error      t Stat     P-value    Lower 95%       Upper 95%
Intercept                 98.24833          58.03348       1.69296   0.12892       -35.57720    232.07386
Square Feet                0.10977            0.03297      3.32938   0.01039         0.03374      0.18580
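
The figures in this output are tied together by a few standard identities, which the following Python sketch recomputes from the ANOVA and coefficient rows. All inputs are copied from the output above; the 2,000 square feet used for the prediction is a hypothetical input, and results can differ from the table in the last digit because the inputs are rounded.

```python
from math import sqrt

# Values copied from the regression output above.
ss_regression, ss_residual = 18934.9348, 13665.5652
df_regression, df_residual = 1, 8
intercept, slope, slope_std_error = 98.24833, 0.10977, 0.03297

n = df_regression + df_residual + 1                                # Observations
r_square = ss_regression / (ss_regression + ss_residual)          # R Square
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual    # Adjusted R Square
ms_residual = ss_residual / df_residual                            # MS of the Residual row
standard_error = sqrt(ms_residual)                                 # Standard Error
f_statistic = (ss_regression / df_regression) / ms_residual       # F
t_slope = slope / slope_std_error                                  # t Stat for Square Feet


def predict_price(square_feet):
    """Apply the fitted regression equation from the output."""
    return intercept + slope * square_feet


print(round(r_square, 5), round(adjusted_r_square, 5), round(standard_error, 5))
print(round(f_statistic, 4), round(t_slope, 3))
print(round(predict_price(2000), 2))   # 2000 square feet is a hypothetical input
```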



