Examples for the Project Manager
SAMPLING AND THE LAW OF LARGE NUMBERS
The Law of Large Numbers, LLN, tells us it’s possible to estimate certain information about a population from just the data measured, calculated, or observed from a sample of the population.

Sampling saves the project manager time and money, but introduces risk. How much risk for how much savings?

The answer to this question is the subject of this paper.




                                              A whitepaper by
                                   John C. Goodpasture, PMP
                                            Managing Principal
                                  Square Peg Consulting, LLC
Sampling and the Law of Large Numbers
Examples for the Project Manager

The Law of Large Numbers, LLN, tells us that as a sample grows larger, its average converges on the population average. It is therefore possible to estimate certain information about a population from just the data measured, calculated, or observed in a sample of the population.

                         A population is any frame of like entities. For statistical purposes,
                         entities should be individually independent and subject to identical
                         distributions of values of interest.

Sampling saves project managers a lot of time and money. Sampling:

• Obtains practical and useful results even when it is not economical to obtain and evaluate every data point in a population.
• Extends the project’s access even when it is not practical to reach every member of the population.
• Provides actionable information even when it is not possible to know every member of the population.
• Avoids spending too much time to observe, measure, or interview every member of the population.
• Avoids collecting more data than can be handled, even if every member of the population were readily available; this includes both the expense and the timeliness of data handling.



 Analysis by sampling is called ‘drawing an inference’, and the branch of statistics from which it
 comes is called ‘inferential statistics’. Drawing an inference is similar to ‘inductive reasoning’.

 In both cases, inference and induction, one works from a set of specific observations back to the
 more general case, or to the rules that govern the observations.




What about risks?
Sampling introduces risk into the project:
• Risk that the data sample may not accurately portray the population: there may be inadvertent exclusions, clusters, strata, or other population attributes not understood and accounted for.
• Risk that some required information in the population may not be sampled at all; thus the sample data may be deficient or may misrepresent the true condition of the population.
• Risk that, in other situations, the sample data are outliers that misrepresent their true relationship to the population, and the sample is not discarded when it should be.

©Copyright 2010 John C Goodpasture    Page 1


Risk assessments
There are two risk assessments to be made. Examples in this paper will illustrate these
two assessments.

   1. “Margin of error”, which refers to the estimated error around the measurement, observation, or calculation of statistics within the interval of the sample data. Margin of error is the half-width of the interval expressed as a percentage of the statistic being estimated:

$$\%\ \text{error} = 100 \times \frac{\tfrac{1}{2}\,\text{interval width}}{\text{average}} \quad\text{or}\quad \%\ \text{error} = 100 \times \frac{\tfrac{1}{2}\,\text{interval width}}{\text{proportion}}$$

Because margin of error is a ratio, the risk manager has to be concerned with both the numerator and the denominator: for a small statistic [a small denominator], the half-interval [the numerator] must be correspondingly small, and a small interval is achieved only with a large sample size, N.

   2. “Confidence interval”, which refers to the interval within which the true
      population parameters are likely to be with a specified probability.


Confidence intervals have their own risk. The principal risk is that the sample misrepresents the population. If confidence is stated as 95% for some interval, then there is a 5% chance that the true population parameter lies outside the interval. Consider this case: a population whose parameter has a real value of 8 is sampled [of course, this fact, the real value of 8, is unknown to the project team]. Also unknown to the project team, the sample may be influenced by some infrequent outliers in the population, and the sample average is calculated from the sample data to be 10. The question is: what is the quality of this metric value? We will use confidence interval and margin of error as surrogates for the quality of sample statistics.


Sample design and sample risk
Would more trials of the same sample size improve quality? Perhaps. However, the definition of confidence already covers the case: of all the sample intervals obtained in multiple trials, 95% of them will contain the true population parameter; or, for only one trial, there is a 95% chance beforehand that the interval it produces will contain the true population parameter.


Generally, to reduce risk, the sample size, N, is made larger, rather than independently
resampling the same population with the same size sample.
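The definition of confidence can be checked with a small simulation. This is a minimal sketch, assuming a normally distributed population with a true average of 8 and a standard deviation of 2; the standard deviation, the sample size of 100, and the 4,000 trials are illustrative assumptions, not values from this paper. Each trial draws an independent sample, builds a 95% interval as sample average +/- 1.96 × sample σ / √N, and records whether that interval contains the true average.

```python
import math
import random
import statistics


def interval_contains_truth(true_avg, true_sd, n, z, rng):
    """Draw one sample of size n and test whether its interval covers true_avg."""
    sample = [rng.gauss(true_avg, true_sd) for _ in range(n)]
    avg = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    return avg - half_width <= true_avg <= avg + half_width


def coverage(trials=4000, true_avg=8.0, true_sd=2.0, n=100, z=1.96, seed=1):
    """Fraction of independent trials whose interval contains the true average."""
    rng = random.Random(seed)
    hits = sum(
        interval_contains_truth(true_avg, true_sd, n, z, rng) for _ in range(trials)
    )
    return hits / trials


result = coverage()
print(f"Share of 95% intervals containing the true average: {result:.3f}")
```

Expect a share close to, but not exactly, 0.95. Using 1.96 instead of the slightly larger ‘t’ value for N = 100 makes each interval a little narrow, so the share tends to land just under 0.95.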

Deciding upon the sample size, meaning the value of N, introduces a tension between the project’s budget and/or schedule managers and the risk managers. Tension is another word for risk.

• Budget managers want to limit the cost of gathering more data than is needed, and thereby limit cost risk; in other words, avoid oversampling.
• Risk managers want to limit the impact of not having enough data, and thereby limit functional, feature, or performance risk.


Sampling policy
The risk plan customarily invokes a project management policy regarding the degree of risk that is acceptable:
• “Margin of error” is customarily accepted at +/- 3% to 5%.
• “Confidence” [the confidence level attached to the interval] is customarily a pre-selected percentage between 80% and 99%, most commonly 95% or 99%.
The sampling protocol for a given project is designed by the risk manager to support these policy objectives.




General examples
Below are several population examples that are common in project situations. They fall into one of two population types: discrete proportions and continuous data.
• Project managers and the project office often deal with proportions.
• Project control account managers and team leaders often deal with “continuous data”.


   1. Populations of categorical data characterized with proportions: Proportional data is sometimes called ‘categorical data’ or ‘category data’; proportions are a form of ‘count’ data. A proportion is formed from the ratio of the count in one category to the total count.

       In Six Sigma, such category data is called ‘attribute data’. For example, a semiconductor wafer fits either into a category of ‘defect free’ or into another category of ‘defective’. The metric is the count in each category.

       Proportion is often notated as ‘p’ for the proportional count in one category, and ‘1-p’ for the other. ‘1-p’ is sometimes denoted ‘q’.

       The true proportion, p, is usually unknown. An estimate of p can be measured from a sample, but the estimate is probabilistic and thus has statistical characteristics.


The underlying entity is often not quantitative. In other words, we speak of the average
       proportion of defects, but not the average defect.

   2. Populations of continuous data: Continuous data is measured on a continuous number
      scale. Continuous data from one measurement can be compared with other continuous
      data and can be manipulated with arithmetic operations. The ‘distance’ between one
      point on the scale and another has a real meaning, not just a relative position as on an
      ordinal scale.

       Continuous data is descriptive: the data values describe features and attributes such as size, weight, and density. Collections or sets of continuous data values are characterized with descriptive statistics, like average weight or average hours of experience, and with other statistics that can be calculated from the data, like standard deviation and variance.

       Six Sigma refers to such populations as having ‘continuous or variable data’ metrics,
       referring to the idea that such metrics can be measured on a continuous scale.
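The Law of Large Numbers can be seen directly with the categorical data described above, where an estimate of p is itself probabilistic. The sketch below is illustrative only: it assumes a hypothetical population in which 30% of records are Category-1 (a value the project team would not know in practice) and shows the sample estimate of p settling toward that value as N grows. The function name is hypothetical.

```python
import random


def estimate_proportion(true_p: float, n: int, rng: random.Random) -> float:
    """Draw n records, each Category-1 with probability true_p; return the sample p."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n


rng = random.Random(42)
TRUE_P = 0.3  # hypothetical true proportion, unknown to the team in practice
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: estimated p = {estimate_proportion(TRUE_P, n, rng):.3f}")
```

Small samples can land well away from 0.3; the large ones cluster tightly around it, which is the LLN at work.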




Examples of Categorical data populations characterized with proportions:

Opinion:
    A proportion of users, operators, maintenance and support staff, or beneficiaries who have one opinion or another about a feature or function.

Defects and pass/fail:
    A proportion of devices or objects that have a defect while others do not, or that possess an attribute that passes or fails some metric limit, like power consumed.
    Typically, pass/fail results are observed in a number of independent tests, inspections, or ‘trials’.

Classification and position:
    A proportion of devices or objects that are of a certain type, category, or classification.
    This situation comes up often in database projects, where database records may or may not meet a specific type classification. But all manner of tangible objects also have type classifications, such as hardwood or softwood, steel or stainless steel.
    A proportion of devices that are positioned above, between, or below some ‘critical’ boundary, like a quartile or percentile limit.



Examples of Continuous measurement populations




Objects with measurable attributes:
    Average age of a user group, average drying time of a coating, average time to code a design object, or average time to repair an object.
    Average difference between user groups, drying times, coding times, or repair times of one object or another.
    Average distance to [or between] object coordinates.

Process events and opportunity:
    Process example: A process for which the arrival rate of events (like a trigger or a device failure), or the count of events in a unit of time or space, is important.
        In a web commerce project, an example is the arrival rate of customers to the product ordering page.
    Opportunity example: An ‘area’ in which events can occur.
        In a chemical development project, an event could be the appearance (yes or no) of a certain molecule after some process activity; the measurable opportunity is the count of that molecule per cubic centimeter.


Project Estimates
Regardless of the nature of the population, the issues for the project manager are the same:

• Effort: How much effort will sampling take?
  The LLN tells us the sample statistics will be ‘good enough’ if the sample is ‘large enough’. For project managers, the question is: how large is ‘large enough’?
• Impact: What is the impact of the risk to be mitigated?
  The confidence intervals and margins of error of the sample provide the ranges of the impact.

Risk management and estimating rules of thumb

Population size:
    The actual size of the population is irrelevant, so long as it is ‘large’ compared to the sample. Population size is not used in estimates, even if known, unless the sample is a substantial fraction of the population.

Sample size [count of values]:
    Sample size is driven by risk tolerance for the possible error in the sample results. A larger count reduces the likely error. There are formulas for sample size that take risk tolerance into account.

Margin of error:
    The margin of error in the estimated statistic improves (shrinks) as the count of data values in the sample increases.

Population parameter confidence:
    For a given number of sample values, confidence that the actual population parameter is within the sample data interval increases as the interval is made wider. Thus, for a sample of 30 values, the interval for 99% confidence is wider than the interval for 90% confidence.


Common intervals:
    The most common confidence levels are 80%, 90%, 95%, and 99%.
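The population-size rule of thumb can be made concrete with the standard finite population correction, a textbook factor that this paper does not itself use. In this sketch the factor multiplies the standard error (and so the interval width); it stays near 1 when the population is large compared to the sample and drops noticeably once the sample is a substantial fraction of the population. The function name and the population sizes are illustrative assumptions.

```python
import math


def finite_population_correction(population: int, n: int) -> float:
    """Standard correction factor: sqrt((population - n) / (population - 1))."""
    if not 0 < n <= population:
        raise ValueError("sample size must satisfy 0 < n <= population")
    if n == population:
        return 0.0  # a census of the whole population has no sampling error
    return math.sqrt((population - n) / (population - 1))


# Illustrative populations, each sampled with N = 1,536.
for population in (10_000_000, 100_000, 5_000, 2_000):
    factor = finite_population_correction(population, 1536)
    print(f"population {population:>10,}: interval width x {factor:.3f}")
```

For a population in the millions the factor is essentially 1, which is why population size can be ignored; for a population of 2,000 it roughly halves the interval.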

Estimating proportional parameters
Sample proportion notation:
• One category is given a proportion notated ‘p’.
• ‘1-p’ notates the sample proportion of the other category [sometimes ‘1-p’ is denoted as ‘q’].

Project example with proportion:
Project description: Let’s say that a project deliverable is a database into which over 10M data records are to be loaded from a much larger library [population]. Depending on the mix of categories of data records in the population, the scheduling manager will schedule more loading time if the records are mostly Category-1, or less time if not.

The project manager elects to sample the data record population to determine the
proportionality, p, of records that are Category-1 so that the scheduling manager has
information to guide project scheduling.

The project risk management plan requires estimates to have 95% confidence for design
parameters, and a margin of error of less than +/- 5% on sample data values.

Sample design: With no a priori hypothesis of the expected proportion ‘p’, some iteration may be required. A good starting point is to assume p = 0.5. The risk manager refers to the chart given in the appendix entitled “Proportion ‘p’ vs +/- Margin of Error %”, a plot of error percentage for a confidence of 95%. From that chart, the risk manager finds that for a +/- 5% margin of error of ‘p’ with 95% confidence, a sample size greater than 1,000 but smaller than 3,000 is needed.

Solving the margin of error equation for N, with p = 0.5 and Z = 1.96, gives approximately 1,536 as the appropriate starting point for N.

Starting with N = 1,536, if the first sample returns a ‘p’ value of 0.5 or greater, the margin of error is likely less than +/- 5%; no further sampling is required. Otherwise, a larger sample size is required.
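Solving the appendix margin of error formula for N can be sketched as follows. This is a minimal sketch, assuming the relative margin of error used in this paper (half-interval divided by p) and Z = 1.96 for 95% confidence; the function name is hypothetical. Sample sizes are conventionally rounded up, so the exact solution of about 1,536.6 becomes 1,537. The second call shows how much larger N must be when the expected p is small.

```python
import math


def sample_size_for_margin(p: float, target: float, z: float = 1.96) -> float:
    """Solve target = z * sqrt(p * (1 - p) / N) / p for N (unrounded)."""
    if not 0 < p < 1 or target <= 0:
        raise ValueError("need 0 < p < 1 and a positive target margin")
    return (z / target) ** 2 * (1 - p) / p


exact = sample_size_for_margin(p=0.5, target=0.05)
print(f"p = 0.5:  N = {exact:.1f}, rounded up to {math.ceil(exact):,}")

small_p = sample_size_for_margin(p=0.05, target=0.05)
print(f"p = 0.05: N = {small_p:.1f}, rounded up to {math.ceil(small_p):,}")
```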

Sample analysis: Assume the sample returns a value of ‘p’ of 0.7. From the confidence interval equation for proportions given in the appendix, with N = 1,536, the 95% confidence interval for the estimated proportion is calculated to be about 67.7% to 72.3%, centered on 70%.

Risk management analysis: There is a 5% probability that the proportion ‘p’ is not within the confidence interval of 67.7% to 72.3%. There is not enough information to forecast whether the proportion ‘p’ is more likely less than 67.7% or greater than 72.3%.

From the margin of error chart or formula in the appendix, the margin of error of the proportion value 0.7 is about +/- 3.3%, or +/- 0.023, from 0.677 to 0.723.


The sample data supports the project risk tolerance policy objectives of 95% confidence
and < +/- 5% margin of error.
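The interval and margin of error for this proportion example can be reproduced from the appendix formulas. A sketch, assuming Z = 1.96 (rounding Z to 2, as the paper often does, changes the results only slightly) and N = 1,536 from the sample design; the function name is hypothetical.

```python
import math


def proportion_interval(p_hat: float, n: int, z: float = 1.96):
    """Normal-approximation interval for a proportion and its relative margin of error."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    low, high = p_hat - half_width, p_hat + half_width
    return low, high, half_width / p_hat


low, high, margin = proportion_interval(p_hat=0.7, n=1536)
print(f"95% interval: {low:.3f} to {high:.3f}")
print(f"margin of error: +/-{margin:.1%} (policy limit: 5%)")
```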

Estimating continuous data parameters:
Project example with descriptive statistics
Project description: Let’s say that a project deliverable is an ejector seat for a military aircraft; the average weight of the pilot population needs to be known for the design.

The project manager elects to sample the pilot population rather than weigh every pilot.

The project risk management plan requires estimates to have 95% confidence for design
parameters and +/- 3% margin of error for sample data statistics.

Sample Frame: From the chart in the appendix entitled “% Margin of Error v N, 95%
Confidence” the risk manager finds that a sample of size 85 is required to meet the +/-
3% policy metric and simultaneously meet the 95% confidence interval metric. So, in this
example, 85 pilots are weighed from a population frame of active duty military pilots,
both men and women.

Assume the sample average is found from the sample data to be 175 lbs [79.4 kg], and
the Sample σ is calculated from the sample data by spreadsheet function. Assume the
Sample σ is calculated to be 25 lbs [11.3 kg].

Sample analysis: From the equation given in the appendix for continuous data, the 95%
confidence interval for the estimated average weight of the pilot population is estimated
to be about +/- 5.4 lbs [+/- 2.4 kg], or from 169.6 to 180.4 lbs [76.9 to 81.8 kg].

Risk management analysis: There is a 5% probability that the average pilot weight is
not within the confidence interval.

The sample average of 175 pounds is estimated to have a margin of error of about +/- 3% (5.4 / 175 ≈ 3.1%), or +/- 5.4 pounds [+/- 2.4 kg].
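The pilot-weight figures can be checked with the continuous-data formulas in the appendix. A sketch, assuming t = 2 as the paper does; the function name is hypothetical. With raw weights in hand, `statistics.stdev` from Python's standard library computes the same n − 1 sample standard deviation that spreadsheet functions such as STDEV.S return.

```python
import math


def mean_interval(avg: float, sd: float, n: int, t: float = 2.0):
    """Interval avg +/- t * sd / sqrt(n), plus the margin of error relative to avg."""
    half_width = t * sd / math.sqrt(n)
    return avg - half_width, avg + half_width, half_width / avg


# Summary statistics from the ejector-seat example, in pounds.
low, high, margin = mean_interval(avg=175.0, sd=25.0, n=85)
print(f"95% interval: {low:.1f} to {high:.1f} lbs, margin +/-{margin:.1%}")
```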




Acknowledgement
The author is indebted to Dr. Walter P. Bond, Associate Professor (retired) of Florida
Institute of Technology for suggestions and peer review.

Appendix

Proportional category data


Confidence Interval, proportions
The following equations define the confidence interval for varying confidence objectives. Note that the square root is multiplied by a numerical factor. The numerical factor is a so-called ‘Z’ number taken from the standard normal bell curve. Z = 1 corresponds to one standard deviation, σ. Z values typically range +/- 3 about the standard normal mean of ‘0’. In order to use these equations, ‘p’ cannot be very close to 0 or 1, since they rely on a normal approximation that holds only when ‘p’ is in the mid-range; a common rule of thumb requires both N × p and N × (1 - p) to be at least 10.


$$
\begin{aligned}
\text{80\% Interval} &= p \pm 1.28\sqrt{p(1-p)/N} \\
\text{90\% Interval} &= p \pm 1.645\sqrt{p(1-p)/N} \\
\text{95\% Interval} &= p \pm 1.96\sqrt{p(1-p)/N} \\
\text{99\% Interval} &= p \pm 2.576\sqrt{p(1-p)/N}
\end{aligned}
$$



 The confidence objective, expressed as a %, is read as, for example: 80% confidence that the real population parameter is within the interval, leaving a 20% chance that it is outside the interval.
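The ‘Z’ factors are quantiles of the standard normal distribution, so they can be computed rather than looked up. A sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later); for a two-sided interval at confidence level c, Z is the (1 + c) / 2 quantile. The function name is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided Z factor: the (1 + level) / 2 quantile of the standard normal."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")
```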


Margin of Error, proportions
The following chart is a plot of three different sample sizes, N, showing the margin of error as the proportion ‘p’ changes. This chart is based on the formula for margin of error given below:

$$\pm\,\text{Margin of Error} = \frac{\tfrac{1}{2}\,\text{Interval width}}{p}, \qquad \tfrac{1}{2}\,\text{Interval width} = Z\sqrt{\frac{p(1-p)}{N}}$$

where Z = 2* for 95% confidence.
* More precisely: 1.96
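The curves described for this chart can be tabulated directly from the formula. A sketch, assuming Z = 1.96; the three sample sizes (500, 1,536, and 3,000) are illustrative choices, since the chart's own values of N are not stated here, and the function name is hypothetical.

```python
import math


def relative_margin(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error as a fraction of p: (z * sqrt(p * (1 - p) / n)) / p."""
    return z * math.sqrt(p * (1 - p) / n) / p


sample_sizes = (500, 1536, 3000)  # illustrative values of N
print("p     " + "".join(f"N={n:<8}" for n in sample_sizes))
for p in (0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9):
    row = "".join(f"{relative_margin(p, n):<10.1%}" for n in sample_sizes)
    print(f"{p:<6.2f}{row}")
```

Reading down any column shows the small-denominator problem: as p falls, the relative margin grows quickly, so small proportions need much larger samples.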


Continuous data and descriptive statistics
Confidence Interval
The following equations give approximations of the interval range. ‘N’ is the count of data values in the sample, and √N is its square root. The numerical factor in the numerator comes from a table of ‘t’ values developed by statisticians for sampling analysis (Student’s t-distribution). The ‘t’ value depends on the count of sample points: it is larger for small samples and approaches the normal ‘Z’ value as N grows. ‘t’ values typically range +/- 3. The factors shown below (1.3, 1.7, 2, and 2.7) are close to the ‘t’ values for a sample of about 30 values (1.31, 1.70, 2.05, and 2.76).


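$$
\begin{aligned}
\text{80\% Interval} &= \text{Sample average} \pm \frac{1.3}{\sqrt{N}} \times \text{Sample } \sigma \quad \text{[narrowest interval]} \\
\text{90\% Interval} &= \text{Sample average} \pm \frac{1.7}{\sqrt{N}} \times \text{Sample } \sigma \\
\text{95\% Interval} &= \text{Sample average} \pm \frac{2}{\sqrt{N}} \times \text{Sample } \sigma \\
\text{99\% Interval} &= \text{Sample average} \pm \frac{2.7}{\sqrt{N}} \times \text{Sample } \sigma \quad \text{[widest interval]}
\end{aligned}
$$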




Note: The Sample σ, or sample standard deviation, is calculated, usually by spreadsheet
function, from the sample data.




Margin of error
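The margin of error is based on the following equation:

$$\text{Margin of error} = \pm\,\frac{t \times \text{Sample } \sigma / \sqrt{N}}{\text{Sample average}}$$

where t = 2* for a 95% confidence interval.

*More precisely: 1.96 for large samples; ‘t’ is somewhat larger for small samples.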


The following is a plot of the margin of error as a function of the sample size, N.
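The shape of that plot follows from the √N in the denominator: quadrupling N halves the margin of error. A sketch, assuming t = 2 and a σ-to-average ratio of 25/175 borrowed from the pilot example; in planning, that ratio would have to be guessed before any data are collected. The function name is hypothetical.

```python
import math


def relative_margin_continuous(n: int, cv: float, t: float = 2.0) -> float:
    """Margin of error as a fraction of the average: t * (sigma / average) / sqrt(n)."""
    return t * cv / math.sqrt(n)


cv = 25 / 175  # assumed sigma / average ratio, from the pilot example
for n in (25, 50, 85, 100, 200, 400):
    print(f"N = {n:>3}: margin of error +/-{relative_margin_continuous(n, cv):.1%}")
```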








John C. Goodpasture, PMP and Managing Principal at
                              Square Peg Consulting, is a program manager, coach,
                              author, and project consultant specializing in technology
                              projects with emphasis on quantitative methods, project
                              planning, and risk management.

                              His career in program management has spanned the U.S.
                              Department of Defense; the defense, intelligence, and
                              aerospace industry; and the IT back office where he led
                              several efforts in ERP systems.

                              He has coached many project teams in the U.S., Europe,
                              and Asia.

                              John is the author of numerous books, magazine articles,
                              and web logs in the field of project management, the most
                              recent of which is “Project Management the agile way:
                              Making it work in the enterprise”.

                              He blogs at johngoodpasture.com, and his work products
                              are found in the library at www.sqpegconsulting.com.




©Copyright 2010 John C Goodpasture                                            Page 11

More Related Content

PDF
12.0 risk management agile+evm (v10.2)
PDF
Probabilistic Cost, Schedule, and Risk management
PDF
Paradigm of agile project management
PDF
Managing in the presence of uncertainty
PDF
Applying risk radar (v2)
PDF
Risk Management
PDF
Options based decisions processes
DOC
Agile project management and normative
12.0 risk management agile+evm (v10.2)
Probabilistic Cost, Schedule, and Risk management
Paradigm of agile project management
Managing in the presence of uncertainty
Applying risk radar (v2)
Risk Management
Options based decisions processes
Agile project management and normative

What's hot (20)

PDF
Managing risk with deliverables planning
PDF
Increasing the Probability of Success with Continuous Risk Management
PDF
Risk assesment template
PDF
Programmatic risk management workshop (handbook)
PDF
Notional cam interview questions (update)
DOCX
Risk management 4th in a series
PDF
Increasing the Probability of Project Success
PDF
Risk management (final review)
PDF
Project Risk Management
PPTX
Risk management and IT technologies
PDF
Managing Risk in Agile Development: It Isn’t Magic
PDF
Risk management of the performance measurement baseline
PPTX
Software risk analysis and management
PDF
Risk Management in Five Easy Pieces
DOCX
Bertrand's Individual Essay
PDF
Information Technology Risk Management
DOCX
Continuous Risk Management
DOCX
Liberty university busi 313 quiz 3 complete solutions correct answers slideshare
PDF
Increasing the Probability of Success with Continuous Risk Management
PDF
Root causes
Managing risk with deliverables planning
Increasing the Probability of Success with Continuous Risk Management
Risk assesment template
Programmatic risk management workshop (handbook)
Notional cam interview questions (update)
Risk management 4th in a series
Increasing the Probability of Project Success
Risk management (final review)
Project Risk Management
Risk management and IT technologies
Managing Risk in Agile Development: It Isn’t Magic
Risk management of the performance measurement baseline
Software risk analysis and management
Risk Management in Five Easy Pieces
Bertrand's Individual Essay
Information Technology Risk Management
Continuous Risk Management
Liberty university busi 313 quiz 3 complete solutions correct answers slideshare
Increasing the Probability of Success with Continuous Risk Management
Root causes
Ad

Viewers also liked (15)

DOC
project on construction of house report.
DOCX
Project charter template
PDF
Project description form write
DOCX
Project charterexample (1) (1)
PDF
Agile and the Seven Sins of Project Management
PPTX
Project charter v2
PPT
PMP - Project Initiation Template for Professionals
PPT
Discrete Probability Distributions
PDF
Project charter-template
PDF
Social Media Project Charter Template
PDF
Project Charter Guide
PPT
Organisation Structure
PDF
7. binomial distribution
PDF
7 Deadly Sins of Agile Software Test Automation
PPT
Statistics Project
project on construction of house report.
Project charter template
Project description form write
Project charterexample (1) (1)
Agile and the Seven Sins of Project Management
Project charter v2
PMP - Project Initiation Template for Professionals
Discrete Probability Distributions
Project charter-template
Social Media Project Charter Template
Project Charter Guide
Organisation Structure
7. binomial distribution
7 Deadly Sins of Agile Software Test Automation
Statistics Project
Ad

Similar to Project examples for sampling and the law of large numbers (20)

PDF
Advantages of Regression Models Over Expert Judgement for Characterizing Cybe...
PPTX
Quantification of Risks in Project Management
PDF
Keys to extract value from the data analytics life cycle
PDF
Sampling Technique
DOC
Statistics Assignments 090427
PDF
How Traditional Risk Reporting Has Let Us Down
DOCX
Risk Management in Five Easy Pieces
DOCX
Academic writer 23
DOCX
Computing Descriptive Statistics © 2014 Argos.docx
DOCX
Computing Descriptive Statistics © 2014 Argos.docx
PPT
Sample size
PDF
Statistic Project Essay
PPT
Measuring Risk - What Doesn’t Work and What Does
PPTX
Data science notes for ASDS calicut 2.pptx
DOCX
Executive Program Practical Connection Assignment - 100 poin
PDF
Statistics For Bi
PPTX
Basic statistics for pharmaceutical (Part 1)
PDF
Biostaticstics, Application of Biostaticstics
DOCX
Statistical ProcessesCan descriptive statistical processes b.docx
DOCX
httphome.ubalt.eduntsbarshbusiness-statoprepartIX.htmTool.docx
Advantages of Regression Models Over Expert Judgement for Characterizing Cybe...
Quantification of Risks in Project Management
Keys to extract value from the data analytics life cycle
Sampling Technique
Statistics Assignments 090427
How Traditional Risk Reporting Has Let Us Down
Risk Management in Five Easy Pieces
Academic writer 23
Computing Descriptive Statistics © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
Sample size
Statistic Project Essay
Measuring Risk - What Doesn’t Work and What Does
Data science notes for ASDS calicut 2.pptx
Executive Program Practical Connection Assignment - 100 poin
Statistics For Bi
Basic statistics for pharmaceutical (Part 1)
Biostaticstics, Application of Biostaticstics
Statistical ProcessesCan descriptive statistical processes b.docx
httphome.ubalt.eduntsbarshbusiness-statoprepartIX.htmTool.docx

More from John Goodpasture (20)

PDF
Five tools for managing projects
PPTX
Risk management short course
PDF
Agile in the waterfall
PDF
RFP template
PDF
Agile earned value exercise
PDF
Agile 103 - the three big questions
PDF
Agile for project managers - a sailing analogy-UPDATE
PDF
Feature driven design FDD
PDF
Dynamic Systems Development, DSDM
PDF
Agile for project managers - A presentation for PMI
PDF
Five risk management rules for the project manager
PDF
Building Your Personal Brand
PDF
Portfolio management and agile: a look at risk and value
PDF
Agile for project managers - a sailing analogy
PDF
Risk management with virtual teams
PDF
Bayes Theorem and Inference Reasoning for Project Managers
PDF
Adding quantitative risk analysis your Swiss Army Knife
PDF
Business value and kano chart
PDF
Agile for Business Analysts
PDF
Time centric Earned Value
Five tools for managing projects
Risk management short course
Agile in the waterfall
RFP template
Agile earned value exercise
Agile 103 - the three big questions
Agile for project managers - a sailing analogy-UPDATE
Feature driven design FDD
Dynamic Systems Development, DSDM
Agile for project managers - A presentation for PMI
Five risk management rules for the project manager
Building Your Personal Brand
Project examples for sampling and the law of large numbers

  • 1. Examples for the Project Manager: SAMPLING AND THE LAW OF LARGE NUMBERS

The Law of Large Numbers (LLN) tells us it's possible to estimate certain information about a population from just the data measured, calculated, or observed from a sample of the population.

Sampling saves the project manager time and money, but introduces risk. How much risk for how much savings? The answer to this question is the subject of this paper.

A whitepaper by John C. Goodpasture, PMP, Managing Principal, Square Peg Consulting, LLC
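To make the LLN concrete, here is a minimal simulation sketch. The population (100,000 hypothetical task durations drawn around 40 hours with a spread of 8 hours) and the helper `sample_mean` are invented for illustration and are not data from this paper; the point is only that the sample average tends to settle toward the population average as N grows.

```python
import random
import statistics

# Hypothetical population: 100,000 task durations in hours (illustrative only).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(40.0, 8.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)  # normally unknown to the project team

def sample_mean(n: int) -> float:
    """Average of a simple random sample of n members, drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

# Larger samples tend to land closer to the population average.
for n in (10, 100, 1_000, 10_000):
    est = sample_mean(n)
    print(f"N = {n:>6}: sample mean = {est:6.2f}, error = {est - true_mean:+.2f}")
```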
  • 2. Sampling and the Law of Large Numbers: Examples for the Project Manager

The Law of Large Numbers (LLN) tells us it's possible to estimate certain information about a population from just the data measured, calculated, or observed from a sample of the population.

A population is any frame of like entities. For statistical purposes, entities should be individually independent and subject to identical distributions of values of interest.

Sampling saves project managers a lot of time and money. Sampling:
• Obtains practical and useful results even when it is not economical to obtain and evaluate every data point in a population.
• Extends the project's coverage even when it may not be practical to reach every member of the population.
• Provides actionable information even when it is not possible to know every member of the population.
• Avoids spending too much time observing, measuring, or interviewing every member of the population.
• Avoids collecting more data than can be handled, even if every member of the population were readily available; this includes the expense and timeliness of data handling.

Analysis by sampling is called 'drawing an inference', and the branch of statistics from which it comes is called 'inferential statistics'. Drawing an inference is similar to 'inductive reasoning'. In both cases, inference and induction, one works from a set of specific observations back to the more general case, or to the rules that govern the observations.

What about risks? Sampling introduces risk into the project:
• Risk that the data sample may not accurately portray the population: there may be inadvertent exclusions, clusters, strata, or other population attributes not understood and accounted for.

©Copyright 2010 John C Goodpasture Page 1
  • 3.
• Risk that some required information in the population may not be sampled at all, so the sample data may be deficient or may misrepresent the true condition of the population.
• Risk that, in other situations, the sample data are outliers that misrepresent their true relationship to the population, and the sample is not discarded when it should be.

Risk assessments

There are two risk assessments to be made; the examples in this paper illustrate both.

1. "Margin of error" refers to the estimated error around the measurement, observation, or calculation of statistics within the interval of the sample data. Margin of error is the half-width of the interval expressed as a percentage of the statistic being estimated:

% error = (½ Interval width / Average) × 100, or (½ Interval width / Proportion) × 100

Because margin of error is a ratio, the risk manager has to be concerned with both the numerator and the denominator: for small statistical values (a small denominator), the interval (the numerator) must likewise be small, and a small interval is achieved by having a large sample size, N.

2. "Confidence interval" refers to the interval within which the true population parameter is likely to lie, with a specified probability.

Confidence intervals have their own risk. The principal risk is that the sample misrepresents the population. If confidence is stated as 95% for some interval, then there is a 5% chance that the true population parameter lies outside the interval.

Consider this case: a population whose parameter has a real value of 8 is sampled (of course, this fact, the real value of 8, is unknown to the project team). Also unknown to the project team, the sample may be influenced by some infrequent outliers in the population. From the sample data, the sample average is calculated to be 10. The question is: what is the quality of this metric value? We will use confidence interval and margin of error as surrogates for the quality of sample statistics.

Sample design and sample risk

Would more trials of the same sample size improve quality? Perhaps. However, the definition of confidence covers the case: of all the sample intervals obtained in multiple trials, 95% of them will contain the true population parameter; or, for only one trial, there is a 95% chance that the true population parameter is within the interval of that trial.
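The repeated-trials reading of confidence can be checked by simulation. This sketch assumes a normally distributed population with the true parameter of 8 from the case above and an assumed standard deviation of 3; the function `coverage` is hypothetical. It uses the normal multiplier Z (about 1.96) rather than a 't' value, a reasonable approximation at N = 100.

```python
import random
import statistics
from statistics import NormalDist

def coverage(true_mean: float, true_sd: float, n: int, trials: int,
             confidence: float = 0.95, seed: int = 1) -> float:
    """Fraction of repeated samples whose confidence interval contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        avg = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / n ** 0.5
        if avg - half_width <= true_mean <= avg + half_width:
            hits += 1
    return hits / trials

# True parameter of 8; the spread of 3 is an assumption for illustration.
print(f"{coverage(8.0, 3.0, n=100, trials=2_000):.3f}")  # should be near 0.95
```

Any single trial's interval either contains 8 or it does not; only across many trials does the 95% figure emerge.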
  • 4. Generally, to reduce risk, the sample size, N, is made larger, rather than independently resampling the same population with a sample of the same size. Deciding on the sample size, that is, the value of N, introduces a tension between the project's budget and schedule managers and its risk managers. Tension is another word for risk.
• Budget managers want to limit the cost of gathering more data than is needed, and thereby limit cost risk; in other words, they want to avoid oversampling.
• Risk managers want to limit the impact of not having enough data, and thereby limit functional, feature, or performance risk.

Sampling policy

The risk plan customarily invokes a project management policy regarding the degree of risk that is acceptable:
• "Margin of error" is customarily accepted between +/- 3% and +/- 5%.
• "Confidence interval" is customarily a pre-selected percentage between 80% and 99%, most commonly 95% or 99%.

The risk manager designs the sampling protocol for a given project to support these policy objectives.

General examples

Below are several population examples that are common in project situations. They fall into one of two population types, discrete proportions and continuous data:
• Project managers and the project office often deal with proportions.
• Project control account managers and team leaders often deal with "continuous data".

1. Populations of categorical data characterized with proportions

Proportional data is sometimes called 'categorical data' or 'category data'; proportions are a form of 'count' data, formed from the ratio of the count in one category to the total count. In Six Sigma, such category data is called 'attribute data'. For example, a semiconductor wafer fits either into the category 'defect free' or into the category 'defective'. The metric is the count in each category. Proportion is often notated 'p' for the proportional count in one category, and '1-p' for the other; '1-p' is sometimes denoted 'q'. The true proportion, p, is often unknown. An estimate of p is measurable, but the estimate is probabilistic and thus has statistical characteristics.
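A minimal simulation of why an estimate of p "has statistical characteristics". It assumes a hypothetical true defect-free proportion of 0.9 and samples of 200 independently inspected wafers (both values invented for illustration; the helper `estimate_p` is also hypothetical). Each sample yields a different estimate, and the spread of the estimates comes out close to √(p(1-p)/N), the quantity inside the appendix interval formulas.

```python
import random
import statistics

def estimate_p(true_p: float, n: int, rng: random.Random) -> float:
    """Inspect n wafers independently; return the observed defect-free proportion."""
    defect_free = sum(1 for _ in range(n) if rng.random() < true_p)
    return defect_free / n

rng = random.Random(7)
TRUE_P = 0.9   # hypothetical true proportion (unknown in practice)
N = 200        # wafers inspected per sample

estimates = [estimate_p(TRUE_P, N, rng) for _ in range(1_000)]
print(f"estimates range from {min(estimates):.3f} to {max(estimates):.3f}")
print(f"spread of estimates: {statistics.stdev(estimates):.4f}")
print(f"theory, sqrt(p(1-p)/N): {(TRUE_P * (1 - TRUE_P) / N) ** 0.5:.4f}")
```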
  • 5. The underlying entity is often not quantitative. In other words, we speak of the average proportion of defects, but not of the average defect.

2. Populations of continuous data

Continuous data is measured on a continuous number scale. Continuous data from one measurement can be compared with other continuous data and can be manipulated with arithmetic operations. The 'distance' between one point on the scale and another has a real meaning, not just a relative position as on an ordinal scale. Continuous data is descriptive: the data values describe features and attributes, like size, weight, density, and the like. Collections or sets of continuous data values are characterized with descriptive statistics, like average weight or average hours of experience, and with other statistics that can be calculated from the data, like standard deviation and variance. Six Sigma refers to such populations as having 'continuous or variable data' metrics, referring to the idea that such metrics can be measured on a continuous scale.

Examples of categorical data populations characterized with proportions:
• Opinion: A proportion of users, operators, maintenance and support staff, or beneficiaries who hold one opinion or another about a feature or function.
• Defects and pass/fail: A proportion of devices or objects that have a defect while others do not, or that possess an attribute that passes or fails some metric limit, like power consumed. Typically, pass/fail results are observed in a number of independent tests, inspections, or 'trials'.
• Classification and position: A proportion of devices or objects that are of a certain type, category, or classification. This situation comes up often in database projects, where database records may or may not meet a specific type classification, but all manner of tangible objects also have type classifications, such as hardwood or softwood, steel or stainless steel. Likewise, a proportion of devices that are positioned above, between, or below some 'critical' boundary, like a quartile or percentile limit.

Examples of continuous measurement populations:
  • 6.
• Objects with measurable attributes: Average age of a user group, average drying time of a coating, average time to code a design object, or average time to repair an object. Average difference between user groups, or in the drying time, coding time, or repair time of one object versus another. Average distance to (or between) object coordinates.
• Process events and opportunity: Process example: a process for which the arrival rate of events (like a trigger or a device failure), or the count of events in a unit of time or space, is important. In a web commerce project, an example is the arrival rate of customers to the product ordering page. Opportunity example: an 'area' in which events can occur. In a chemical development project, an event could be the appearance (yes or no) of a certain molecule after some process activity; the measurable opportunity is the count of that molecule per cubic centimeter.

Project Estimates

Regardless of the nature of the population, the issues for the project manager are the same:
• Effort: How much effort will sampling take? The LLN tells us the sample statistics will be 'good enough' if the sample is 'large enough'. For project managers the question is: how large is 'large enough'?
• Impact: What is the impact of the risk to be mitigated? The confidence intervals and margins of error of the sample provide the ranges of the impact.

Risk management and estimating rules of thumb
• Population size: The actual size of the population is irrelevant, so long as it is 'large' compared to the sample. Population size is not used in estimates, even if known, unless the population is 'small' compared to the sample size.
• Sample size [count of values]: Sample size is driven by risk tolerance for possible error in the sample results. A larger count reduces the possibility of error. There are formulas for sample size that take risk tolerance into account.
• Margin of error: The margin of error in the estimated statistic improves as the count of data values in the sample increases.
• Population parameter confidence: For a given number of sample values, confidence that the actual population parameter is within the sample data interval improves as the interval is made wider. Thus, for a sample of 30 values, the confidence interval for 99% confidence is wider than for 90% confidence.
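The claim that higher confidence needs a wider interval for the same 30 values can be seen from the multipliers alone. This sketch takes two-sided Z values from Python's `statistics.NormalDist`; with only 30 values a 't' value would be somewhat larger, so treat it as a normal approximation. The helper name `half_width_factor` and the unit standard deviation are assumptions for illustration.

```python
from statistics import NormalDist

def half_width_factor(confidence: float) -> float:
    """Two-sided standard normal multiplier Z for a confidence level (0.95 -> ~1.96)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

N = 30
SAMPLE_SD = 1.0  # hypothetical; only the relative widths matter here

for conf in (0.80, 0.90, 0.95, 0.99):
    z = half_width_factor(conf)
    print(f"{conf:.0%} confidence: Z = {z:.3f}, half-width = {z * SAMPLE_SD / N ** 0.5:.3f}")
```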
  • 7. Common intervals: The most common confidence intervals are 80, 90, 95, and 99%.

Estimating proportional parameters

Sample proportion notation:
• One category is given a proportion notated 'p'.
• '1-p' notates the sample proportion of the other category (sometimes '1-p' is denoted 'q').

Project example with proportion

Project description: Let's say that a project deliverable is a database into which over 10M data records are to be loaded from a much larger library (the population). Depending on the mix of categories of data records in the population, the scheduling manager will schedule more loading time if the records are mostly Category-1, or less time if not. The project manager elects to sample the data record population to determine the proportion, p, of records that are Category-1, so that the scheduling manager has information to guide project scheduling. The project risk management plan requires estimates to have 95% confidence for design parameters, and a margin of error of less than +/- 5% on sample data values.

Sample design: With no a priori hypothesis about the expected proportion 'p', some iteration may be required. A good starting point is to assume p = 0.5. The risk manager refers to the chart given in the appendix entitled "Proportion 'p' vs +/- Margin of Error %", a plot of error percentage for a confidence of 95%. From that chart, the risk manager finds that a +/- 5% margin of error of 'p' with 95% confidence requires a sample size greater than 1,000 but smaller than 3,000. Solving the margin of error equation for N in fact gives 1,536 as the appropriate starting point for N.

Starting with N = 1,536, if the first sample returns a 'p' value of 0.5 or greater, the margin of error is likely less than +/- 5%, and no further sampling is required. Otherwise, a larger sample size is required.

Sample analysis: Assume the sample returns a value of 'p' of 0.7.
From the confidence interval equation for proportions given in the appendix, the 95% confidence interval for the estimated proportion is calculated to be about 67.7% to 72.3%, centered on 70%.

Risk management analysis: There is a 5% probability that the proportion 'p' is not within the confidence interval of 67.7% to 72.3%. There is not enough information to forecast whether 'p' is more likely to be below 67.7% or above 72.3%. From the margin of error chart and formula in the appendix, the margin of error of the proportion 0.7 with N = 1,536 is about +/- 3.3%, or +/- 0.023, from 0.677 to 0.723.
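The arithmetic of this example can be reproduced from the appendix formulas. This is a sketch using Z = 1.96; the function names are hypothetical. Solving the relative margin of error Z·√(p(1-p)/N)/p = 0.05 for N at p = 0.5 gives about 1,536.6, which the paper takes as 1,536, and the 95% interval for p = 0.7 at that sample size is roughly 0.677 to 0.723.

```python
from math import sqrt

Z95 = 1.96  # two-sided normal multiplier for 95% confidence

def sample_size_for_margin(p: float, margin: float, z: float = Z95) -> float:
    """Solve z * sqrt(p(1-p)/N) / p = margin for N (relative margin of error)."""
    return (z / margin) ** 2 * (1 - p) / p

def proportion_interval(p: float, n: int, z: float = Z95):
    """Normal-approximation confidence interval (low, high) for a sample proportion."""
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

# Sample design: planning value p = 0.5, target margin of error 5%.
n_required = sample_size_for_margin(0.5, 0.05)
print(f"required N at p = 0.5: {n_required:.1f}")

# Sample analysis: observed p = 0.7 from N = 1,536 records.
low, high = proportion_interval(0.7, 1_536)
print(f"95% interval for p = 0.7: {low:.3f} to {high:.3f}")
print(f"margin of error: {(high - 0.7) / 0.7:.1%}")
```

Running it prints a required N of 1536.6, an interval of 0.677 to 0.723, and a margin of error of 3.3%.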
  • 8. The sample data supports the project risk tolerance policy objectives of 95% confidence and < +/- 5% margin of error.

Estimating continuous data parameters

Project example with descriptive statistics

Project description: Let's say that a project deliverable is an ejector seat for a military aircraft; the average weight of the pilot population needs to be known for the design. The project manager elects to sample the pilot population rather than weigh every pilot. The project risk management plan requires estimates to have 95% confidence for design parameters and a +/- 3% margin of error for sample data statistics.

Sample frame: From the chart in the appendix entitled "% Margin of Error v N, 95% Confidence", the risk manager finds that a sample of size 85 is required to meet the +/- 3% policy metric and simultaneously meet the 95% confidence interval metric. So, in this example, 85 pilots are weighed from a population frame of active duty military pilots, both men and women. Assume the sample average is found from the sample data to be 175 lbs [79.4 kg], and the sample σ, calculated from the sample data by spreadsheet function, is 25 lbs [11.3 kg].

Sample analysis: From the equation given in the appendix for continuous data, the 95% confidence interval for the estimated average weight of the pilot population is about +/- 5.4 lbs [+/- 2.4 kg], or from 169.6 to 180.4 lbs [76.9 to 81.8 kg].

Risk management analysis: There is a 5% probability that the average pilot weight is not within the confidence interval. The sample average of 175 lbs is estimated to have a margin of error of about +/- 3% (5.4 / 175 ≈ 3.1%), or +/- 5.4 lbs [+/- 2.4 kg].
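A sketch reproducing the pilot-weight interval from the example's assumed sample values, using the paper's rounded multiplier of 2 for 95% confidence (1.96 would narrow the half-width slightly, to about 5.3 lbs). The helper `mean_interval` is hypothetical.

```python
from math import sqrt

def mean_interval(sample_avg: float, sample_sd: float, n: int, factor: float = 2.0):
    """Approximate confidence interval for a population mean.

    Returns (low, high, half_width) for sample_avg +/- factor * sample_sd / sqrt(n).
    factor = 2.0 is the paper's rounded 95% multiplier; 1.96 is the normal value.
    """
    half = factor * sample_sd / sqrt(n)
    return sample_avg - half, sample_avg + half, half

# Example values from the ejector seat case: N = 85, average 175 lbs, sample sd 25 lbs.
low, high, half = mean_interval(175.0, 25.0, 85)
print(f"95% interval: {low:.1f} to {high:.1f} lbs (+/- {half:.1f} lbs)")
print(f"margin of error: {half / 175.0:.1%}")
```

It prints an interval of 169.6 to 180.4 lbs (+/- 5.4 lbs) and a margin of error of 3.1%.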
  • 9. Acknowledgement

The author is indebted to Dr. Walter P. Bond, Associate Professor (retired) of Florida Institute of Technology, for suggestions and peer review.

Appendix

Proportional category data

Confidence Interval, proportions

The following equations define the confidence interval for varying confidence objectives. The square root is multiplied by a numerical factor, a so-called 'Z' number taken from the standard normal bell curve. Z = 1 corresponds to one standard deviation, σ; Z values typically range +/- 3 about the standard normal mean of 0. To use these equations, 'p' cannot be very close to 0 or 1, because their validity depends on 'p' being in the mid-range, away from those extremes.

$$
\begin{aligned}
\text{80\% Interval} &= p \pm 1.3\sqrt{p(1-p)/N}\\
\text{90\% Interval} &= p \pm 1.7\sqrt{p(1-p)/N}\\
\text{95\% Interval} &= p \pm 2\sqrt{p(1-p)/N}\\
\text{99\% Interval} &= p \pm 2.7\sqrt{p(1-p)/N}
\end{aligned}
$$

The confidence objective, expressed as a %, is read as, for example: there is 80% confidence that the real population parameter is within the interval, and a 20% chance that it is outside the interval.

Margin of Error, proportions

The following chart is a plot of three different sample sizes, N, showing the margin of error as the proportion 'p' changes. The chart is based on the formula for margin of error given below:

$$
\pm\,\text{Margin of Error} = \frac{\tfrac{1}{2}\,\text{Interval width}}{p},
\qquad
\tfrac{1}{2}\,\text{Interval width} = \pm Z\sqrt{\frac{p(1-p)}{N}}
$$

where Z = 2 for 95% confidence (more precisely, 1.96).
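The appendix multipliers are rounded. This sketch compares them with exact two-sided normal quantiles from Python's `statistics.NormalDist`; the dictionary name `PAPER_FACTORS` is illustrative. Each rounded factor is slightly larger than the exact Z (1.282, 1.645, 1.960, 2.576), so intervals computed with the rounded factors come out a little wider, which errs on the conservative side.

```python
from statistics import NormalDist

# Rounded multipliers as printed in the appendix, keyed by confidence level.
PAPER_FACTORS = {0.80: 1.3, 0.90: 1.7, 0.95: 2.0, 0.99: 2.7}

for conf, rounded in PAPER_FACTORS.items():
    exact = NormalDist().inv_cdf(0.5 + conf / 2)  # two-sided standard normal Z
    print(f"{conf:.0%}: paper factor {rounded}, exact Z {exact:.3f}")
```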
  • 10. Continuous data and descriptive statistics

Confidence Interval

The following equations give approximations of the interval range. 'N' is the count of data values in the sample, and √N is its square root. The numerical factor in the numerator comes from a table of 't' values developed by statisticians for sampling analysis. The 't' value depends on the count of sample points and typically ranges +/- 3; it is taken from the t-distribution, which is approximately normal for large samples.

$$
\begin{aligned}
\text{80\% Interval} &= \text{Sample average} \pm \frac{1.3}{\sqrt{N}} \times \text{Sample }\sigma \quad \text{[narrowest interval]}\\
\text{90\% Interval} &= \text{Sample average} \pm \frac{1.7}{\sqrt{N}} \times \text{Sample }\sigma\\
\text{95\% Interval} &= \text{Sample average} \pm \frac{2}{\sqrt{N}} \times \text{Sample }\sigma\\
\text{99\% Interval} &= \text{Sample average} \pm \frac{2.7}{\sqrt{N}} \times \text{Sample }\sigma \quad \text{[widest interval]}
\end{aligned}
$$

Note: The Sample σ, or sample standard deviation, is calculated from the sample data, usually by spreadsheet function.
  • 11. Margin of error

The margin of error is based on the following equation:

$$
\pm\,\text{Margin of error} = \frac{\pm\, t \times \text{Sample }\sigma / \sqrt{N}}{\text{Sample average}}
$$

where t = 2 for a 95% confidence interval (more precisely, 1.96 for large samples).

The following is a plot of the margin of error as a function of the sample size, N.
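The margin-of-error curve can be tabulated numerically from the formula above. This sketch assumes a hypothetical sample σ equal to 15% of the sample average and uses t = 1.96 throughout; for small N the true 't' value is larger (about 2.26 at N = 10), so the small-N figures understate the margin. The function name `margin_of_error` is illustrative.

```python
from math import sqrt

def margin_of_error(sd_over_mean: float, n: int, t: float = 1.96) -> float:
    """Relative margin of error: t * (sample sd / sqrt(N)) / sample average.

    sd_over_mean is the sample standard deviation divided by the sample average.
    """
    return t * sd_over_mean / sqrt(n)

SD_OVER_MEAN = 0.15  # hypothetical spread: sample sd is 15% of the sample average

for n in (10, 30, 85, 100, 300, 1_000):
    print(f"N = {n:>5}: margin of error +/- {margin_of_error(SD_OVER_MEAN, n):.1%}")
```

Because the margin falls as 1/√N, quadrupling the sample size only halves the margin of error.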
  • 13. John C. Goodpasture, PMP and Managing Principal at Square Peg Consulting, is a program manager, coach, author, and project consultant specializing in technology projects, with emphasis on quantitative methods, project planning, and risk management. His career in program management has spanned the U.S. Department of Defense; the defense, intelligence, and aerospace industry; and the IT back office, where he led several efforts in ERP systems. He has coached many project teams in the U.S., Europe, and Asia. John is the author of numerous books, magazine articles, and web logs in the field of project management, the most recent of which is "Project Management the agile way: Making it work in the enterprise". He blogs at johngoodpasture.com, and his work products are found in the library at www.sqpegconsulting.com.