Quantified Self
      Exploiting Your Data


           22 March 2012
            Akram Najjar




 This talk is an “eye opener”
     We will Not discuss
    Techniques or “How”
      Data is Analyzed!

We will Only talk about “What”
  such methods can give us
What Methods can you Apply to Your Data?


A. The Bell Shaped Curve (Normal Distribution)
B. Correlation of two variables
C. Forecasting using Simple Linear Regression
   (Best Line of Fit)
D. Statistical Process Control








Other Tools that work directly on Data . . . .

   Goodness of Fit testing
   Independence Testing
   Moving Averages and Exponential Smoothing
   Non-Linear Regression
    (polynomial, exponential, logarithmic)
   Weighted Index Scoring
   Excel: The Pivot Table
   Excel: Conditional Formatting


A. The Bell Shaped Curve
                  (The Gaussian or Normal Distribution)
                     Useful when you have a lot of data
                     Prepare a Bar Chart or a Frequency Table
                     Most likely, they will plot as a Bell Shaped Curve
                      (Normal/Gauss Curve)
                     Example: Measurements of most natural variables
                     Example: Measurements of most manufactured items
                     Prepare a frequency table of your data
                     How many times did you get a specific value?
                     Out of 200 measurements, how many times was your Systolic
                      Blood Pressure = 110, 115, 120, 125, 130, 135, 140, ...?
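To make the frequency-table step concrete, here is a small Python sketch. It assumes readings are grouped into 5 mmHg bins, matching the 110, 115, 120, ... values above. The readings and the `frequency_table` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

def frequency_table(readings, bin_width=5):
    """Count how many readings fall into each bin of width `bin_width`.

    Each reading is rounded down to the start of its bin, so with
    bin_width=5 a reading of 117 is counted under 115.
    """
    bins = Counter((r // bin_width) * bin_width for r in readings)
    return dict(sorted(bins.items()))

# Hypothetical systolic readings (mmHg), for illustration only.
readings = [112, 118, 121, 115, 119, 124, 110, 126, 117, 122]
for value, count in frequency_table(readings).items():
    # A crude text bar chart: one '#' per reading in the bin.
    print(f"{value}: {'#' * count} ({count})")
```

Plotting these counts as bars gives the bar chart described above. With enough readings it tends toward the bell shape.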






                      Here are 24 Systolic Blood Pressure
                      Measurements: They Look like a Bell Curve

[Figure: bar chart of how many times each reading occurred]

                        Probability of Pressure > 125 = (4 + 2) / 24 = 1/4 = 25%
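The probability above is simply a relative frequency: the count in the selected bars divided by the total count. The minimal Python sketch below uses the two bar counts (4 and 2) and the total (24) from the slide. The helper name `empirical_probability` is hypothetical.

```python
def empirical_probability(selected_counts, total):
    """Relative frequency: readings in the selected bars / all readings."""
    return sum(selected_counts) / total

# The two bars above 125 hold 4 and 2 readings out of 24.
p = empirical_probability([4, 2], 24)
print(f"P(pressure > 125) = {p:.0%}")
```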
If we had 201 measurements . . . .

                                 Count in the bars above 122 / Total count
                                 = Share of the area of the bars
                                 = Probability > 122
                                 = 15.83%




            The Bell Shaped Curve
          is completely defined by:

a)   The average (115) of the data

b)   The standard deviation (7) of the data. It
     indicates how spread out the data is around
     the average.

     (Approx. 68% of observations lie
     between 115-7 and 115+7, i.e. 108 to 122)
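Once the average and standard deviation are known, Python's standard-library `statistics.NormalDist` can report the share of readings within 1, 2 or 3 standard deviations. This sketch assumes the readings really follow a normal curve with average 115 and standard deviation 7. For a normal curve, the share within one standard deviation works out to about 68.3%.

```python
from statistics import NormalDist

# Assumed model: systolic readings follow a normal curve (average 115, SD 7).
AVERAGE, SD = 115, 7
bp = NormalDist(mu=AVERAGE, sigma=SD)

shares = {}
for k in (1, 2, 3):
    low, high = AVERAGE - k * SD, AVERAGE + k * SD
    # Area under the curve between the two cut points.
    shares[k] = bp.cdf(high) - bp.cdf(low)
    print(f"within {k} SD ({low} to {high}): {shares[k]:.1%}")
```

The same shares (about 68.3%, 95.4% and 99.7%) hold for any normal curve, whatever its average and standard deviation.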
What do we get if we use the Bell Shaped
Curve (Normal Distribution)?

    Benefit 1: measuring the spread of our data
    Benefit 2: we can now compare specific
     scores in two different populations (next slide)
    Benefit 3: if we know the measurement, we can
     compute the probability of it occurring
    Benefit 4: if we know the probability, we can
     work out the cutoff measurement that gives it






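If I have the same score, 78, in Courses A and B,
     can I say I am doing equally well in both?

[Figure: the score of 78 shown on the curves for Course A and Course B; the chart also marks 72 and 88]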
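One common way to make this comparison is to convert each score into a z-score. A z-score counts how many standard deviations the score lies above or below its own course's average. In this sketch the course averages and standard deviations are hypothetical values, not figures read from the chart.

```python
def z_score(score, mean, sd):
    """Number of standard deviations `score` lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Hypothetical course statistics: (average, standard deviation).
courses = {"Course A": (70, 8), "Course B": (85, 5)}
for name, (mean, sd) in courses.items():
    print(f"{name}: z = {z_score(78, mean, sd):+.2f}")
```

With these made-up numbers, 78 sits one standard deviation above the average in Course A (+1.00) and 1.4 standard deviations below it in Course B (-1.40). The same mark therefore means quite different things in the two courses.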
Benefits 3 and 4

   Given a specific measurement or range, what is
    the probability of it occurring?
        Probability I will get a fever of more than 38 degrees?
        Probability flights will be more than 30 minutes late?
        Probability my systolic pressure is > 122?
   Given the probability, what is the cutoff
    measurement?
        I want my sugar level to stay within the top 15% allowed:
         what level does that correspond to?
        If Human Resources wants the top 15% of results, what is the
         passing grade?
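Benefits 3 and 4 correspond to the normal curve's cumulative distribution and its inverse. The sketch below uses `statistics.NormalDist` and assumes systolic readings follow a normal curve with average 115 and standard deviation 7, as on the earlier slide. The top-15% cutoff is the value below which 85% of readings fall.

```python
from statistics import NormalDist

# Assumed model: average 115, standard deviation 7 (other variables need their own values).
bp = NormalDist(mu=115, sigma=7)

# Benefit 3: measurement -> probability of exceeding it.
p_above_122 = 1 - bp.cdf(122)

# Benefit 4: probability -> cutoff. The top 15% starts where 85% lie below.
top_15_cutoff = bp.inv_cdf(0.85)

print(f"P(systolic > 122) = {p_above_122:.2%}")
print(f"Top 15% begins at about {top_15_cutoff:.1f}")
```

Under these assumptions, P(systolic > 122) is about 15.9% and the top 15% begins at about 122.3. The same two calls answer the fever, flight-delay and passing-grade questions once each variable's average and standard deviation are known.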




B. Correlation
   If we have two sets of data, how are they related?
   Example: Blood Pressure vs Intake of Salt
   Example: Advertising Expenditure vs Sales Revenue
   Example: Hours walked per day vs Weight in Kilograms
   What is the direction of the relationship?
        Direct or inverse?
   What is the strength of the relationship?
        The correlation coefficient (from -1 to +1)
   We use the Correlation Function (Demonstrate in Excel)
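Excel's CORREL function returns the Pearson correlation coefficient. The sketch below computes the same quantity in plain Python. The hours-walked and weight values are hypothetical. A result near -1 signals a strong inverse relationship, near +1 a strong direct one, and near 0 little linear relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    # Co-variation of the two lists around their means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: hours walked per day vs weight in kilograms.
hours = [0.5, 1.0, 1.5, 2.0, 2.5]
weight = [92, 88, 85, 80, 78]
r = pearson_r(hours, weight)
print(f"r = {r:.2f}")  # sign = direction, size = strength
```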

C. Forecasting using Simple Linear
  Regression (Best Line of Fit)
   If we have an independent variable (X): Sugar Intake
   And a dependent variable (Y): Weight
   What is the relationship that allows us to forecast
    Weight for different Sugar Intakes?

   We need two columns: X and Y
   Simple Linear Regression allows us to find the Best
    Line to fit our data
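Simple linear regression is usually done by ordinary least squares. This method picks the slope and intercept that minimise the sum of squared vertical distances between the points and the line. Excel's SLOPE and INTERCEPT functions return these two numbers. The plain-Python sketch below uses hypothetical sugar-intake and weight figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept

# Hypothetical data: daily sugar intake (g) vs weight (kg).
sugar = [20, 35, 50, 65, 80]
weight = [68, 71, 75, 77, 81]
a, b = fit_line(sugar, weight)
print(f"weight = {a:.3f} * sugar + {b:.1f}")
print(f"forecast at 60 g: {a * 60 + b:.1f} kg")
```

The forecast step is just the fitted equation evaluated at a new X. It is most trustworthy inside the range of X values that were observed.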






           Regression finds the Best Line
             that Fits our Observations

[Figure: scatter plot of the observations; Y axis 0 to 5, X axis 0 to 8]

Which Straight Line Best Fits our Observations?

[Figure: the same observations on the same axes]




     Multiple Regression allows us to find the
         equation Y = aX1 + bX2 + cX3 + d

[Figure: inputs X1, X2 and X3 feeding into Y]
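Multiple regression applies the same least-squares idea with several inputs. This sketch uses NumPy's `numpy.linalg.lstsq`, which is our choice here and not something the talk specifies. The rows are hypothetical, built so that Y = 2X1 + 3X2 - X3 + 5 holds exactly, so the fit should recover those coefficients.

```python
import numpy as np

def fit_multiple(X, y):
    """Least-squares coefficients for y = a*X1 + b*X2 + c*X3 + d.

    A column of ones is appended to X so the last coefficient is the constant d.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs  # array([a, b, c, d])

# Hypothetical rows (X1, X2, X3), with Y generated from a known equation.
X = [[1, 0, 2], [2, 1, 0], [0, 3, 1], [4, 2, 3], [3, 5, 2], [5, 1, 4]]
y = [2 * x1 + 3 * x2 - x3 + 5 for x1, x2, x3 in X]
a, b, c, d = fit_multiple(X, y)
print(f"Y = {a:.2f}*X1 + {b:.2f}*X2 + {c:.2f}*X3 + {d:.2f}")
```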
D. Statistical Process Control (SPC)

              The Purpose of SPC is to Monitor a Process
              SPC allows us to Check if a variable is behaving properly
                   Over time
                   Over different locations/departments
                   Over different events
                   Over different samples
              Control Charts were first used at Bell Labs (1924)
              Although mostly used in industry, SPC can be used in any sector




               The General Form of a Control Chart: 4 Components
                    1) UCL: Upper Control Limit
                    2) AL: Average Line
                    3) LCL: Lower Control Limit
                    4) Process Data: our variable, plotted against
                       the IDs of the samples OR the time series
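The slides do not say how the UCL and LCL are set. This sketch assumes one common convention, the individuals (XmR) chart. In that chart the average line is the mean, and the limits sit three estimated standard deviations on either side of it. Sigma is estimated as the average moving range divided by 1.128. The readings are hypothetical.

```python
def control_limits(values):
    """(LCL, AL, UCL) for an individuals chart with 3-sigma limits.

    Sigma is estimated from the average moving range divided by 1.128,
    the d2 constant for ranges of two consecutive points.
    """
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return centre - 3 * sigma, centre, centre + 3 * sigma

def out_of_limits(values):
    """Indexes of points below the LCL or above the UCL."""
    lcl, _, ucl = control_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical daily readings for one tracked variable.
readings = [20, 22, 19, 21, 23, 20, 18, 21, 22, 20, 35, 21]
lcl, al, ucl = control_limits(readings)
print(f"LCL={lcl:.1f}  AL={al:.1f}  UCL={ucl:.1f}")
print("points outside the limits:", out_of_limits(readings))
```

Here the spike of 35 lands above the UCL (about 33.7), so index 10 is flagged.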
This Process is “In Control”

[Figure: control chart with Upper Limit and Lower Limit lines; vertical axis 0 to 50]




      This Process is Regularly “Out of Control”




            Look for an explanation INSIDE the system
This Process is Irregularly “Out of Control”




     Look for an explanation OUTSIDE the system




This Process is Irregularly “Out of Control”:
trends of 5 or more points in either direction




     Look for an explanation OUTSIDE the system
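Here is a sketch of the trend check in Python. It assumes that a "trend of 5 points" means 5 consecutive points that each rise (or each fall) relative to the previous one, and that a tie breaks the run. Both that interpretation and the name `find_trends` are assumptions, not taken from the talk.

```python
def find_trends(values, length=5):
    """Return (start, end) index pairs of runs of `length` or more points
    that rise steadily or fall steadily.

    Assumption: every step in a run goes the same way; a tie breaks the run.
    """
    # Between neighbours: +1 for a rise, -1 for a fall, 0 for no change.
    steps = [(b > a) - (b < a) for a, b in zip(values, values[1:])]
    trends, start = [], 0
    for i in range(1, len(steps) + 1):
        if i == len(steps) or steps[i] != steps[start]:
            # steps[start:i] share one direction and link points start..i
            if steps[start] != 0 and (i - start) + 1 >= length:
                trends.append((start, i))
            start = i
    return trends

readings = [20, 21, 23, 24, 26, 22, 21]
print(find_trends(readings))
```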
The 7 Point Rule: there is a problem if 7 or more points in a
row fall on the same side of the average (all above it or all below it)




      Look for an explanation OUTSIDE the system
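The 7 Point Rule can be checked in a similar way by recording which side of the average each point falls on. In this sketch the average defaults to the mean of the supplied points. You can pass the chart's own average line instead if it comes from a baseline period. A point exactly on the average is assumed to break the run. The function name `same_side_runs` is hypothetical.

```python
def same_side_runs(values, run_length=7, average=None):
    """Return (start, end) index pairs of runs of `run_length` or more
    consecutive points on the same side of the average.

    Assumption: a point exactly equal to the average breaks a run.
    """
    if not values:
        return []
    if average is None:
        average = sum(values) / len(values)
    # +1 above the average, -1 below it, 0 exactly on it.
    sides = [(v > average) - (v < average) for v in values]
    runs, start = [], 0
    for i in range(1, len(sides) + 1):
        if i == len(sides) or sides[i] != sides[start]:
            if sides[start] != 0 and i - start >= run_length:
                runs.append((start, i - 1))
            start = i
    return runs

# Hypothetical data: seven high readings followed by four low ones.
readings = [12, 13, 12, 14, 13, 12, 13, 5, 6, 5, 4]
print(same_side_runs(readings))
```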




            Types of Control Charts
Thank you
for your kind
    attention
