ANALYTICAL REPRESENTATION OF DATA
BY UNSA SHAKIR
Descriptives Data
• Disorganized Data (each entry is a movie genre followed by its enjoyment rating)
Comedy 7 Suspense 8 Comedy 7 Suspense 7
Drama 8 Horror 7 Drama 5 Comedy 6
Horror 8 Comedy 5 Drama 3 Drama 3
Suspense 7 Horror 8 Comedy 6 Suspense 6
Horror 8 Comedy 6 Drama 7 Horror 9
Drama 5 Horror 9 Drama 6 Suspense 4
Drama 5 Horror 7 Suspense 3 Suspense 4
Horror 7 Suspense 5 Horror 10 Suspense 5
Horror 9 Suspense 6 Comedy 6 Drama 8
Comedy 7 Comedy 5 Comedy 4 Drama 4
• Design a frequency distribution table along with a bar graph.
Descriptives
• Reducing and Describing Data (organized)
Genre Average Rating
Comedy 5.9
Drama 5.4
Horror 8.2
Suspense 5.5
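As a minimal sketch (not part of the original slides), the organized table can be rebuilt from the raw data in Python. The list `ratings` is transcribed row by row from the disorganized data, and `average_by_genre` is a hypothetical helper name. The `Counter` gives the frequency of each genre, and the averages should match the table above.

```python
from collections import Counter, defaultdict

# (genre, rating) pairs transcribed row by row from the disorganized data
ratings = [
    ("Comedy", 7), ("Suspense", 8), ("Comedy", 7), ("Suspense", 7),
    ("Drama", 8), ("Horror", 7), ("Drama", 5), ("Comedy", 6),
    ("Horror", 8), ("Comedy", 5), ("Drama", 3), ("Drama", 3),
    ("Suspense", 7), ("Horror", 8), ("Comedy", 6), ("Suspense", 6),
    ("Horror", 8), ("Comedy", 6), ("Drama", 7), ("Horror", 9),
    ("Drama", 5), ("Horror", 9), ("Drama", 6), ("Suspense", 4),
    ("Drama", 5), ("Horror", 7), ("Suspense", 3), ("Suspense", 4),
    ("Horror", 7), ("Suspense", 5), ("Horror", 10), ("Suspense", 5),
    ("Horror", 9), ("Suspense", 6), ("Comedy", 6), ("Drama", 8),
    ("Comedy", 7), ("Comedy", 5), ("Comedy", 4), ("Drama", 4),
]

def average_by_genre(pairs):
    """Group the ratings by genre and return {genre: average rating}."""
    groups = defaultdict(list)
    for genre, rating in pairs:
        groups[genre].append(rating)
    return {genre: sum(rs) / len(rs) for genre, rs in sorted(groups.items())}

counts = Counter(genre for genre, _ in ratings)  # frequency of each genre
for genre, avg in average_by_genre(ratings).items():
    print(f"{genre:<9} n={counts[genre]:<3} average={avg:.1f}")
```

Each genre appears 10 times, so the frequency column is uniform and the averages carry the interesting information.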
Descriptives
• Displaying Data
[Bar chart: Rating of Movie Genre Enjoyment. x-axis: Genre (Comedy, Drama, Horror, Suspense); y-axis: Average Rating, scale 0 to 9]
MEASURES OF CENTRAL TENDENCY & DISPERSION
Example: Previous slides
Identify each of the following examples as a qualitative or quantitative variable.
1. The residence hall for each student in a statistics class. (qualitative)
2. The amount of gasoline pumped by the next 10 customers at the local Uni-mart. (quantitative)
3. The amount of radon in the basement of each of 25 homes in a new development. (quantitative)
4. The color of the baseball cap worn by each of 20 students. (qualitative)
5. The length of time to complete a mathematics homework assignment. (quantitative)
6. The state in which each truck is registered when stopped and inspected at a weigh station. (qualitative)
EXERCISE
Country          People who enjoy watching movies (%)   People who do not enjoy watching movies (%)
Pakistan         11                                      14
India            33                                      10
London           3                                       8
Australia        41                                      2
OVERALL TOTAL    88                                      34
• Common ways to display the data in this example:
  • Histogram (useful for examining a single variable)
  • Pie chart (5 categories at most)
  • Frequency polygon
Example: Stem-and-leaf plot
For some Olympic sports, such as diving and gymnastics, the highest and lowest scores given by the judges are dropped. Study the scores given by ten judges to the two gymnasts listed.
Example:
• The numbers below represent the weights, in pounds, of fish caught by contestants in a Striped Bass Derby. Use the data to make a stem-and-leaf plot.
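The derby weights themselves are not reproduced here, so the Python sketch below uses a small hypothetical list of whole-number weights. It assumes non-negative whole numbers split into a tens part (the stem) and a units digit (the leaf); `stem_and_leaf` is a hypothetical helper, not a library function. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return 'stem | leaves' lines, using tens as stems and units as leaves.

    Assumes non-negative whole numbers.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    lines = []
    # Keep empty stems between the smallest and largest so gaps stay visible
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical weights in pounds (the derby data is not reproduced here)
weights = [12, 15, 9, 23, 27, 21, 18, 34, 12, 41]
for line in stem_and_leaf(weights):
    print(line)
```

For these hypothetical weights the plot has stems 0 to 4, and the leaves of stem 1 read 2 2 5 8.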
Example
The marks scored by 20 students in a unit test (out of 25) are given below:
12, 10, 08, 12, 04, 15, 18, 23, 18, 16, 16, 12, 23, 18, 12, 05, 16, 16, 12, 20
Prepare a frequency distribution table.
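For ungrouped data, the frequency distribution table just counts how often each mark occurs. A minimal Python sketch using the standard library's `collections.Counter`; leading zeros such as 08 are dropped because Python 3 rejects them in integer literals.

```python
from collections import Counter

# Marks of the 20 students (leading zeros such as 08 written as 8)
marks = [12, 10, 8, 12, 4, 15, 18, 23, 18, 16,
         16, 12, 23, 18, 12, 5, 16, 16, 12, 20]

freq = Counter(marks)
print("Marks  Frequency")
for mark in sorted(freq):
    print(f"{mark:>5}  {freq[mark]:>9}")
print(f"Total  {sum(freq.values()):>9}")
```

With these marks the table has ten rows, and 12 is the most frequent mark (5 students).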
Example
Consider the following marks (out of 50) scored in Mathematics by 50 students of 8th Class:
41, 31, 33, 32, 28, 31, 21, 10, 30, 22, 33, 37, 12, 05, 08, 15, 39, 26, 41, 46, 34, 22, 09, 11, 16, 22, 25, 29, 31, 39, 23, 31, 21, 45, 47, 30, 22, 17, 36, 18, 20, 22, 44, 16, 24, 10, 27, 39, 28, 17
Prepare a frequency distribution table.
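With 50 marks spread from 5 to 47, a grouped table is easier to read. The sketch below assumes classes of width 10 starting at 0 (0-9, 10-19, ...), which the slide does not specify; `grouped_frequencies` is a hypothetical helper name. A different class width would give a different, equally valid table.

```python
# Marks of the 50 students (leading zeros such as 05 written as 5)
scores = [41, 31, 33, 32, 28, 31, 21, 10, 30, 22, 33, 37, 12, 5, 8, 15, 39,
          26, 41, 46, 34, 22, 9, 11, 16, 22, 25, 29, 31, 39, 23, 31, 21, 45,
          47, 30, 22, 17, 36, 18, 20, 22, 44, 16, 24, 10, 27, 39, 28, 17]

def grouped_frequencies(values, width, start=0):
    """Count values in consecutive classes start..start+width-1, and so on."""
    counts = {}
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        # True counts as 1 when summed, so this counts values inside the class
        counts[(lower, upper)] = sum(lower <= v <= upper for v in values)
        lower += width
    return counts

table = grouped_frequencies(scores, width=10)
for (lower, upper), f in table.items():
    print(f"{lower:>2} - {upper:<2}  {f:>3}")
print(f"Total    {sum(table.values()):>3}")
```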
Measures of Central Tendency
• Three measures of central tendency are commonly used in statistical analysis: MEAN, MEDIAN and MODE.
• Each measure is designed to represent a “typical” value in the distribution.
Measures of Central Tendency
1. Mode
• The value of the distribution that occurs most frequently (i.e., the largest category)
• The only measure that can be used with nominal-level variables
Example:
a) {1, 0, 5, 9, 12, 8}: no mode (every value occurs exactly once)
b) {4, 5, 5, 5, 9, 20, 30}: mode = 5
c) {2, 2, 5, 9, 9, 15}: bimodal, modes 2 and 9
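A minimal Python sketch of the mode, following the convention used in example a) that a data set in which every value is equally common has no mode. `modes` is a hypothetical name. Note that the standard library's `statistics.multimode` follows a different convention: it returns every value in case a) instead of an empty list.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); [] if every value is equally common."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # e.g. every value occurs once: "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 0, 5, 9, 12, 8]))      # []
print(modes([4, 5, 5, 5, 9, 20, 30]))  # [5]
print(modes([2, 2, 5, 9, 9, 15]))      # [2, 9]
```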
Mode for Grouped Data
Example: Find the mode of the data:
Number      12   13   14   15   16
Frequency    7    9    6   22   20
Solution:
Here the maximum frequency is 22. The number corresponding to the maximum frequency is the mode.
Hence mode = 15.
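• l = 60 (lower boundary of the modal class)
• h = 20 (class width)
• Fm = 16 (frequency of the modal class)
• F1 = 15 (frequency of the class before the modal class)
• F2 = 14 (frequency of the class after the modal class)
• Use the given data to calculate the mode with $\text{Mode} = l + \frac{F_m - F_1}{2F_m - F_1 - F_2} \times h$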
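A commonly used formula for the mode of grouped data is Mode = l + (Fm − F1) / (2Fm − F1 − F2) × h. The Python sketch below assumes that F1 and F2 are the frequencies of the classes immediately before and after the modal class; `grouped_mode` is a hypothetical name.

```python
def grouped_mode(l, h, fm, f1, f2):
    """Mode of grouped data: l + (fm - f1) / (2*fm - f1 - f2) * h.

    Assumed meanings: l = lower boundary of the modal class, h = class width,
    fm = frequency of the modal class, f1 and f2 = frequencies of the
    classes just before and just after it.
    """
    return l + (fm - f1) / (2 * fm - f1 - f2) * h

print(round(grouped_mode(l=60, h=20, fm=16, f1=15, f2=14), 2))  # 66.67
```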
Exercise
Find the mode of the following data:
1.
Class        1-5   6-10   11-15   16-20   21-25   26-30   Total
Frequency      2      4       9       7       5       3      30
Answer: 14.07
2.
Class interval   1-3   4-6   7-9   10-12   13-15   16-18
Frequency          5     3     2       1       6       4
Answer: 14.64
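The exercise answers can be reproduced by locating the modal class automatically. This Python sketch assumes integer class limits, so the lower class boundary is the lower limit minus 0.5 (11-15 gives 10.5), and it treats a missing neighboring class as frequency 0; `mode_from_table` is a hypothetical name. Under these assumptions it returns 14.07 and 14.64, the answers given above.

```python
def mode_from_table(lower_limits, width, freqs):
    """Grouped mode for integer classes such as 11-15, using class boundaries."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    fm = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                   # class before (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # class after (0 if none)
    l = lower_limits[k] - 0.5                           # lower class boundary
    return l + (fm - f1) / (2 * fm - f1 - f2) * width

print(round(mode_from_table([1, 6, 11, 16, 21, 26], 5, [2, 4, 9, 7, 5, 3]), 2))  # 14.07
print(round(mode_from_table([1, 4, 7, 10, 13, 16], 3, [5, 3, 2, 1, 6, 4]), 2))   # 14.64
```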
Measures of Central Tendency
2. Median
• The value of the variable in the “middle” of the distribution
• Same as the 50th percentile
• When N is odd, the median is the middle case:
  N = 5: 2 2 6 9 11
  • median = 6
• When N is even, the median is the score halfway between the two middle cases:
  N = 6: 2 2 5 9 11 15
  • median = (5 + 9)/2 = 7
To compute the median
• First, order the values of X from low to high: 85, 90, 94, 94, 95, 97, 97, 97, 97, 98
• Then count the number of observations: 10.
• When the number of observations is even, average the two middle values (here the 5th and 6th) to get the median.
• In this example, the median is (95 + 97)/2 = 96.
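A minimal Python sketch of the two cases (odd and even N) described above; `median` here is a hand-written helper, and the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

print(median([2, 2, 6, 9, 11]))                          # 6
print(median([2, 2, 5, 9, 11, 15]))                      # 7.0
print(median([85, 90, 94, 94, 95, 97, 97, 97, 97, 98]))  # 96.0
```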
Median
• Find the median: 4 5 6 6 7 8 9 10 12
• Find the median: 5 6 6 7 8 9 10 12
• Find the median: 5 6 6 7 8 9 10 100,000
Median for Grouped Data: Example
Example 1: Find the median of the following grouped data.
Class Interval   Frequency
1 – 5            4
6 – 10           3
11 – 15          6
16 – 20          5
21 – 25          2
N = 20
Median for Grouped Data
Calculating the cumulative frequency (cf):
Class Interval   Frequency   Cumulative Frequency (cf)
1 – 5            4           4
6 – 10           3           4 + 3 = 7
11 – 15          6           7 + 6 = 13
16 – 20          5           13 + 5 = 18
21 – 25          2           18 + 2 = 20
N = 20
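Median for Grouped Data
Solution:
N/2 = 20/2 = 10. The first class whose cumulative frequency reaches 10 is 11 – 15, so it is the median class. So we have:
L = 10.5 (lower class boundary of the median class; the class limit is 11)
frequency of the median class (fm) = 6
cumulative frequency of the class preceding the median class (p.c.f) = 7
size/width of class (i) = 5
N = 20
Calculate the median using the formula
$\text{median} = L + \frac{N/2 - \text{p.c.f}}{f_m} \times i = 10.5 + \frac{10 - 7}{6} \times 5 = 10.5 + 2.5 = 13$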
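A minimal Python sketch of the grouped-median formula, assuming the classes are given by their lower class boundaries (0.5, 5.5, 10.5, ... for the integer classes 1 – 5, 6 – 10, 11 – 15, ...) and a common class width; `grouped_median` is a hypothetical name. The loop builds the cumulative frequency column and stops at the first class whose cumulative frequency reaches N/2. Using the boundary 10.5 (rather than the class limit 11) as L, Example 1 gives 13.0, and the same function reproduces the answers 15.5 and 11 to the exercise that follows.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of grouped data: L + (N/2 - p.c.f) / fm * i.

    lower_bounds holds the lower class *boundaries* (10.5 for the class 11-15).
    """
    n = sum(freqs)
    pcf = 0  # cumulative frequency of the classes before the current one
    for L, fm in zip(lower_bounds, freqs):
        if pcf + fm >= n / 2:  # first class whose cumulative frequency reaches N/2
            return L + (n / 2 - pcf) / fm * width
        pcf += fm
    raise ValueError("frequency table is empty")

# Example 1: classes 1-5, 6-10, 11-15, 16-20, 21-25
print(grouped_median([0.5, 5.5, 10.5, 15.5, 20.5], 5, [4, 3, 6, 5, 2]))  # 13.0
```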
Exercise
Find the median of the following data:
1.
Class        1-5   6-10   11-15   16-20   21-25   26-30   Total
Frequency      2      4       9       7       5       3      30
Answer: 15.5
2.
Class interval   1-3   4-6   7-9   10-12   13-15   16-18
Frequency          5     3     2       1       6       4
Answer: 11
Measures of Central Tendency
3. Mean
The arithmetic average
• The amount each individual would get if the total were divided equally among all the individuals in a distribution
• Symbolized as:
  • x̄ for the mean of a sample
  • μ for the mean of a population
• Formula: $\bar{X} = \frac{\sum X_i}{N}$
Finding the Mean
• Formula for the mean: $\bar{X} = \frac{\sum x}{N}$
• Given the data set {3, 5, 10, 4, 3}:
$\bar{X} = \frac{3 + 5 + 10 + 4 + 3}{5} = \frac{25}{5} = 5$
Find the Mean and Median
Q: 85, 87, 89, 91, 98, 100
Mean: 91.67
Median: 90
Q: 5, 87, 89, 91, 98, 100
Mean: 78.33
Median: 90
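The two questions differ only in their first value (85 versus 5). The sketch below uses Python's standard `statistics` module to show that this single low value pulls the mean down by more than 13 points while the median does not move.

```python
from statistics import mean, median

a = [85, 87, 89, 91, 98, 100]
b = [5, 87, 89, 91, 98, 100]  # same data, but 85 replaced by 5

for data in (a, b):
    print(f"mean = {mean(data):.2f}, median = {median(data)}")
# mean = 91.67, median = 90.0
# mean = 78.33, median = 90.0
```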
Mean for Ungrouped Data
• Example 1: The number of goals scored by a hockey team in 20 matches is given here.
4, 6, 3, 2, 2, 4, 1, 5, 3, 0, 4, 5, 4, 5, 4, 0, 4, 3, 6, 4
• Solution: The mean is calculated as
$\bar{X} = \frac{\text{sum of the scores}}{\text{number of scores}} = \frac{\sum fx}{N} = \frac{69}{20} = 3.45$
where f is the number of matches in which x goals were scored.
Mean for Grouped Data: Example 2
• Find the mean of the given frequency table.
Class Interval   Frequency
0 – 4            3
5 – 9            5
10 – 14          7
15 – 19          4
20 – 24          6
N = 25
Mean for Grouped Data
• Solution:
• Step 1: Find the midpoint (x) of each class interval.
• Step 2: Calculate fx by multiplying the values of f and x.
• Step 3: Add all the fx values to get ∑fx.
Class Interval   Midpoint (x)   Frequency (f)   fx
0 – 4            2              3               6
5 – 9            7              5               35
10 – 14          12             7               84
15 – 19          17             4               68
20 – 24          22             6               132
                                N = 25          ∑fx = 325
• Step 4: The mean is calculated as
$\bar{X} = \frac{\text{sum of the scores}}{\text{total number of scores}} = \frac{\sum fx}{N} = \frac{325}{25} = 13$
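Steps 1 to 4 can be sketched in Python as below. `grouped_mean` is a hypothetical name; classes are given as (lower limit, upper limit) pairs, and the midpoint is their average, as in Step 1.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]       # Step 1
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))  # Steps 2 and 3
    return total_fx / sum(freqs)                             # Step 4

classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
print(grouped_mean(classes, [3, 5, 7, 4, 6]))  # 13.0
```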
Exercise
• Consider the data set of weights of 30 students. Find the mean of the grouped data.
Weight (kg)   Frequency (f)
20-29         1
30-39         8
40-49         10
50-59         6
60-69         5
Answer: 46.5 kg
Relationships between the Measurements
• When the mean, median and mode are all equal, the distribution is typically symmetric, as in a bell-shaped curve.
• If Mode < Median < Mean, the distribution is said to be positively (right) skewed, meaning there are a few unusually large values.
• If Mean < Median < Mode, the distribution is said to be negatively (left) skewed, meaning there are a few unusually small values.
Measures of Central Tendency
• Levels of Measurement
  • Nominal
    • Mode only (categories defy ranking)
    • Often, a percent or proportion is better
  • Ordinal
    • Mode or median (typically, the median is preferred)
  • Interval/Ratio
    • Mode, median or mean
    • Mean if skew/outliers are not a big problem (judgment call)
Measures of Central Tendency
• In-class exercise:
• Find the mode, median & mean of the following numbers:
8, 4, 10, 2, 5, 1, 6, 2, 11, 2
• Does this distribution have a positive or negative skew?
• Answers:
  • Mode (most common) = 2
  • Median (middle value) (1 2 2 2 4 5 6 8 10 11) = (4 + 5)/2 = 4.5
  • Mean = $\frac{\sum X_i}{N} = \frac{51}{10} = 5.1$
  • Since Mode < Median < Mean (2 < 4.5 < 5.1), the distribution has a positive skew.
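The rule of thumb from "Relationships between the Measurements" can be applied with Python's `statistics` module. `skew_direction` is a hypothetical helper; it only compares the three measures and is not a formal skewness statistic.

```python
from statistics import mean, median, mode

def skew_direction(values):
    """Classify skew by comparing mode, median and mean (rule of thumb only)."""
    mo, md, mn = mode(values), median(values), mean(values)
    if mo < md < mn:
        return "positive (right) skew"
    if mn < md < mo:
        return "negative (left) skew"
    return "no clear skew by this rule"

data = [8, 4, 10, 2, 5, 1, 6, 2, 11, 2]
print(mode(data), median(data), mean(data))  # 2 4.5 5.1
print(skew_direction(data))                  # positive (right) skew
```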
Measures of Dispersion
• Central tendency doesn't tell us everything; dispersion (deviation, spread) tells us how the data values are distributed around the center.
• We are most interested in:
  • The range
  • The semi-interquartile range (SIR)
  • Standard deviation (σ)
  • Variance (σ²)
Measures of Dispersion
1. Range (R)
• The scale distance between the highest and lowest scores
• R = high score − low score
• The simplest and most straightforward measure of dispersion
• Limitation: even one extreme score can distort our understanding of dispersion
Example
• What is the range of the following data?
4 8 1 6 6 2 9 3 6 9
Solution: The largest score is 9 and the smallest score is 1, so the range is XL − XS = 9 − 1 = 8.
• {0, 8, 9, 9, 11, 53}: Range = 53
• {0, 8, 9, 9, 11, 11}: Range = 11
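The range needs only the largest and smallest scores. A short Python sketch; the helper is named `data_range` (hypothetical) to avoid shadowing Python's built-in `range`. The last two calls show how one extreme score (53) changes the result.

```python
def data_range(values):
    """Range R = largest score - smallest score."""
    return max(values) - min(values)

print(data_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # 8
print(data_range([0, 8, 9, 9, 11, 53]))            # 53
print(data_range([0, 8, 9, 9, 11, 11]))            # 11
```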
When to Use the Range
• The range is used when
  • you have ordinal data, or
  • you are presenting your results to people with little or no knowledge of statistics
• The range is rarely used in scientific work because it is fairly insensitive:
  • It depends on only two scores in the data set, XL and XS
  • Two very different data sets can have the same range: 1 1 1 1 9 vs 1 3 5 7 9
Percentile of a Score
$\text{Percentile of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$
The percentile of a score is the percentage of scores in its frequency distribution that fall below it.
For example, a test score that is greater than 75% of the scores of people taking the test is said to be at the 75th percentile.
• A percentile is the value below which a given percentage of the data falls.
Example
• You are the fourth tallest person in a group of 20.
• 80% of the people (16 of 20) are shorter than you.
• That means you are at the 80th percentile.
Example:
• In a test, 12% got D, 50% got C, 30% got B and 8% got A.
• You got a B, so add up:
  • all the 12% that got D,
  • all the 50% that got C,
  • half of the 30% that got B,
• for a total percentile of 12% + 50% + 15% = 77%.
• In other words, you did "as well as or better than 77% of the class".
• (Why take half of B? Because you shouldn't imagine you got the "best B" or the "worst B", just an average B.)
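Both percentile examples can be reproduced with a small Python helper. `percentile_rank` and its `ties` parameter are hypothetical: `ties="below"` counts only scores less than x (the formula for the percentile of a score), while `ties="half"` also counts half of the scores equal to x (the reasoning in the grade example). The heights are made-up values, and the grades are coded D=1, C=2, B=3, A=4 for a class of 100.

```python
def percentile_rank(scores, x, ties="below"):
    """Percentage of scores below x; with ties="half", ties count as half."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    counted = below + (equal / 2 if ties == "half" else 0)
    return 100 * counted / len(scores)

# Hypothetical heights (cm) for a group of 20; 181 is the fourth tallest
heights = [150, 152, 155, 157, 158, 160, 161, 162, 163, 165,
           166, 167, 168, 170, 171, 173, 181, 182, 185, 190]
print(percentile_rank(heights, 181))  # 80.0

# A class of 100: 12 D, 50 C, 30 B, 8 A, coded as 1, 2, 3, 4
grades = [1] * 12 + [2] * 50 + [3] * 30 + [4] * 8
print(percentile_rank(grades, 3, ties="half"))  # 77.0
```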
Quartiles
Q1, Q2 and Q3 divide the ranked scores into four equal parts, each containing 25% of the scores.
Q1 = P25
Q2 = P50 (the median)
Q3 = P75
The Semi-Interquartile Range
• The semi-interquartile range (SIR) is defined as half the difference between the third and first quartiles.
• The first quartile is the 25th percentile.
• The third quartile is the 75th percentile.
• SIR = (Q3 − Q1) / 2
• Interquartile range: IR = Q3 − Q1
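Quartiles and the SIR can be computed with `statistics.quantiles` (Python 3.8+). Textbooks use several quartile conventions, and they can give slightly different Q1 and Q3 for small data sets; this sketch assumes the `method="inclusive"` convention and a made-up data set of the numbers 1 to 9. `semi_interquartile_range` is a hypothetical name.

```python
from statistics import quantiles

def semi_interquartile_range(values):
    """Return Q1, Q3 and SIR = (Q3 - Q1) / 2.

    Uses the "inclusive" quartile method, one of several conventions;
    other methods can give slightly different quartiles for small data sets.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3, (q3 - q1) / 2

q1, q3, sir = semi_interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q3, q3 - q1, sir)  # 3.0 7.0 4.0 2.0
```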
Example:
Prison rates (per 100k), 2001:
• R = 795 (Louisiana) − 126 (Maine) = 669
• IR = 478 (Arizona) − 281 (New Mexico) = 197
• SIR = 197 / 2 = 98.5
[Figure: number line marking 126 (minimum), 281 (25%), 366 (50%), 478 (75%) and 795 (maximum)]
Example
• What is the SIR for the data below?
2, 4, 6, 8, 10, 12, 14, 20, 30, 60
• 25% of the scores are below 5, so 5 is the first quartile (25th percentile).
• 25% of the scores are above 25, so 25 is the third quartile (75th percentile).
• SIR = (Q3 − Q1) / 2 = (25 − 5) / 2 = 10
MEASURES OF DISPERSION
• Standard deviation
  • Uses every score in the distribution
  • Measures the standard, or typical, distance from the mean
Deviation score = Xi − X̄
• Example: with mean = 50 and Xi = 53, the deviation score is 53 − 50 = 3
Calculation Exercise
• Number of classes taken by a sample of 5 students: calculate the mean, variance & standard deviation.
Xi        (Xi − X̄)   (Xi − X̄)²
5         1          1
2         −2         4
6         2          4
5         1          1
2         −2         4
Σ = 20    0          14
mean = 20 / 5 = 4
s² (variance) = 14 / (5 − 1) = 3.5
s = √3.5 ≈ 1.87
(Dividing by N = 5, as for a population, gives σ² = 14/5 = 2.8 and σ = √2.8 ≈ 1.67.)
Calculating Variance, Then Standard Deviation
• Number of credits for a sample of 8 students:
• Calculate the mean, variance & standard deviation.
Xi        (Xi − X̄)   (Xi − X̄)²
10        −4         16
9         −5         25
13        −1         1
17        3          9
15        1          1
16        2          4
14        0          0
18        4          16
Σ = 112   0          72
Variance and Standard Deviation
• The deviations from the mean always sum to zero, so instead of taking their absolute values, we square them. Squared deviations are never negative.
• This results in the measures we call the variance and the standard deviation.
Sample:      s = standard deviation,  s² = variance
Population:  σ = standard deviation,  σ² = variance
Calculating the Variance and Standard Deviation with Respect to a Population
Ungrouped data
Formulae:
Variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
Standard deviation: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
Calculating the Variance and Standard Deviation with Respect to a Sample
Ungrouped data
Formulae:
Variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$
Standard deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$
Example: Ungrouped sample data
• 7, 6, 8, 5, 9, 4, 7, 7, 6, 6
• Range = 9 − 4 = 5
• Mean: $\bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5$
• Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{18.5}{9} = 2.0556$
• Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.0556} = 1.4337$
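The worked example can be checked with Python's standard `statistics` module, where `variance` and `stdev` use the sample divisor n − 1 and `pvariance` uses the population divisor N.

```python
from statistics import mean, pvariance, stdev, variance

data = [7, 6, 8, 5, 9, 4, 7, 7, 6, 6]
print("range   =", max(data) - min(data))      # 5
print("mean    =", mean(data))                 # 6.5
print("s^2     =", round(variance(data), 4))   # 2.0556 (divides by n - 1)
print("s       =", round(stdev(data), 4))      # 1.4337
print("sigma^2 =", round(pvariance(data), 4))  # 1.85 (divides by N)
```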
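Calculating the Variance and Standard Deviation with Respect to a Population
Grouped data
Formulae:
Variance: $\sigma^2 = \frac{\sum f (X - M)^2}{\sum f}$
Standard deviation: $\sigma = \sqrt{\frac{\sum f (X - M)^2}{\sum f}}$
(X = class midpoint, f = class frequency, M = mean)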
Calculating the Variance and Standard Deviation with Respect to a Sample
Grouped data
Formulae:
Variance: $s^2 = \frac{\sum f (X - M)^2}{\sum f - 1}$
Standard deviation: $s = \sqrt{\frac{\sum f (X - M)^2}{\sum f - 1}}$
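Example (Grouped data)
Find the variance and standard deviation of the sample data below:
Variance = 852.75 / (100 − 1) = 852.75 / 99 ≈ 8.6136
Standard deviation = √8.6136 ≈ 2.9349
(Dividing by Σf = 100, as for a population, gives a variance of 8.5275 and a standard deviation of √8.5275 ≈ 2.9202.)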
Weight     f     X     fX      M       X − M    (X − M)²    f(X − M)²
60-62      5     61    305     67.45   −6.45    41.6025     208.0125
63-65      18    64    1152    67.45   −3.45    11.9025     214.245
66-68      42    67    2814    67.45   −0.45    0.2025      8.505
69-71      27    70    1890    67.45   2.55     6.5025      175.5675
72-74      8     73    584     67.45   5.55     30.8025     246.42
Total      100         6745                     91.0125     852.75
(Weight = class interval, f = frequency, X = class midpoint, M = mean = 6745/100 = 67.45)
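A minimal Python sketch of the grouped-data calculation in the table above; `grouped_variance` is a hypothetical name. It uses class midpoints as X and divides the sum of f(X − M)² by Σf − 1 when `sample=True` and by Σf otherwise, so both conventions can be compared.

```python
import math

def grouped_variance(midpoints, freqs, sample=True):
    """Return the mean M, the sum of f(X - M)^2 and the variance of grouped data.

    Population: divide by sum(f).  Sample: divide by sum(f) - 1 instead.
    """
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, midpoints)) / n
    ss = sum(f * (x - m) ** 2 for f, x in zip(freqs, midpoints))
    return m, ss, ss / (n - 1 if sample else n)

mids = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
m, ss, s2 = grouped_variance(mids, freqs)
print(round(m, 2), round(ss, 2))              # 67.45 852.75
print(round(s2, 4), round(math.sqrt(s2), 4))  # 8.6136 2.9349
_, _, var_pop = grouped_variance(mids, freqs, sample=False)
print(round(var_pop, 4), round(math.sqrt(var_pop), 4))  # 8.5275 2.9202
```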
Exercise
Consider the data set of weights of 30 students. Find the standard deviation.
Weight (kg)   Frequency (f)
20-29         1
30-39         8
40-49         10
50-59         6
60-69         5
Frequency Table: Test Scores
Observation (score)   Frequency (# occurrences)
65                    1
70                    2
75                    3
80                    4
85                    3
90                    2
95                    1
What is the range of the test scores?
A: 30 (95 minus 65)
When calculating the mean, what number must one divide by?
A: 16 (the total number of occurrences)
Example: Mean (grouped data)
CGPA (class)   Frequency (f)   Class mark (midpoint) x   fx
2.50 - 2.75    2               2.625                     5.250
2.75 - 3.00    10              2.875                     28.750
3.00 - 3.25    15              3.125                     46.875
3.25 - 3.50    13              3.375                     43.875
3.50 - 3.75    7               3.625                     25.375
3.75 - 4.00    3               3.875                     11.625
Total          50                                        161.750
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$
Median Example
CGPA (class)   Frequency (f)   Cumulative frequency
2.50 - 2.75    2               2
2.75 - 3.00    10              12
3.00 - 3.25    15              27
3.25 - 3.50    13              40
3.50 - 3.75    7               47
3.75 - 4.00    3               50
Total          50
N/2 = 25, so the median class is 3.00 - 3.25:
$\tilde{x} = 3.00 + \frac{25 - 12}{15} \times 0.25 \approx 3.217$
Median & Mode
Example for ungrouped data:
Find the median and mode of this data: 4, 6, 3, 1, 2, 5, 7, 3
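Mode Example
CGPA (class)   Frequency
2.50 - 2.75    2
2.75 - 3.00    10
3.00 - 3.25    15
3.25 - 3.50    13
3.50 - 3.75    7
3.75 - 4.00    3
Total          50
The modal class is 3.00 - 3.25 (f = 15), so Δ1 = 15 − 10 = 5 and Δ2 = 15 − 13 = 2:
$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c = 3.00 + \frac{5}{5 + 2}(0.25) \approx 3.179$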
The following data give the total number of iPods sold by a mail-order company on each of 30 days. Construct a frequency table, then find the mean, variance, standard deviation, mode and median.
23 14 19 23 20 16 27 9 21 14
22 13 26 16 18 12 9 26 20 16
8 25 11 15 28 22 10 5 17 21
More Related Content

PDF
Andes building a secure platform with the enhanced iopmp
PPTX
Microprocessors historical background
PDF
Tutorial ns 3-tutorial-slides
PPTX
Final draft intel core i5 processors architecture
PPTX
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
PPT
how email works
PDF
Cisco Internetworking Operating System (ios)
PPTX
Group 3 measures of central tendency and variation - (mean, median, mode, ra...
Andes building a secure platform with the enhanced iopmp
Microprocessors historical background
Tutorial ns 3-tutorial-slides
Final draft intel core i5 processors architecture
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
how email works
Cisco Internetworking Operating System (ios)
Group 3 measures of central tendency and variation - (mean, median, mode, ra...

Similar to analytical representation of data (20)

PPTX
Biostatistics Measures of central tendency
PPTX
Measures of Central Tendency - Grouped Data
PDF
Lessontwo - Measures of Tendency.pptx.pdf
PDF
Lesson2 - chapter 2 Measures of Tendency.pptx.pdf
PDF
Lesson2 - chapter two Measures of Tendency.pptx.pdf
PPTX
Lesson3 lpart one - Measures mean [Autosaved].pptx
PPT
Mean_Median_Mode Measures in Statistics.ppt
PPT
Measures of central Tendencies Mean_Median_Mode.ppt
PPT
Measures of Central Tendency Presentation.ppt
PPT
Mean_Median_Mode.ppt-rangeandset of data
PPT
Mean_Median_Mode.ppthhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh...
PPT
Statistics 3, 4
PPT
Community Medicine C22 P04 STATISTICAL AVERAGES.ppt
PPTX
Lecture 10.1 10.2 bt
PPTX
Lesson2 lecture two in Measures mean.pptx
PPTX
Basics of Stats (2).pptx
PPTX
AlsamerLagoyo-assessment for teachers education
PPTX
assessment on education for ptcp for teacher
PDF
Day2 session i&amp;ii - spss
PPTX
Stat Chapter 3.pptx, proved detail statistical issues
Biostatistics Measures of central tendency
Measures of Central Tendency - Grouped Data
Lessontwo - Measures of Tendency.pptx.pdf
Lesson2 - chapter 2 Measures of Tendency.pptx.pdf
Lesson2 - chapter two Measures of Tendency.pptx.pdf
Lesson3 lpart one - Measures mean [Autosaved].pptx
Mean_Median_Mode Measures in Statistics.ppt
Measures of central Tendencies Mean_Median_Mode.ppt
Measures of Central Tendency Presentation.ppt
Mean_Median_Mode.ppt-rangeandset of data
Mean_Median_Mode.ppthhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh...
Statistics 3, 4
Community Medicine C22 P04 STATISTICAL AVERAGES.ppt
Lecture 10.1 10.2 bt
Lesson2 lecture two in Measures mean.pptx
Basics of Stats (2).pptx
AlsamerLagoyo-assessment for teachers education
assessment on education for ptcp for teacher
Day2 session i&amp;ii - spss
Stat Chapter 3.pptx, proved detail statistical issues
Ad

More from Unsa Shakir (20)

PPTX
Types of diode
PPTX
Transistor
PPTX
Single diode circuits
PPTX
Silicon control rectifier
PPTX
Rectifiers
PPT
Operational amplifier
PPTX
Diode voltage multiplier
PPT
Types of transistors
PPTX
Clipper and clamper circuits
PPT
kinds of distribution
PDF
Probability of card
PPT
hypothesis test
PPT
correlation and regression
PPT
probability
PPT
tree diagrams
PPTX
counting techniques
PPTX
frequency distribution
PPT
graphic representations in statistics
PPTX
introduction to statistical theory
PPTX
FSM and ASM
Types of diode
Transistor
Single diode circuits
Silicon control rectifier
Rectifiers
Operational amplifier
Diode voltage multiplier
Types of transistors
Clipper and clamper circuits
kinds of distribution
Probability of card
hypothesis test
correlation and regression
probability
tree diagrams
counting techniques
frequency distribution
graphic representations in statistics
introduction to statistical theory
FSM and ASM
Ad

Recently uploaded (20)

PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
PPTX
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
PPTX
communication and presentation skills 01
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PPTX
Fundamentals of safety and accident prevention -final (1).pptx
PDF
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
PPTX
introduction to high performance computing
PDF
Design Guidelines and solutions for Plastics parts
PDF
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
PPTX
Safety Seminar civil to be ensured for safe working.
PPTX
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PPT
Occupational Health and Safety Management System
PPTX
Management Information system : MIS-e-Business Systems.pptx
PPTX
Nature of X-rays, X- Ray Equipment, Fluoroscopy
PPTX
Artificial Intelligence
PDF
Abrasive, erosive and cavitation wear.pdf
PPT
Total quality management ppt for engineering students
PDF
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
PDF
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
communication and presentation skills 01
Exploratory_Data_Analysis_Fundamentals.pdf
Fundamentals of safety and accident prevention -final (1).pptx
BIO-INSPIRED HORMONAL MODULATION AND ADAPTIVE ORCHESTRATION IN S-AI-GPT
introduction to high performance computing
Design Guidelines and solutions for Plastics parts
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
Safety Seminar civil to be ensured for safe working.
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
distributed database system" (DDBS) is often used to refer to both the distri...
Occupational Health and Safety Management System
Management Information system : MIS-e-Business Systems.pptx
Nature of X-rays, X- Ray Equipment, Fluoroscopy
Artificial Intelligence
Abrasive, erosive and cavitation wear.pdf
Total quality management ppt for engineering students
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...

analytical representation of data

  • 2. Descriptives Data  Disorganized Data Comedy 7 Suspense 8 Comedy 7 Suspense 7 Drama 8 Horror 7 Drama 5 Comedy 6 Horror 8 Comedy 5 Drama 3 Drama 3 Suspense 7 Horror 8 Comedy 6 Suspense 6 Horror 8 Comedy 6 Drama 7 Horror 9 Drama 5 Horror 9 Drama 6 Suspense 4 Drama 5 Horror 7 Suspense 3 Suspense 4 Horror 7 Suspense 5 Horror 10 Suspense 5 Horror 9 Suspense 6 Comedy 6 Drama 8 Comedy 7 Comedy 5 Comedy 4 Drama 4 • Design frequency distribution table along bar graph
  • 3. Descriptives  Reducing and Describing Data (organized) Genre Average Rating Comedy 5.9 Drama 5.4 Horror 8.2 Suspense 5.5
  • 4. Descriptives  Displaying Data Rating of Movie Genre Enjoyment 0 1 2 3 4 5 6 7 8 9 Comedy Drama Horror Suspense Genre AverageRating
  • 6. Example : Previous slides Identify each of the following examples as qualitative or quantitative variables. 1. The residence hall for each student in a statistics class. (qualitative) 2. The amount of gasoline pumped by the next 10 customers at the local Uni-mart. (quantitative ) 3. The amount of radon in the basement of each of 25 homes in a new development. (quantitative ) 4. The color of the baseball cap worn by each of 20 students. (qualitative) 5. The length of time to complete a mathematics homework assignment. (quantitative ) 6. The state in which each truck is registered when stopped and inspected at a weigh station. (qualitative)
  • 7. EXERCISE Countries People enjoy watching movies (%) People do not enjoy watching movies (%) Pakistan 11 14 India 33 10 London 3 8 Australia 41 2 OVERALL TOTAL 88 34
  • 8.  Most common ways to solve the example  Histogram (useful to examine a variable)  Pie Chart (5 category max)  Frequency polygon
  • 9. Example: stem-leaf plot For some sports in the Olympics, such as diving and gymnastics, the highest and lowest scores given by a judge are dropped. Study the scores given by ten judges for two gymnasts listed
  • 10. Example:  The numbers below represent the weights in pounds of fish caught by contestants in a Striped Bass Derby. Use the data to make a stem-and-leaf plot
  • 11. Example The Marks scored by 20 Students in a Unit Test out of 25 are given below 12,10,08,12,04,15,18,23,18,16,16,12,23,18,12,05,16,16 ,12,20 Prepare a Frequency Distribution Table
  • 12. Example Consider the following marks (out of 50) scored in Mathematics by 50 Students of 8th Class. 41,31,33,32,28,31,21,10,30,22,33,37,12,05,08,15,39,2 6,41,46,34,22,09,11,16,22,25,29,31,39,23,31,21,45,47 ,30,22,17,36,18,20,22,44,16,24,10,27,39,28,17 Prepare a Frequency Distribution Table
  • 13. Measures of Central Tendency  3 measures of central tendency are commonly used in statistical analysis - MEAN, MEDIAN, and MODE.  Each measure is designed to represent a “typical” value in the distribution.
  • 14. Measures of Central Tendency 1. Mode  Value of the distribution that occurs most frequently (i.e., largest category)  Only measure that can be used with nominal-level variables Example: a) {1,0,5,9,12,8} - No mode b) {4,5,5,5,9,20,30} – mode = 5 c) {2,2,5,9,9,15} - bimodal, mode 2 and 9
  • 15. Mode for group data Example: Find the mode of the data: Solution: Here the maximum frequency is 22. The number corresponding to maximum frequency is the mode. Hence mode = 15 Number 12 13 14 15 16 Frequency 7 9 6 22 20
  • 17. • l = 60 • h = 20 • Fm = 16 • F1 = 15 • F2 = 14 • Use given data to calculate mode
  • 18. Exercise 1. Answer: 14.07 2. Answer: 14.64 Class Frequen cy 1-5 6-10 11-15 16-20 21-25 26-30 2 4 9 7 5 3 Total 30 Class interval 1 – 3 4 – 6 7 – 9 10 – 12 13 – 15 16 – 18 Frequency 5 3 2 1 6 4 Find the mode of the following data:
  • 19. Measures of Central Tendency 2. Median  value of the variable in the “middle” of the distribution  same as the 50th percentile  When N is odd #, median is middle case:  N=5: 2 2 6 9 11 • median=6  When N is even #, median is the score between the middle 2 cases:  N=6: 2 2 5 9 11 15 • median=(5+9)/2 = 7
  • 20. To compute the median  first you order the values of X from low to high:  85, 90, 94, 94, 95, 97, 97, 97, 97, 98  then count number of observations = 10.  When the number of observations are even, average the two middle numbers to calculate the median.  This example, 96 is the median (middle) score.
  • 21. Median  Find the Median 4 5 6 6 7 8 9 10 12  Find the Median 5 6 6 7 8 9 10 12  Find the Median 5 6 6 7 8 9 10 100,000
  • 22. Median for group dataExample Example 1: Find the median for the following grouped data. Class Interval Frequency 1 – 5 4 6 – 10 3 11 – 15 6 16 – 20 5 21 – 25 2 N = 20
  • 23. Median for group data Calculating cumulative frequency: Class Interval Frequency Cumulative Frequency (fc) 1 – 5 4 4 4 6 – 10 3 7 4 + 3 =7 11 – 15 6 13 7 + 6 = 13 16 – 20 5 18 13 + 5 = 18 21 – 25 2 20 8 + 2 =20 N = 20
  • 24. Median for group data Solution: Calculate median using the formula median = L + ((N/2) – p.c.f) x i fm median = 11 + (20/2) – 7 x 5 6 = 13.5
  • 25. Median for group data Solution: So we have, L = 11 frequency of median class (fm) = 6 cumulative frequency of preceeding median class (p.c.f) = 7 Size/width of class (i) = 5 N = 20
  • 26. Exercise Find the median of the following data: 1. Answer: 15.5 2. Class Frequen cy 1-5 6-10 11-15 16-20 21-25 26-30 2 4 9 7 5 3 Total 30 Class interval 1 – 3 4 – 6 7 – 9 10 – 12 13 – 15 16 – 18 Frequency 5 3 2 1 6 4 Answer: 11
  • 27. Measures of Central Tendency 3. Mean The arithmetic average  Amount each individual would get if the total were divided among all the individuals in a distribution  Symbolized as:  x for the mean of a sample  μ for the mean of a population  Formula: X = (Xi ) N
  • 28. Finding the Mean  Formula for Mean: X = (Σ x) N  Given the data set: {3, 5, 10, 4, 3} X = (3 + 5 + 10 + 4 + 3) = 25 5 5 X = 5
  • 29. Find the Mean and median Q: 85, 87, 89, 91, 98, 100 Mean: 91.67 Median: 90 Q: 5, 87, 89, 91, 98, 100 Mean: 78.3 Median: 90
  • 30. Mean for group data  Example 1: The number of goals scored by a hockey team in 20 matches is given here. 4, 6, 3, 2, 2, 4, 1, 5, 3, 0, 4, 5, 4, 5, 4, 0, 4, 3, 6, 4
  • 31. Mean for ungroup data • Solution: Now the mean is calculated as X = sum of the scores number of scores = ∑fx N = 69 20 = 3.45
  • 32. Mean for group data Example 2  Find the mean of given frequency table. Class - Interval Frequency 0 – 4 3 5 – 9 5 10 – 14 7 15 – 19 4 20 – 24 6 N = 25
  • 33. Mean for group data • Solution: • Step 1: Find the mid point (x) of each class interval. • Step 2: Calculate fx by multiplying the values of f and x. • Step 3: Add all fx and calculate ∑fx.
  • 34. Mean for group data Class - Interval Midpoint of Class Interval (x) Frequency (f) fx 0 – 4 2 3 6 5 – 9 7 5 35 10 – 14 12 7 84 15 – 19 17 4 68 20 – 24 22 6 132 N = 25 ∑fx = 325
  • 35. Mean for group data • Step 4: Now the mean is calculated as X = sum of the scores total number of scores = ∑fx N = 325 = 13 25
  • 36. Exercise Answer: 46.5 kg Weight(kg) Frequency (f) 20-29 1 30-39 8 40-49 10 50-59 6 60-69 5 • Consider data set of weights of 30 students. Find the mean of grouped data.
  • 37. Relationships between the measurements  When the mean, median and mode are all equal, the distribution of the data set has a bell-shaped curve. The distribution is then said to be symmetric.  If Mode < Median < Mean, then the distribution is said to be positive/right skewed, meaning there are a few unusual large values.  If Mean < Median < Mode, then the distribution is said to be negative/left skewed, that is there are some unusual small values.
  • 39. Measures of Central Tendency  Levels of Measurement  Nominal  Mode only (categories defy ranking)  Often, percent or proportion better  Ordinal  Mode or Median (typically, median preferred)  Interval/Ratio  Mode, Median, or Mean  Mean if skew/outlier not a big problem (judgment call)
  • 40. Measures of Central Tendency  In-class exercise:  Find the mode, median & mean of the following numbers: 8, 4 , 10, 2 , 5 , 1 , 6 , 2 , 11 , 2  Does this distribution have a positive or negative skew?  Answers:  Mode (most common) = 2  Median (middle value) (1 2 2 2 4 5 6 8 10 11)= 4.5  Mean = (Xi ) / N = 51/10 = 5.1
  • 41. Measures of Dispersion  Central Tendency doesn’t tell us everything whereas Dispersion/Deviation/Spread tells us a lot about how the data values are distributed.  We are most interested in: The range The semi-interquartile range (SIR) Standard Deviation (σ) Variance (σ2)
  • 42. Measures of Dispersion 1. Range (R)  The scale distance between the highest and lowest score  R = (high score-low score)  Simplest and most straightforward measure of dispersion  Limitation: even one extreme score can throw off our understanding of dispersion
  • 43. Example  What is the range of the following data: 4 8 1 6 6 2 9 3 6 9 Solution: The largest score is 9; the smallest score is 1; the range is XL - XS = 9 - 1 = 8 • {0, 8, 9, 9, 11, 53} Range = 53 • {0, 8, 9, 9, 11, 11} Range = 11
  • 44. When To Use the Range  The range is used when  you have ordinal data or  you are presenting your results to people with little or no knowledge of statistics  The range is rarely used in scientific work as it is fairly insensitive  It depends on only two scores in the set of data, XL and XS  Two very different sets of data can have the same range: 1 1 1 1 9 vs 1 3 5 7 9
  • 45. Percentile of a Score Percentile of score x = • 100 number of scores less than x total number of scores The percentile is the percentage of scores in its frequency distribution that are equal to or lower than it. For example, a test score that is greater than 75% of the scores of people taking the test is said to be at the 75th percentile, • The value below which a percentage of data falls
  • 46. Example  You are the fourth tallest person in a group of 20  80% of people are shorter than you:  That means you are at the 80th percentile.
  • 47. Example: • In the test 12% got D, 50% got C, 30% got B and 8% got A • You got a B, so add up • all the 12% that got D, • all the 50% that got C, • half of the 30% that got B,
  • 48. Example  You Score a B!  for a total percentile of 12% + 50% + 15% = 77%  In other words you did "as well or better than 77% of the class"  (Why take half of B? Because you shouldn't imagine you got the "Best B", or the "Worst B", just an average B.)
  • 49. Q1, Q2, Q3 divides ranked scores into four equal parts Quartiles 25% 25% 25% 25% Q3Q2Q1
  • 50. Quartiles Q1 = P25 Q2 = P50 Q3 = P75
  • 51. The Semi-Interquartile Range  The semi-interquartile range (or SIR) is defined as the difference of the first and third quartiles divided by two  The first quartile is the 25th percentile  The third quartile is the 75th percentile  SIR = (Q3 - Q1) / 2  IR= (Q3 - Q1)
  • 52. Example: Prison Rates (per 100k), 2001: R = 795 (Louisiana) – 126 (Maine) = 669 • Q = 478 (Arizona) – 281 (New Mexico) = 197/2 = 98.5 • Q = 478 (Arizona) – 281 (New Mexico) = 197 126 281 478 795366 25% 50% 75%
  • 53. Example: What is the SIR for the data below?
Data: 2 4 6 8 10 12 14 20 30 60
25% of the scores are below 5, so 5 is the first quartile (25th percentile).
25% of the scores are above 25, so 25 is the third quartile (75th percentile).
SIR = (Q3 − Q1) / 2 = (25 − 5) / 2 = 10
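A Python sketch for the SIR example above. Quartiles are not uniquely defined for small data sets, and software interpolates between ranked scores in different ways, so a library will not necessarily reproduce the slide's rounded Q1 = 5 and Q3 = 25. The sketch shows both the slide's values and the result of `statistics.quantiles` (Python 3.8+), whose default method is "exclusive".

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]

# Slide's quartiles: midpoints of the gaps 4 to 6 and 20 to 30.
q1_slide, q3_slide = 5, 25
sir_slide = (q3_slide - q1_slide) / 2
print("SIR (slide):", sir_slide)

# Interpolated quartiles from the standard library (default method="exclusive").
q1, q2, q3 = statistics.quantiles(data, n=4)
sir_lib = (q3 - q1) / 2
print("quartiles:", q1, q2, q3)
print("SIR (interpolated):", sir_lib)
```

Whichever convention is used, report it alongside the result so that readers can reproduce it.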
  • 54. MEASURES OF DISPERSION: Standard Deviation
The standard deviation uses every score in the distribution and measures the standard, or typical, distance of a score from the mean.
Deviation score = Xi − X̄
Example: with mean X̄ = 50 and Xi = 53, the deviation score is 53 − 50 = 3.
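  • 55. Calculation Exercise
The number of classes taken by each of a sample of 5 students is recorded. Calculate the mean, variance and standard deviation.
Xi | Xi − X̄ | (Xi − X̄)²
5 | 1 | 1
2 | −2 | 4
6 | 2 | 4
5 | 1 | 1
2 | −2 | 4
Σ = 20 | 0 | 14
Mean: X̄ = 20 / 5 = 4
Variance (sample, divide by n − 1 = 4): s² = 14 / 4 = 3.5
Standard deviation: s = √3.5 ≈ 1.87
(Dividing by n = 5, as in the population formula, gives 2.8 and √2.8 ≈ 1.67 instead.)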
  • 56. Calculating Variance, Then Standard Deviation
The number of credits taken by each of a sample of 8 students is recorded. Calculate the mean, variance and standard deviation.
Xi | Xi − X̄ | (Xi − X̄)²
10 | −4 | 16
9 | −5 | 25
13 | −1 | 1
17 | 3 | 9
15 | 1 | 1
16 | 2 | 4
14 | 0 | 0
18 | 4 | 16
Σ = 112 | 0 | 72
Mean: X̄ = 112 / 8 = 14
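The two exercises above can be worked through in Python by building the same deviation table. The helper names `deviation_table` and `sample_sd` are our own. Because both data sets are described as samples, the sketch divides the sum of squares by n − 1, as the sample formula in this deck prescribes; dividing by n instead gives the population values.

```python
import math

def deviation_table(scores):
    """Return the mean, rows of (Xi, Xi - mean, (Xi - mean)**2) and the sum of squares."""
    mean = sum(scores) / len(scores)
    rows = [(x, x - mean, (x - mean) ** 2) for x in scores]
    sum_sq = sum(sq for _, _, sq in rows)
    return mean, rows, sum_sq

def sample_sd(scores):
    """Sample variance (sum of squares / (n - 1)) and its square root."""
    _, _, sum_sq = deviation_table(scores)
    variance = sum_sq / (len(scores) - 1)
    return variance, math.sqrt(variance)

classes = [5, 2, 6, 5, 2]                  # slide 55
credits = [10, 9, 13, 17, 15, 16, 14, 18]  # slide 56

for label, scores in (("classes", classes), ("credits", credits)):
    mean, rows, sum_sq = deviation_table(scores)
    for x, dev, sq in rows:
        print(f"{x:>3} {dev:>5.1f} {sq:>5.1f}")
    variance, sd = sample_sd(scores)
    print(f"{label}: mean={mean}, SS={sum_sq}, s^2={variance:.4f}, s={sd:.4f}")
```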
  • 57. Variance and Standard Deviation
Instead of taking the absolute value of each deviation from the mean, we square it. Squaring removes the negative signs, so every squared deviation is zero or positive.
Summing the squared deviations and dividing by N (population) or n − 1 (sample) gives the variance; its square root is the standard deviation.
Notation:
Sample: variance s², standard deviation s
Population: variance σ², standard deviation σ
  • 58. Calculating the Variance and Standard Deviation for a Population (ungrouped data)
Variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
Standard deviation: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
where μ is the population mean and N the number of scores.
  • 59. Calculating the Variance and Standard Deviation for a Sample (ungrouped data)
Variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Standard deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
  • 61. Example: Ungrouped sample data
Data: 7, 6, 8, 5, 9, 4, 7, 7, 6, 6 (n = 10)
Range = 9 − 4 = 5
Mean: $\bar{x} = \frac{\sum x}{n} = \frac{65}{10} = 6.5$
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{18.5}{9} \approx 2.0556$
Standard deviation: $s = \sqrt{2.0556} \approx 1.4337$
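The example above can be checked with Python's standard-library `statistics` module, which provides both versions of the formula: `variance`/`stdev` divide by n − 1 (sample), while `pvariance` divides by N (population).

```python
import statistics

data = [7, 6, 8, 5, 9, 4, 7, 7, 6, 6]

data_range = max(data) - min(data)   # 9 - 4 = 5
mean = statistics.mean(data)         # 65 / 10 = 6.5
s2 = statistics.variance(data)       # sample variance: 18.5 / (n - 1)
s = statistics.stdev(data)           # sample standard deviation
sigma2 = statistics.pvariance(data)  # population variance: 18.5 / N

print(data_range, mean, round(s2, 4), round(s, 4), round(sigma2, 4))
```

The population variance (1.85) is smaller than the sample variance because it divides the same sum of squares by 10 rather than 9.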
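  • 62. Calculating the Variance and Standard Deviation for a Population (grouped data)
Variance: $\sigma^2 = \frac{\sum f (X - M)^2}{\sum f}$
Standard deviation: $\sigma = \sqrt{\frac{\sum f (X - M)^2}{\sum f}}$
where X is the class midpoint, f the class frequency and M the mean.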
  • 63. Calculating the Variance and Standard Deviation for a Sample (grouped data)
Variance: $s^2 = \frac{\sum f (X - M)^2}{\sum f - 1}$
Standard deviation: $s = \sqrt{\frac{\sum f (X - M)^2}{\sum f - 1}}$
  • 64. Example (grouped data): Find the variance and standard deviation of the sample data below.
Weight (class interval) | Frequency f | Class midpoint X | fX | X − M | (X − M)² | f(X − M)²
60-62 | 5 | 61 | 305 | −6.45 | 41.6025 | 208.0125
63-65 | 18 | 64 | 1152 | −3.45 | 11.9025 | 214.245
66-68 | 42 | 67 | 2814 | −0.45 | 0.2025 | 8.505
69-71 | 27 | 70 | 1890 | 2.55 | 6.5025 | 175.5675
72-74 | 8 | 73 | 584 | 5.55 | 30.8025 | 246.42
Total | 100 | | 6745 | | 91.0125 | 852.75
Mean: M = 6745 / 100 = 67.45
Sample variance: s² = 852.75 / (100 − 1) ≈ 8.6136
Sample standard deviation: s = √8.6136 ≈ 2.9349
(Dividing by Σf = 100, as in the population formula, gives variance = 852.75 / 100 = 8.5275 and standard deviation = √8.5275 ≈ 2.9202.)
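The grouped-data calculation above can be sketched in Python: each class is represented by its midpoint X, weighted by its frequency f. The helper name `grouped_stats` is our own; the sketch prints both the population (÷ Σf) and sample (÷ Σf − 1) results so either convention can be compared with the table.

```python
import math

def grouped_stats(midpoints, freqs):
    """Return (sum f, mean M, sum of f * (X - M)**2) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return n, mean, sum_sq

midpoints = [61, 64, 67, 70, 73]  # midpoints of classes 60-62, ..., 72-74
freqs = [5, 18, 42, 27, 8]

n, mean, sum_sq = grouped_stats(midpoints, freqs)
pop_var = sum_sq / n           # population formula: divide by sum f
sample_var = sum_sq / (n - 1)  # sample formula: divide by sum f - 1

print(f"M = {mean:.2f}, sum f(X-M)^2 = {sum_sq:.2f}")
print(f"population: variance {pop_var:.4f}, sd {math.sqrt(pop_var):.4f}")
print(f"sample:     variance {sample_var:.4f}, sd {math.sqrt(sample_var):.4f}")
```

The same function applies directly to the exercise that follows, once class midpoints are computed from its class limits.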
  • 65. Exercise: Consider the weights of 30 students given below. Find the standard deviation.
Weight (kg) | Frequency (f)
20-29 | 1
30-39 | 8
40-49 | 10
50-59 | 6
60-69 | 5
  • 66. Frequency Table: Test Scores
Observation (score) | Frequency (# occurrences)
65 | 1
70 | 2
75 | 3
80 | 4
85 | 3
90 | 2
95 | 1
What is the range of test scores? A: 30 (95 minus 65)
When calculating the mean, what number must you divide by? A: 16 (the total number of occurrences)
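A Python sketch of the frequency-table questions above: each score is weighted by how often it occurs, and the mean divides by the total number of occurrences (16), not by the number of distinct scores (7).

```python
scores = [65, 70, 75, 80, 85, 90, 95]
counts = [1, 2, 3, 4, 3, 2, 1]  # number of occurrences of each score

n = sum(counts)                          # divide by this: 16
score_range = max(scores) - min(scores)  # 95 - 65 = 30
mean = sum(x * f for x, f in zip(scores, counts)) / n

print(n, score_range, mean)  # → 16 30 80.0
```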
  • 67. Example: Mean of grouped data
CGPA (class) | Frequency f | Class mark (midpoint) x | fx
2.50 - 2.75 | 2 | 2.625 | 5.250
2.75 - 3.00 | 10 | 2.875 | 28.750
3.00 - 3.25 | 15 | 3.125 | 46.875
3.25 - 3.50 | 13 | 3.375 | 43.875
3.50 - 3.75 | 7 | 3.625 | 25.375
3.75 - 4.00 | 3 | 3.875 | 11.625
Total | 50 | | 161.750
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$
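The grouped mean above can be sketched in Python, computing each class mark as the midpoint of its class limits and then taking the frequency-weighted average.

```python
classes = [(2.50, 2.75), (2.75, 3.00), (3.00, 3.25),
           (3.25, 3.50), (3.50, 3.75), (3.75, 4.00)]
freqs = [2, 10, 15, 13, 7, 3]

marks = [(lo + hi) / 2 for lo, hi in classes]      # class marks 2.625, 2.875, ...
sum_fx = sum(f * x for f, x in zip(freqs, marks))  # 161.75
mean = sum_fx / sum(freqs)                         # 161.75 / 50

print(sum_fx, round(mean, 3))
```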
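  • 68. Median Example
CGPA (class) | Frequency f | Cumulative frequency
2.50 - 2.75 | 2 | 2
2.75 - 3.00 | 10 | 12
3.00 - 3.25 | 15 | 27
3.25 - 3.50 | 13 | 40
3.50 - 3.75 | 7 | 47
3.75 - 4.00 | 3 | 50
Total | 50 |
n / 2 = 25, so the median class is 3.00 - 3.25 (the first class whose cumulative frequency reaches 25).
$\tilde{x} = L + \frac{n/2 - F}{f} \times c = 3.00 + \frac{25 - 12}{15} \times 0.25 \approx 3.217$
where L is the lower limit of the median class, F the cumulative frequency before it, f its frequency and c the class width.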
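The grouped-median formula above can be sketched in Python. The function name `grouped_median` is our own; it walks the classes, accumulating the cumulative frequency F, and interpolates inside the first class where the cumulative frequency reaches n/2.

```python
def grouped_median(classes, freqs):
    """Median = L + (n/2 - F) / f * c for the class containing the n/2-th value."""
    half = sum(freqs) / 2
    cum = 0  # F: cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= half:
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("no data")

classes = [(2.50, 2.75), (2.75, 3.00), (3.00, 3.25),
           (3.25, 3.50), (3.50, 3.75), (3.75, 4.00)]
freqs = [2, 10, 15, 13, 7, 3]

print(round(grouped_median(classes, freqs), 3))  # → 3.217
```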
  • 69. Median & Mode: Example for ungrouped data
Find the median and mode of the data 4, 6, 3, 1, 2, 5, 7, 3.
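A Python sketch for the ungrouped example above, using the standard-library `statistics` module. With an even number of scores (8), the median is the mean of the two middle values once the data are sorted.

```python
import statistics

data = [4, 6, 3, 1, 2, 5, 7, 3]

print(sorted(data))             # → [1, 2, 3, 3, 4, 5, 6, 7]
print(statistics.median(data))  # → 3.5  (mean of the middle values 3 and 4)
print(statistics.mode(data))    # → 3    (the only value that occurs twice)
```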
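  • 70. Mode Example
CGPA (class) | Frequency
2.50 - 2.75 | 2
2.75 - 3.00 | 10
3.00 - 3.25 | 15
3.25 - 3.50 | 13
3.50 - 3.75 | 7
3.75 - 4.00 | 3
Total | 50
Modal class: 3.00 - 3.25 (highest frequency, 15), so L = 3.00 and c = 0.25.
Δ1 = 15 − 10 = 5 (difference from the class before), Δ2 = 15 − 13 = 2 (difference from the class after).
$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c = 3.00 + \frac{5}{5 + 2} \times 0.25 \approx 3.179$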
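The grouped-mode formula above can be sketched in Python. The function name `grouped_mode` is our own. Two assumptions are ours rather than the slide's: a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, and the data are assumed to have a single clear peak, since tied neighbours would make Δ1 + Δ2 zero.

```python
def grouped_mode(classes, freqs):
    """Mode = L + d1 / (d1 + d2) * c, where the modal class has the highest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # first modal class on ties
    lower, upper = classes[i]
    before = freqs[i - 1] if i > 0 else 0               # assumption: no class means 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - before  # Δ1: excess over the preceding class
    d2 = freqs[i] - after   # Δ2: excess over the following class
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(2.50, 2.75), (2.75, 3.00), (3.00, 3.25),
           (3.25, 3.50), (3.50, 3.75), (3.75, 4.00)]
freqs = [2, 10, 15, 13, 7, 3]

print(round(grouped_mode(classes, freqs), 3))  # → 3.179
```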
  • 71. Exercise: The following data give the total number of iPods sold by a mail-order company on each of 30 days. Construct a frequency table, then find the mean, variance, standard deviation, mode and median.
23 14 19 23 20 16 27 9 21 14
22 13 26 16 18 12 9 26 20 16
8 25 11 15 28 22 10 5 17 21