analytical representation of data

ANALYTICAL
REPRESENTATION
OF DATA
BY UNSA SHAKIR

Descriptives Data
 Disorganized Data
Comedy 7 Suspense 8 Comedy 7 Suspense 7
Drama 8 Horror 7 Drama 5 Comedy 6
Horror 8 Comedy 5 Drama 3 Drama 3
Suspense 7 Horror 8 Comedy 6 Suspense 6
Horror 8 Comedy 6 Drama 7 Horror 9
Drama 5 Horror 9 Drama 6 Suspense 4
Drama 5 Horror 7 Suspense 3 Suspense 4
Horror 7 Suspense 5 Horror 10 Suspense 5
Horror 9 Suspense 6 Comedy 6 Drama 8
Comedy 7 Comedy 5 Comedy 4 Drama 4
• Design a frequency distribution table along with a bar graph.

Descriptives
 Reducing and Describing Data (organized)
Genre Average Rating
Comedy 5.9
Drama 5.4
Horror 8.2
Suspense 5.5
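The reduction from the disorganized list to the table above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original slides: the name `average_by_genre` is made up for illustration, and the data string is transcribed row by row from the "Disorganized Data" slide.

```python
from collections import defaultdict

# Genre/rating pairs transcribed from the "Disorganized Data" slide.
raw = """
Comedy 7 Suspense 8 Comedy 7 Suspense 7
Drama 8 Horror 7 Drama 5 Comedy 6
Horror 8 Comedy 5 Drama 3 Drama 3
Suspense 7 Horror 8 Comedy 6 Suspense 6
Horror 8 Comedy 6 Drama 7 Horror 9
Drama 5 Horror 9 Drama 6 Suspense 4
Drama 5 Horror 7 Suspense 3 Suspense 4
Horror 7 Suspense 5 Horror 10 Suspense 5
Horror 9 Suspense 6 Comedy 6 Drama 8
Comedy 7 Comedy 5 Comedy 4 Drama 4
"""

def average_by_genre(text):
    """Group the ratings by genre and return {genre: (count, average)}."""
    tokens = text.split()
    ratings = defaultdict(list)
    # Tokens alternate: genre, rating, genre, rating, ...
    for genre, score in zip(tokens[0::2], tokens[1::2]):
        ratings[genre].append(int(score))
    return {g: (len(v), sum(v) / len(v)) for g, v in sorted(ratings.items())}

summary = average_by_genre(raw)
print("Genre     Count  Average Rating")
for genre, (count, avg) in summary.items():
    print(f"{genre:<9} {count:>5}  {avg:.1f}")
```

Each genre appears 10 times, and the averages agree with the table: 5.9, 5.4, 8.2 and 5.5.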

Descriptives
 Displaying Data
[Bar chart: Rating of Movie Genre Enjoyment. X-axis: Genre (Comedy, Drama, Horror, Suspense); Y-axis: Average Rating (0 to 9)]

MEASURE OF CENTRAL
TENDENCY & DISPERSION

Example: Previous slides
Identify each of the following examples as qualitative or quantitative variables.
1. The residence hall for each student in a statistics class. (qualitative)
2. The amount of gasoline pumped by the next 10 customers at the local Uni-mart. (quantitative)
3. The amount of radon in the basement of each of 25 homes in a new development. (quantitative)
4. The color of the baseball cap worn by each of 20 students. (qualitative)
5. The length of time to complete a mathematics homework assignment. (quantitative)
6. The state in which each truck is registered when stopped and inspected at a weigh station. (qualitative)

EXERCISE
Countries       People enjoy watching movies (%)   People do not enjoy watching movies (%)
Pakistan        11                                  14
India           33                                  10
London          3                                   8
Australia       41                                  2
OVERALL TOTAL   88                                  34

 Most common ways to solve the example
 Histogram (useful to examine a variable)
 Pie Chart (5 category max)
 Frequency polygon

Example: stem-leaf plot
For some sports in the Olympics, such as diving and
gymnastics, the highest and lowest scores given by a judge
are dropped. Study the scores given by ten judges for two
gymnasts listed

Example:
 The numbers below represent the weights in pounds
of fish caught by contestants in a Striped Bass Derby.
Use the data to make a stem-and-leaf plot

Example
The Marks scored by 20 Students in a Unit Test out
of 25 are given below
12, 10, 08, 12, 04, 15, 18, 23, 18, 16, 16, 12, 23, 18, 12, 05, 16, 16, 12, 20
Prepare a Frequency Distribution Table
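One way to build the requested table is to count how often each mark occurs. The sketch below uses Python's `collections.Counter`; the variable names are illustrative and the layout of the printed table is only one possible choice.

```python
from collections import Counter

# Marks of the 20 students, as listed in the example.
marks = [12, 10, 8, 12, 4, 15, 18, 23, 18, 16,
         16, 12, 23, 18, 12, 5, 16, 16, 12, 20]

# Counter maps each distinct mark to the number of times it occurs.
freq = Counter(marks)

print("Marks  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {sum(freq.values()):>9}")
```

The frequencies add up to 20, the number of students, which is a quick check that no mark was skipped.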

Example
Consider the following marks (out of 50) scored in
Mathematics by 50 Students of 8th Class.
41, 31, 33, 32, 28, 31, 21, 10, 30, 22, 33, 37, 12, 05, 08, 15, 39, 26, 41, 46, 34, 22, 09, 11, 16, 22, 25, 29, 31, 39, 23, 31, 21, 45, 47, 30, 22, 17, 36, 18, 20, 22, 44, 16, 24, 10, 27, 39, 28, 17
Prepare a Frequency Distribution Table

Measures of Central Tendency
 3 measures of central tendency are commonly used in
statistical analysis - MEAN, MEDIAN, and MODE.
 Each measure is designed to represent a “typical” value in
the distribution.

1. Mode
 Value of the distribution that occurs most frequently
(i.e., largest category)
 Only measure that can be used with nominal-level
variables
Example:
a) {1,0,5,9,12,8} - No mode
b) {4,5,5,5,9,20,30} – mode = 5
c) {2,2,5,9,9,15} - bimodal, mode 2 and 9
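The three cases above (no mode, one mode, two modes) can be checked with a short function. This is a sketch with a made-up name, `modes`; it follows the slide's convention that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once: treated as "no mode".
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 0, 5, 9, 12, 8]))       # []
print(modes([4, 5, 5, 5, 9, 20, 30]))   # [5]
print(modes([2, 2, 5, 9, 9, 15]))       # [2, 9]
```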

Mode for grouped data
Example: Find the mode of the data:
Number      12   13   14   15   16
Frequency    7    9    6   22   20
Solution:
Here the maximum frequency is 22. The number corresponding to the maximum frequency is the mode.
Hence mode = 15

• l = 60
• h = 20
• Fm = 16
• F1 = 15
• F2 = 14
• Use the given data to calculate the mode.
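The sketch below assumes the symbols above follow the standard grouped-data mode formula, mode = l + (Fm - F1) / (2·Fm - F1 - F2) × h, with l read as the lower boundary of the modal class, h as the class width, Fm as the frequency of the modal class, and F1 and F2 as the frequencies of the classes just before and just after it. These readings are assumptions, and `grouped_mode` is an illustrative name.

```python
def grouped_mode(l, h, fm, f1, f2):
    """Mode of grouped data: l + (fm - f1) / (2*fm - f1 - f2) * h.

    l  : lower boundary of the modal class (assumed meaning)
    h  : class width
    fm : frequency of the modal class
    f1 : frequency of the class before the modal class
    f2 : frequency of the class after the modal class
    """
    return l + (fm - f1) / (2 * fm - f1 - f2) * h

print(round(grouped_mode(60, 20, 16, 15, 14), 2))  # 66.67
```

Under the same assumptions, a table with classes 1-5, 6-10, 11-15, ... and modal class 11-15 would be entered with l = 10.5 (the class boundary) and h = 5.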

Exercise
Find the mode of the following data:
1.
Class   Frequency
1-5     2
6-10    4
11-15   9
16-20   7
21-25   5
26-30   3
Total   30
Answer: 14.07
2.
Class interval   1 – 3   4 – 6   7 – 9   10 – 12   13 – 15   16 – 18
Frequency        5       3       2       1         6         4
Answer: 14.64

2. Median
 value of the variable in the “middle” of the
distribution
 same as the 50th percentile
 When N is odd #, median is middle case:
 N=5: 2 2 6 9 11
• median=6
 When N is even #, median is the score between the
middle 2 cases:
 N=6: 2 2 5 9 11 15
• median=(5+9)/2 = 7

To compute the median
 first you order the values of X from low to high:  85,
90, 94, 94, 95, 97, 97, 97, 97, 98
 then count number of observations = 10.
• When the number of observations is even, average the two middle numbers to calculate the median.
• In this example, the two middle numbers are 95 and 97, so 96 is the median (middle) score.
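The odd and even cases can be written as one small function. This is a minimal sketch (the name `median` is our own; Python's standard `statistics.median` does the same job).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd N: the single middle case
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the middle pair

print(median([2, 2, 6, 9, 11]))                             # 6
print(median([2, 2, 5, 9, 11, 15]))                         # 7.0
print(median([85, 90, 94, 94, 95, 97, 97, 97, 97, 98]))     # 96.0
```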

Median
 Find the Median
4 5 6 6 7 8 9 10 12
 Find the Median
5 6 6 7 8 9 10 12
 Find the Median
5 6 6 7 8 9 10 100,000

Median for grouped data: Example
Example 1: Find the median for the following
grouped data.
Class Interval Frequency
1 – 5 4
6 – 10 3
11 – 15 6
16 – 20 5
21 – 25 2
N = 20

Median for grouped data
Calculating cumulative frequency:
Class Interval   Frequency   Cumulative Frequency (fc)   Calculation
1 – 5            4           4                           4
6 – 10           3           7                           4 + 3 = 7
11 – 15          6           13                          7 + 6 = 13
16 – 20          5           18                          13 + 5 = 18
21 – 25          2           20                          18 + 2 = 20
N = 20

Solution:
Calculate the median using the formula
$\text{median} = L + \frac{(N/2) - \text{p.c.f}}{f_m} \times i$
$\text{median} = 11 + \frac{(20/2) - 7}{6} \times 5 = 13.5$
Solution:
So we have,
L = 11
frequency of median class (fm) = 6
cumulative frequency of the class preceding the median class (p.c.f) = 7
Size/width of class (i) = 5
N = 20
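The calculation can be sketched as a function that locates the median class (the first class whose cumulative frequency reaches N/2) and then applies the formula. The name `grouped_median` and its argument layout are illustrative assumptions; the value passed as the lower end of each class is used as L exactly as on the slide (L = 11 for the class 11 – 15).

```python
def grouped_median(lowers, freqs, width):
    """Median of grouped data: L + ((N/2 - p.c.f) / fm) * i.

    lowers : value used as L for each class (here the lower class limit)
    freqs  : class frequencies, in the same order
    width  : class width i
    """
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lowers, freqs):
        if f > 0 and cumulative + f >= n / 2:
            # Median class found: `cumulative` is p.c.f and `f` is fm.
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")

print(grouped_median([1, 6, 11, 16, 21], [4, 3, 6, 5, 2], 5))  # 13.5
```

If class boundaries (0.5, 5.5, 10.5, ...) are passed instead of class limits, the same function returns a value that is lower by 0.5; which convention to use is a choice the caller makes.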

Exercise
Find the median of the following data:
1.
Class   Frequency
1-5     2
6-10    4
11-15   9
16-20   7
21-25   5
26-30   3
Total   30
Answer: 15.5
2.
Class interval   1 – 3   4 – 6   7 – 9   10 – 12   13 – 15   16 – 18
Frequency        5       3       2       1         6         4
Answer: 11

3. Mean
The arithmetic average
• Amount each individual would get if the total were divided among all the individuals in a distribution
• Symbolized as:
  • $\bar{X}$ for the mean of a sample
  • μ for the mean of a population
• Formula: $\bar{X} = \frac{\sum X_i}{N}$

Finding the Mean
• Formula for Mean: $\bar{X} = \frac{\sum x}{N}$
• Given the data set: {3, 5, 10, 4, 3}
$\bar{X} = \frac{3 + 5 + 10 + 4 + 3}{5} = \frac{25}{5}$
$\bar{X} = 5$
Find the Mean and median
Q: 85, 87, 89, 91, 98, 100
Mean: 91.67
Median: 90
Q: 5, 87, 89, 91, 98, 100
Mean: 78.3
Median: 90
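The two data sets differ only in their first value, which makes them a convenient check of how an extreme score affects each measure. A minimal sketch using Python's standard `statistics` module:

```python
from statistics import mean, median

scores = [85, 87, 89, 91, 98, 100]
with_outlier = [5, 87, 89, 91, 98, 100]   # same data, first score replaced by 5

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(f"{label:<13} mean = {mean(data):.2f}  median = {median(data)}")
```

The mean drops from 91.67 to 78.33 while the median stays at 90, which is why the median is often preferred when outliers are present.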

Mean for ungrouped data
• Example 1: The number of goals scored by a hockey team in 20 matches is given here.
4, 6, 3, 2, 2, 4, 1, 5, 3, 0, 4, 5, 4, 5, 4, 0, 4, 3, 6, 4
• Solution: The mean is calculated as
$\bar{X} = \frac{\text{sum of the scores}}{\text{number of scores}} = \frac{\sum fx}{N} = \frac{69}{20} = 3.45$

Mean for grouped data: Example 2
 Find the mean of given frequency table.
Class - Interval Frequency
0 – 4 3
5 – 9 5
10 – 14 7
15 – 19 4
20 – 24 6
N = 25

Mean for grouped data
• Solution:
• Step 1: Find the mid point (x) of each class interval.
• Step 2: Calculate fx by multiplying the values of
f and x.
• Step 3: Add all fx and calculate ∑fx.

Mean for grouped data
Class Interval   Midpoint of Class Interval (x)   Frequency (f)   fx
0 – 4            2                                3               6
5 – 9            7                                5               35
10 – 14          12                               7               84
15 – 19          17                               4               68
20 – 24          22                               6               132
N = 25, ∑fx = 325

Mean for grouped data
• Step 4: Now the mean is calculated as
$\bar{X} = \frac{\text{sum of the scores}}{\text{total number of scores}} = \frac{\sum fx}{N} = \frac{325}{25} = 13$
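Steps 1 to 4 can be sketched as a short function. The name `grouped_mean` and the (lower, upper) tuple layout are illustrative choices, not something the slides prescribe.

```python
def grouped_mean(intervals, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    # Step 1: midpoint of each class interval
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    # Steps 2 and 3: multiply f by x and add up the products
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    # Step 4: divide by the total number of scores
    return total_fx / sum(freqs)

intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24)]
freqs = [3, 5, 7, 4, 6]
print(grouped_mean(intervals, freqs))  # 13.0
```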

Exercise
• Consider a data set of the weights of 30 students. Find the mean of the grouped data.
Weight (kg)   Frequency (f)
20-29         1
30-39         8
40-49         10
50-59         6
60-69         5
Answer: 46.5 kg

Relationships between the measurements
 When the mean, median and mode are all equal, the
distribution of the data set has a bell-shaped curve. The
distribution is then said to be symmetric.
 If Mode < Median < Mean, then the distribution is said to be
positive/right skewed, meaning there are a few unusually large
values.
 If Mean < Median < Mode, then the distribution is said to be
negative/left skewed, meaning there are a few unusually small
values.

 Levels of Measurement
 Nominal
 Mode only (categories defy ranking)
 Often, percent or proportion better
 Ordinal
 Mode or Median (typically, median preferred)
 Interval/Ratio
 Mode, Median, or Mean
 Mean if skew/outlier not a big problem (judgment call)

 In-class exercise:
 Find the mode, median & mean of the following numbers:
8, 4 , 10, 2 , 5 , 1 , 6 , 2 , 11 , 2
 Does this distribution have a positive or negative skew?
 Answers:
 Mode (most common) = 2
 Median (middle value) (1 2 2 2 4 5 6 8 10 11)= 4.5
• Mean = ΣXi / N = 51/10 = 5.1
• Skew: positive, because Mode (2) < Median (4.5) < Mean (5.1)
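The exercise can be checked with Python's standard `statistics` module, applying the ordering rule for skew stated earlier (Mode < Median < Mean for positive skew, the reverse for negative skew). This is a sketch; the ordering rule is only a rough guide and the variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [8, 4, 10, 2, 5, 1, 6, 2, 11, 2]

mode = multimode(data)[0]   # most frequent value (this data set has a single mode)
med = median(data)
avg = mean(data)

# Rule of thumb from the slides: compare the order of the three measures.
if mode < med < avg:
    skew = "positive (right) skew"
elif avg < med < mode:
    skew = "negative (left) skew"
else:
    skew = "no clear skew from this rule"

print(mode, med, avg, skew)
```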

Measures of Dispersion
 Central Tendency doesn’t tell us everything whereas
Dispersion/Deviation/Spread tells us a lot about how the
data values are distributed.
 We are most interested in:
The range
The semi-interquartile range (SIR)
Standard Deviation (σ)
Variance (σ²)

Measures of Dispersion
1. Range (R)
 The scale distance between the highest and lowest
score
 R = (high score-low score)
 Simplest and most straightforward measure of
dispersion
 Limitation: even one extreme score can
throw off our understanding of dispersion

Example
 What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
Solution: The largest score is 9; the smallest score is
1; the range is XL - XS = 9 - 1 = 8
• {0, 8, 9, 9, 11, 53} Range = 53
• {0, 8, 9, 9, 11, 11} Range = 11
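A short sketch of the range calculation, using the three data sets above (`value_range` is an illustrative name). The last two sets differ in a single score, yet the range changes from 53 to 11, which shows how one extreme value can dominate this measure.

```python
def value_range(values):
    """Range R = highest score - lowest score."""
    return max(values) - min(values)

print(value_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # 8
print(value_range([0, 8, 9, 9, 11, 53]))            # 53
print(value_range([0, 8, 9, 9, 11, 11]))            # 11
```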

When To Use the Range
 The range is used when
 you have ordinal data or
 you are presenting your results to people with little or no
knowledge of statistics
 The range is rarely used in scientific work as it is fairly
insensitive
 It depends on only two scores in the set of data, XL and XS
 Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9

Percentile of a Score
$\text{Percentile of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$
The percentile of a score is the percentage of scores in its frequency distribution that are equal to or lower than it.
• The value below which a percentage of data falls
For example, a test score that is greater than 75% of the scores of people taking the test is said to be at the 75th percentile.

Example
 You are the fourth tallest person in a group of 20
 80% of people are shorter than you:
 That means you are at the 80th percentile.

Example:
• In the test 12% got D, 50% got C, 30% got B and 8% got
A
• You got a B, so add up
• all the 12% that got D,
• all the 50% that got C,
• half of the 30% that got B,

Example
 You Score a B!
 for a total percentile of 12% + 50% + 15% = 77%
 In other words you did "as well or better than 77% of
the class"
 (Why take half of B? Because you shouldn't imagine
you got the "Best B", or the "Worst B", just an
average B.)
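Both percentile examples can be reproduced with two small functions. This is a sketch: the function names are made up, and the list of 20 heights is hypothetical data chosen only so that one person is the fourth tallest.

```python
def percentile_rank(scores, x):
    """Percentage of scores strictly below x (the 'less than' formula)."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

def midpoint_percentile(percent_below, percent_same):
    """Percentile for a grouped result: everyone below plus half of the same group."""
    return percent_below + percent_same / 2

# Hypothetical heights (cm) of 20 people; 166 is the fourth tallest.
heights = list(range(150, 170))
print(percentile_rank(heights, 166))      # 80.0

# Grades: 12% got D, 50% got C, 30% got B; you got a B.
print(midpoint_percentile(12 + 50, 30))   # 77.0
```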

Quartiles
Q1, Q2, Q3 divide ranked scores into four equal parts:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Quartiles
Q1 = P25
Q2 = P50
Q3 = P75

The Semi-Interquartile Range
 The semi-interquartile range (or SIR) is defined as
the difference of the first and third quartiles divided
by two
 The first quartile is the 25th percentile
 The third quartile is the 75th percentile
 SIR = (Q3 - Q1) / 2
 IR= (Q3 - Q1)

Example:
Prison Rates (per 100k), 2001:
• R = 795 (Louisiana) – 126 (Maine) = 669
• IR = 478 (Arizona) – 281 (New Mexico) = 197
• SIR = (478 (Arizona) – 281 (New Mexico)) / 2 = 197/2 = 98.5
[Number line: minimum 126; 25% (Q1) 281; 50% (median) 366; 75% (Q3) 478; maximum 795]
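With the quartiles already known, the three spread measures in the prison-rate example reduce to simple differences. A minimal sketch with illustrative function names; the quartiles are taken as given rather than computed, because different quartile conventions can give slightly different values.

```python
def interquartile_range(q1, q3):
    """IR = Q3 - Q1."""
    return q3 - q1

def semi_interquartile_range(q1, q3):
    """SIR = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2

lowest, q1, q3, highest = 126, 281, 478, 795   # values from the example

print(highest - lowest)                   # 669  (range)
print(interquartile_range(q1, q3))        # 197
print(semi_interquartile_range(q1, q3))   # 98.5
```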

Example
• What is the SIR for the data below?
2, 4, 6, 8, 10, 12, 14, 20, 30, 60
• 25% of the scores are below 5
  • 5 is the first quartile (25th percentile)
• 25% of the scores are above 25
  • 25 is the third quartile (75th percentile)
• SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10

MEASURES OF DISPERSION
 Standard deviation
 Uses every score in the distribution
 Measures the standard or typical distance from
the mean
Deviation score = Xi - X̄
 Example: with Mean= 50 and Xi = 53, the
deviation score is 53 - 50 = 3

Calculation Exercise
• Number of classes for a sample of 5 students. Calculate the mean, variance & standard deviation.
Xi       (Xi – X̄)   (Xi – X̄)²
5        1           1
2        -2          4
6        2           4
5        1           1
2        -2          4
Σ = 20   0           14
mean = 20 / 5 = 4
s² (variance) = 14/5 = 2.8
s = √2.8 = 1.67

Calculating Variance, Then Standard Deviation
• Number of credits for a sample of 8 students. Calculate the mean, variance & standard deviation.
Xi        (Xi – X̄)   (Xi – X̄)²
10        -4          16
9         -5          25
13        -1          1
17        3           9
15        1           1
16        2           4
14        0           0
18        4           16
Σ = 112   0           72

Variance and Standard Deviation
 Instead of taking the absolute value, we square
the deviations from the mean. This yields a
positive value.
 This will result in measures we call the Variance
and the Standard Deviation
Sample:      s = Standard Deviation,   s² = Variance
Population:  σ = Standard Deviation,   σ² = Variance
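The squared-deviation procedure can be sketched as one function with a switch between dividing by N and dividing by N - 1. The function name and the `sample` flag are illustrative; the data are the credit counts of the 8 students and the class counts of the 5 students from the preceding tables.

```python
from math import sqrt

def variance(values, sample=False):
    """Average squared deviation from the mean.

    Divides by N by default, or by N - 1 when sample=True.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

credits = [10, 9, 13, 17, 15, 16, 14, 18]
classes = [5, 2, 6, 5, 2]

print(variance(credits), sqrt(variance(credits)))          # 9.0 3.0
print(round(variance(credits, sample=True), 4))            # 10.2857
print(round(variance(classes), 2), round(sqrt(variance(classes)), 2))  # 2.8 1.67
```

Dividing by N reproduces the slide's result for the 5-student table (2.8 and 1.67); dividing by N - 1 gives a slightly larger value for the same data.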

Ungrouped data
Calculating the Variance and Standard Deviation with respect to population.
Formulae:
Variance: $\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
Standard Deviation: $\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$

Ungrouped data
Calculating the Variance and Standard Deviation with respect to sample.
Formulae:
Variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$
Standard Deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N - 1}}$

Example: Ungrouped sample data
• 7, 6, 8, 5, 9, 4, 7, 7, 6, 6
• Range = 9 - 4 = 5
• Mean: $\bar{x} = \frac{\sum x}{n} = 6.5$
• Variance: $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{18.5}{9} = 2.0556$
• Standard Deviation: $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.0556} = 1.4337$
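The same figures can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) denominator. A minimal sketch:

```python
from statistics import mean, stdev, variance

data = [7, 6, 8, 5, 9, 4, 7, 7, 6, 6]

print(max(data) - min(data))      # range: 5
print(mean(data))                 # 6.5
print(round(variance(data), 4))   # sample variance: 2.0556
print(round(stdev(data), 4))      # sample standard deviation: 1.4337
```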





Grouped data
with respect to population.
Formulae:
Variance: $\sigma^2 = \frac{\sum f (X - M)^2}{\sum f}$
Standard Deviation: $\sigma = \sqrt{\frac{\sum f (X - M)^2}{\sum f}}$

Grouped data
with respect to sample.
Formulae:
Variance: $s^2 = \frac{\sum f (X - M)^2}{\sum f}$
Standard Deviation: $s = \sqrt{\frac{\sum f (X - M)^2}{\sum f}}$

Example (Grouped data)
Find the variance and standard deviation of the sample data below:
Weight (Class Interval)   Frequency, f   Class midpoint, X   fX     Mean, M   (X-M)   (X-M)²    f(X-M)²
60-62                     5              61                  305    67.45     -6.45   41.6025   208.0125
63-65                     18             64                  1152   67.45     -3.45   11.9025   214.245
66-68                     42             67                  2814   67.45     -0.45   0.2025    8.505
69-71                     27             70                  1890   67.45     2.55    6.5025    175.5675
72-74                     8              73                  584    67.45     5.55    30.8025   246.42
Total                     100                                6745                     91.0125   852.75
Variance = 852.75/100 = 8.5275
Standard deviation = √8.5275 = 2.9202
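The columns of the table can be reproduced step by step. This is a sketch with illustrative variable names; it divides by ∑f, as the worked example does.

```python
from math import sqrt

midpoints = [61, 64, 67, 70, 73]   # class midpoints X
freqs = [5, 18, 42, 27, 8]         # frequencies f

n = sum(freqs)                                                # total frequency
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n       # M = sum(fX) / sum(f)
# Sum of f * (X - M)^2 over all classes
weighted_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
variance = weighted_sq_dev / n
std_dev = sqrt(variance)

print(n, round(mean, 2), round(weighted_sq_dev, 2))
print(round(variance, 4), round(std_dev, 2))
```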

Exercise
Consider data set of weights of 30 students. Find the standard deviation.
Weight (kg)   Frequency (f)
20-29 1
30-39 8
40-49 10
50-59 6
60-69 5

Frequency Table Test Scores
Observation (scores)   Frequency (# occurrences)
65 1
70 2
75 3
80 4
85 3
90 2
95 1
What is the range of test
scores?
A: 30 (95 minus 65)
When calculating mean, one
must divide by what number?
A: 16 (total # occurrences)

Mean Example
CGPA (Class)   Frequency, f   Class Mark (Midpoint), x   fx
2.50 - 2.75    2              2.625                      5.250
2.75 - 3.00    10             2.875                      28.750
3.00 - 3.25    15             3.125                      46.875
3.25 - 3.50    13             3.375                      43.875
3.50 - 3.75    7              3.625                      25.375
3.75 - 4.00    3              3.875                      11.625
Total          50                                        161.750
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{161.75}{50} = 3.235$

Median Example
CGPA (Class)   Frequency, f   Cum. frequency
2.50 - 2.75    2              2
2.75 - 3.00    10             12
3.00 - 3.25    15             27
3.25 - 3.50    13             40
3.50 - 3.75    7              47
3.75 - 4.00    3              50
Total          50
$\text{Median}, \tilde{x} = 3.00 + 0.25\left(\frac{25 - 12}{15}\right) = 3.217$

Median & Mode
Example for ungrouped data: find the median and mode of this data.
4, 6, 3, 1, 2, 5, 7, 3

Mode Example
CGPA (Class)   Frequency
2.50 - 2.75    2
2.75 - 3.00    10
3.00 - 3.25    15
3.25 - 3.50    13
3.50 - 3.75    7
3.75 - 4.00    3
Total          50
$\text{Mode}, \hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 3.00 + 0.25\left(\frac{5}{5 + 2}\right) = 3.179$



The following data give the total number of
iPods sold by a mail order company on each of
30 days. Construct a frequency table.
Find the mean, variance and standard
deviation, mode and median.
23 14 19 23 20 16 27 9 21 14
22 13 26 16 18 12 9 26 20 16
8 25 11 15 28 22 10 5 17 21
