T7 data analysis

Analyzing and interpreting data

By Rama Krishna Kompella

Myths
– Complex analysis and big words impress people
– Analysis comes at the end, when there is data to analyze
– Qualitative analysis is easier than quantitative analysis
– Data have their own meaning
– Stating limitations weakens the evaluation
– Computer analysis is always easier and better

Blind men and an elephant
- Indian fable

Things aren’t always what we think!
Six blind men go to observe an elephant. One feels the side and thinks the
elephant is like a wall. One feels the tusk and thinks the elephant is like a
spear. One touches the squirming trunk and thinks the elephant is like a
snake. One feels the knee and thinks the elephant is like a tree. One
touches the ear, and thinks the elephant is like a fan. One grasps the tail and
thinks it is like a rope. They argue long and loud, and though each was partly
in the right, all were in the wrong.
For a detailed version of this fable, see: http://www.wordinfo.info/words/index/info/view_unit/1/?letter=B&spage=3

Data analysis and interpretation
• Think about analysis EARLY
• Start with a plan
• Code, enter, clean
• Analyze
• Interpret
• Reflect
− What did we learn?
− What conclusions can we draw?
− What are our recommendations?
− What are the limitations of our analysis?

Why do I need an analysis plan?
• To make sure the questions and your data
collection instrument will get the information
you want
• Think about your “report” when you are
designing your data collection instruments

Do you want to report…
• the number of people who answered each
question?
• how many people answered a, b, c, d?
• the percentage of respondents who
answered a, b, c, d?
• the average number or score?
• the mid-point among a range of answers?
• a change in score between two points in
time?
• how people compared?
• quotes and people’s own words?

Common descriptive statistics
• Count (frequencies)
• Percentage
• Mean
• Mode
• Median
• Range
• Standard deviation
• Variance
• Ranking
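This is a minimal Python sketch of how the statistics listed above might be computed with the standard `statistics` and `collections` modules. The function name `describe` is hypothetical. The scores come from the range example in this section. Variance and standard deviation use the population formulas. "Ranking" is interpreted here as ordering values from most to least frequent, though other meanings of ranking are possible.

```python
import statistics
from collections import Counter

def describe(values):
    """Return common descriptive statistics for a list of numeric scores."""
    n = len(values)
    counts = Counter(values)  # frequency (count) of each value
    return {
        "count": n,
        "frequencies": dict(counts),
        "percentages": {v: 100 * c / n for v, c in counts.items()},
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
        # Ranking (assumed meaning): values ordered from most to least frequent
        "ranking": [v for v, _ in counts.most_common()],
    }

summary = describe([4, 8, 1, 6, 6, 2, 9, 3, 6, 9])
print(summary["mean"], summary["median"], summary["mode"], summary["range"])  # 5.4 6.0 6 8
```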

Key components of a data analysis plan
• Purpose of the evaluation
• Questions
• What you hope to learn from the question
• Analysis technique
• How data will be presented

Steps in Processing Data
• Preparing raw data
• Editing
– Field editing
– Office editing
• Coding
– Establishing appropriate categories
– Making the categories mutually exclusive
• Tabulation
– Sorting and counting
– Summarizing data
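The editing and coding steps can be sketched in Python as follows. The sketch assumes a hypothetical codebook (`CODEBOOK`) for a yes / no / not sure question. The normalization rule and the practice of flagging unmatched answers for office editing are illustrative choices, not a prescribed procedure.

```python
# Hypothetical codebook: every accepted answer maps to exactly one code,
# so the categories are mutually exclusive.
CODEBOOK = {"yes": 1, "no": 2, "not sure": 3}

def code_responses(raw_answers):
    """Edit (normalize) and code raw answers; return codes and answers needing review."""
    coded = []
    needs_editing = []
    for position, answer in enumerate(raw_answers):
        cleaned = answer.strip().lower()      # editing: ignore case and stray spaces
        if cleaned in CODEBOOK:
            coded.append(CODEBOOK[cleaned])   # coding: replace the answer with its code
        else:
            needs_editing.append((position, answer))  # flag for office editing
    return coded, needs_editing

codes, problems = code_responses(["Yes", " no ", "NOT SURE", "maybe"])
print(codes, problems)  # [1, 2, 3] [(3, 'maybe')]
```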

Types of Tabulation
• Simple or one-way tabulation
– Questions allowing only one response (percentages add up to 100)
– Questions allowing multiple responses (percentages don’t add up to 100)
• Cross tabulation or two-way tabulation
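One-way, multiple-response and two-way tables can be sketched with `collections.Counter`. The function names and survey answers below are hypothetical. The sketch shows that single-response percentages sum to 100, while multiple-response percentages can exceed 100 because one respondent may be counted under several options.

```python
from collections import Counter

def one_way(responses):
    """One-way table for a single-response question: percentages sum to 100."""
    n = len(responses)
    return {answer: 100 * c / n for answer, c in Counter(responses).items()}

def multiple_response(answer_lists):
    """Percentage of respondents choosing each option; totals may exceed 100."""
    n = len(answer_lists)
    # set() ensures a respondent is counted at most once per option
    counts = Counter(option for answers in answer_lists for option in set(answers))
    return {option: 100 * c / n for option, c in counts.items()}

def cross_tab(row_values, col_values):
    """Two-way table: number of respondents for each (row, column) combination."""
    return Counter(zip(row_values, col_values))

print(one_way(["yes", "yes", "no", "not sure"]))  # {'yes': 50.0, 'no': 25.0, 'not sure': 25.0}
print(sum(multiple_response([["tv", "radio"], ["tv"], ["radio", "web"], ["tv"]]).values()))  # 150.0
```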

Classification of Data
• Number of groups
• Width of the class interval
• Exclusive categories
• Exhaustive categories
• Avoid extremes

Frequency Distribution Tables

[Table: frequency distribution with class intervals defined by a Lower Limit and an Upper Limit]
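Here is a sketch of building a frequency distribution table from class intervals. It assumes equal-width classes in which each class includes its lower limit and excludes its upper limit. This is one common way to keep the categories exclusive. The final check enforces the "exhaustive categories" rule. The function name and parameters are hypothetical.

```python
def frequency_table(values, lower, width, n_classes):
    """Group values into classes [lower limit, upper limit) of equal width.

    Each class includes its lower limit and excludes its upper limit, so the
    classes are mutually exclusive. A ValueError is raised if some values fall
    outside every class (i.e. the classes are not exhaustive).
    """
    table = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        count = sum(1 for v in values if lo <= v < hi)
        table.append((lo, hi, count))
    if sum(count for _, _, count in table) != len(values):
        raise ValueError("classes do not cover every value")
    return table

for lo, hi, count in frequency_table([4, 8, 1, 6, 6, 2, 9, 3, 6, 9], lower=0, width=5, n_classes=2):
    print(f"{lo}-{hi}: {count}")
# 0-5: 4
# 5-10: 6
```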

Measures of Central Tendency
• A measure of central tendency of a data set is a
measure of the "middle" or typical value of the data set
• The mean, median and mode are all valid
measures of central tendency
• But, under different conditions, some measures
of central tendency become more appropriate to
use than others

Mean
• The mean (or average) is the most popular
and well known measure of central tendency
• It can be used with both discrete and
continuous data, although it is most often
used with continuous data

Median & Mode
• The median is the middle score for a set of
data that has been arranged in order of
magnitude. The median is less affected by
outliers and skewed data.
• The mode is the most frequent score in our
data set. It corresponds to the highest bar in a
bar chart or histogram, so the mode can
sometimes be thought of as the most popular
option.
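The sketch below uses hypothetical scores and Python's `statistics` module to show why the median is less affected by outliers. Adding one extreme score moves the mean a long way, barely moves the median, and leaves the mode unchanged.

```python
import statistics

scores = [2, 3, 3, 4, 5]       # hypothetical scores
with_outlier = scores + [40]   # the same scores plus one extreme value

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
# original mean = 3.4 median = 3 mode = 3
# with outlier mean = 9.5 median = 3.5 mode = 3
```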

Choosing appropriate measure

Type of variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median

How to represent the results
• Graphics should be used whenever practical
• Graphics commonly used to present results
include:
– Bar charts
– Line charts
– Pie charts

Measures of Dispersion
Measures of dispersion (or variability, or spread)
indicate the extent to which the observed
values are “spread out” around the center of the
distribution: how “far apart” observed values
typically are from each other, and therefore from
some average value (in particular, the mean).

Measures of Dispersion
• There are three main measures of dispersion:
– The range
– The semi-interquartile range (SIR)
– Variance / standard deviation


The Range
• The range is defined as the difference
between the largest score in the set of data
and the smallest score in the set of data, XL - XS
• What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
• The largest score (XL) is 9; the smallest score
(XS) is 1; the range is XL - XS = 9 - 1 = 8
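The same range calculation can be written as a short Python function. The name `data_range` is illustrative and avoids shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest score (XL) minus smallest score (XS)."""
    return max(values) - min(values)

print(data_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # 8
```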

When To Use the Range
• The range is used when
– you have ordinal data or
– you are presenting your results to people with
little or no knowledge of statistics
• The range is rarely used in scientific work because
it is a crude measure of spread
– It depends on only two scores in the set of data, XL
and XS
– Two very different sets of data can have the same
range:
1 1 1 1 9 vs 1 3 5 7 9
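To see how the range can hide differences in spread, this sketch compares the two data sets above. It reports both the range and the population standard deviation from Python's `statistics` module.

```python
import statistics

a = [1, 1, 1, 1, 9]
b = [1, 3, 5, 7, 9]

for data in (a, b):
    spread = max(data) - min(data)
    print(data, "range =", spread, "std dev =", round(statistics.pstdev(data), 2))
# [1, 1, 1, 1, 9] range = 8 std dev = 3.2
# [1, 3, 5, 7, 9] range = 8 std dev = 2.83
```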

The Semi-Interquartile Range
• The semi-interquartile range (or SIR) is defined
as half the difference between the third and
first quartiles
– The first quartile is the 25th percentile
– The third quartile is the 75th percentile
• SIR = (Q3 - Q1) / 2


SIR Example
• What is the SIR for the following data?
2 4 6 8 10 12 14 20 30 60
• 25 % of the scores are below 5 (midway between 4 and 6)
– 5 is the first quartile (25th percentile)
• 25 % of the scores are above 25 (midway between 20 and 30)
– 25 is the third quartile (75th percentile)
• SIR = (Q3 - Q1) / 2 = (25 - 5) / 2 = 10
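Here is a sketch of the SIR calculation in Python. `sir_hand_method` is a hypothetical helper that mirrors the hand method in the example above: it sets aside the lowest and highest n // 4 scores and takes the midpoint at each cut. This is one simple convention, not a standard definition. For comparison, `statistics.quantiles` uses a different, interpolating quartile definition by default, so it gives a different SIR for the same data.

```python
import statistics

def sir_hand_method(values):
    """Hypothetical helper: Q1 and Q3 are midpoints between neighbouring sorted
    scores after setting aside the lowest and highest n // 4 scores."""
    s = sorted(values)
    n = len(s)
    k = n // 4
    if k == 0:
        raise ValueError("need at least 4 scores")
    q1 = (s[k - 1] + s[k]) / 2
    q3 = (s[n - k - 1] + s[n - k]) / 2
    return (q3 - q1) / 2

def sir_statistics_module(values):
    """SIR using statistics.quantiles with its default 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2

data = [2, 4, 6, 8, 10, 12, 14, 20, 30, 60]
print(sir_hand_method(data))        # 10.0
print(sir_statistics_module(data))  # 8.5
```

Quartile conventions differ between textbooks and software, so it is worth stating which one was used when reporting an SIR.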

When To Use the SIR
• The SIR is often used with skewed data as it is
insensitive to the extreme scores


Mean Deviation
The key concept for describing normal distributions
and making predictions from them is called
deviation from the mean.
We could just calculate the average distance between each
observation and the mean.
• We must take the absolute value of each distance;
otherwise the positive and negative distances would cancel out to zero!
Formula:
$$\text{Mean deviation} = \frac{\sum |\bar{X} - X_i|}{N}$$

Mean Deviation: An Example
Data: X = {6, 10, 5, 4, 9, 8}; X̄ = 42 / 6 = 7
1. Compute X̄ (the average)
2. Compute X̄ − Xi for each score and take the absolute value to get the absolute deviations
3. Sum the absolute deviations
4. Divide the sum of the absolute deviations by N

X̄ − Xi    Abs. Dev.
7 − 6      1
7 − 10     3
7 − 5      2
7 − 4      3
7 − 9      2
7 − 8      1
Total      12

Mean deviation = 12 / 6 = 2
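The four steps above translate directly into a short Python function. The name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    n = len(values)
    mean = sum(values) / n                        # step 1: compute the mean
    abs_devs = [abs(mean - x) for x in values]    # step 2: absolute deviations
    return sum(abs_devs) / n                      # steps 3-4: sum, then divide by N

print(mean_deviation([6, 10, 5, 4, 9, 8]))  # 2.0
```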

What Does it Mean?
• On average, each observation is two units away
from the mean.

Is it Really that Easy?
• No!
• Absolute values are difficult to manipulate
algebraically
• Absolute values cause problems for calculus: the
absolute value function has a sharp corner at zero,
where it is not differentiable
• We need something else…

Variance

• Variance is defined as the average of the
squared deviations:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
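Below is a minimal sketch of the definitional formula in Python, checked against the standard library's `statistics.pvariance`. The data are the six scores used in the computational formula example in this section.

```python
import statistics

def population_variance(values):
    """Definitional formula: average of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [9, 8, 6, 5, 8, 6]
print(population_variance(data))   # 2.0
print(statistics.pvariance(data))  # library equivalent
```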


What Does the Variance Formula Mean?
• First, it says to subtract the mean from each of
the scores
– This difference is called a deviate or a deviation
score
– The deviate tells us how far a given score is from
the typical, or average, score
– Thus, the deviate is a measure of dispersion for a
given score


• Why can’t we simply take the average of
the deviates? That is, why isn’t variance
defined as:
$$\sigma^2 \neq \frac{\sum (X - \mu)}{N}$$
This is not the formula for variance!


• One property of the mean is that the deviations
of the scores from it always sum to 0:
Σ(X − µ) = 0
• Thus, the average of the deviates must also be 0
• To avoid this problem, statisticians square the
deviate scores before averaging them
– Squaring makes every squared deviate positive
(or zero), so they can no longer cancel out
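This sketch illustrates the point with hypothetical scores whose mean is not a whole number. It uses `fractions.Fraction` so that the arithmetic is exact. The deviates sum to exactly 0, while the average of the squared deviates (the variance) does not.

```python
from fractions import Fraction

scores = [1, 2, 4]                        # hypothetical scores; the mean is 7/3
mu = Fraction(sum(scores), len(scores))   # exact mean, avoiding rounding error
deviates = [x - mu for x in scores]

print(sum(deviates))                                 # 0: the deviates always cancel out
print(sum(d ** 2 for d in deviates) / len(scores))   # 14/9: the squared deviates do not
```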

Computational Formula
• When calculating variance, it is often easier to use
a computational formula, which is algebraically
equivalent to the definitional formula:
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{\sum (X - \mu)^2}{N}$$
• σ² is the population variance, X is a score, µ is
the population mean, and N is the number of scores

Computational Formula Example

X      X²     X − µ    (X − µ)²
9      81      2        4
8      64      1        1
6      36     −1        1
5      25     −2        4
8      64      1        1
6      36     −1        1
ΣX = 42    ΣX² = 306    Σ(X − µ) = 0    Σ(X − µ)² = 12

Computational Formula Example
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} = \frac{306 - \frac{42^2}{6}}{6} = \frac{306 - 294}{6} = \frac{12}{6} = 2$$

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{12}{6} = 2$$
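Here is a sketch of the computational formula in Python, applied to the same six scores and compared with `statistics.pvariance`.

```python
import statistics

def variance_computational(values):
    """Computational formula: (sum of X^2 - (sum of X)^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n

data = [9, 8, 6, 5, 8, 6]
print(variance_computational(data))  # 2.0
print(statistics.pvariance(data))    # library result for comparison
```

With floating-point numbers, the computational formula subtracts two quantities that can be large and nearly equal. This may lose precision when the scores are large relative to their spread. The definitional formula is therefore usually the safer choice in code, even though the computational one is easier by hand.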


Variance of a Sample
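• Because the sample mean is not a perfect
estimate of the population mean, the formula for
the variance of a sample is slightly different from
the formula for the variance of a population.
Deviations measured from the sample mean are, on
average, smaller than deviations from the true
population mean, so dividing by N would
underestimate the variance; dividing by N − 1
corrects this:
$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
• s² is the sample variance, X is a score, X̄ is
the sample mean, and N is the number of
scores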
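Below is a minimal sketch of the sample variance, treating the same six scores as a sample rather than a whole population. Python's `statistics.variance` uses the N − 1 divisor and `statistics.pvariance` uses N, so the two library calls show the difference between the formulas.

```python
import statistics

def sample_variance(values):
    """s^2 = sum of (X - sample mean)^2 / (N - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [9, 8, 6, 5, 8, 6]
print(sample_variance(data))       # 2.4
print(statistics.variance(data))   # library sample variance (divides by N - 1)
print(statistics.pvariance(data))  # population variance (divides by N), for comparison
```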

Homework
• The following are test scores from a class of
20 students:
• 96 95 93 89 83 83 81 77 77 77 71 71 70 68 68 65 57 55 48 42
• Find the measures of central tendency
and dispersion
• What do you observe when you compare the
measures of central tendency?
