Descriptive Statistics
Five types of statistical analysis
Descriptive: What are the characteristics of the respondents?
Inferential: What are the characteristics of the population?
Differences: Are two or more groups the same or different?
Associative: Are two or more variables related in a systematic way?
Predictive: Can we predict one variable if we know one or more other variables?
 Summarization of a collection of data in a clear and
understandable way
 the most basic form of statistics
 lays the foundation for all statistical knowledge
Descriptive Statistics
Measures of central tendency
• mean, median, mode
Measures of dispersion
• range, standard deviation, and coefficient of variation
Measures of shape
• skewness and kurtosis
• If you use fewer statistics to describe the distribution of a variable,
you lose information but gain clarity.
Type of measurement                  Type of descriptive analysis
Nominal, two categories              Frequency table; proportion (percentage)
Nominal, more than two categories    Frequency table; category proportions (percentages); mode
Ordinal                              Rank order; median
Interval                             Arithmetic mean
Ratio                                Means
Data Tabulation
• Tabulation: The organized arrangement of data in a
table format that is easy to read and understand.
– A count of the number of responses to each question.
• Simple Tabulation: tabulating the results of only one
variable; it tells you how often each response was
given.
• Frequency Distribution: A distribution of data that
summarizes the number of times a certain value of a
variable occurs expressed in terms of percentages.
Frequency Tables
The arrangement of statistical data in a row-and-column
format that exhibits the count of
responses or observations for each category
assigned to a variable
• How many users of a certain brand can be called loyal?
• What percentage of the market are heavy users and
light users?
• How many consumers are aware of a new product?
• What brand is the “Top of Mind” of the market?
More on relative frequency distributions
• Rules for relative frequency distributions:
– Make sure each observation is in one and only one category.
– Use categories of equal width.
– Choose an appealing number of categories.
– Provide labels
– Double-check your graph.
A histogram is a relative
frequency distribution of a
quantitative variable
How did you find your last job?
Source                     Count   Percent
Networking                   643    55.2 %
Print ad                     213    18.3 %
Online recruitment site      179    15.4 %
Placement firm               112     9.6 %
Temporary agency              18     1.5 %
[Figure: horizontal bar graph of the counts above, axis from 0 to 700]
A bar graph is a relative
frequency distribution of
a qualitative variable
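The percentages in the job-search table can be reproduced from the raw counts. The following is a minimal sketch, not part of the original slides; the dictionary name `job_sources` and the helper `relative_frequencies` are illustrative assumptions.

```python
# Counts copied from the "How did you find your last job?" slide.
job_sources = {
    "Networking": 643,
    "Print ad": 213,
    "Online recruitment site": 179,
    "Placement firm": 112,
    "Temporary agency": 18,
}


def relative_frequencies(counts):
    """Return each category's share of the total, in percent (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = relative_frequencies(job_sources)

# Print a small frequency table: label, count, percentage.
for label, pct in percentages.items():
    print(f"{label:<25}{job_sources[label]:>5}{pct:>7.1f} %")
```

Each observation falls in exactly one category, so the percentages sum to 100 up to rounding.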
Malimu descriptive statistics.
How many times per week do you use mouthwash ?
1__ 2__ 3__ 4__ 5__ 6__ 7__
1 1 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 6 6 6 7 7
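Times per week   Frequency
1                2
2                3
3                5
4                7
5                5
6                3
7                2
[Figure: histogram of the frequencies above; horizontal axis 1 to 7 times per week, vertical axis 0 to 7]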
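The simple tabulation of the mouthwash responses can be sketched in a few lines. This is an illustrative example only; the variable names are assumptions, and the text histogram merely stands in for the chart on the slide.

```python
from collections import Counter

# Raw responses from the slide: times per week each respondent uses mouthwash.
responses = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
             5, 5, 5, 5, 5, 6, 6, 6, 7, 7]

# Simple tabulation: how often each response was given.
frequency = Counter(responses)

# Print the frequency table with a crude text histogram.
for value in sorted(frequency):
    print(f"{value}  {frequency[value]}  {'x' * frequency[value]}")
```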
Normal Distribution
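[Figure: normal curve of IQ scores with mean µ and standard deviation σ, running from −∞ to ∞, with the area between two values a and b shaded]
The total area under the curve is equal to 1, i.e. it takes in all
observations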
The area of a region under the normal distribution between any
two values equals the probability of observing a value in that
range when an observation is randomly selected from the
distribution
For example, on a single draw there is a 34% chance of selecting
from the distribution a person with an IQ between 100 and 115
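The 34% figure can be checked numerically. The sketch below assumes IQ is normally distributed with mean 100 and standard deviation 15; the slide does not state these parameters, so they are an assumption, as are the function names.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_prob_between(a, b, mu, sigma):
    """Area under the normal curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Assumed parameters (not given on the slide): mean 100, SD 15.
IQ_MEAN = 100.0
IQ_SD = 15.0

p = normal_prob_between(100, 115, IQ_MEAN, IQ_SD)
print(f"P(100 < IQ < 115) = {p:.4f}")
```

Under these assumptions the interval runs from the mean to one standard deviation above it, which is where the roughly 34% comes from.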
Normal Distributions
 Curve is basically bell shaped
from −∞ to ∞
 symmetric, with scores more
concentrated in the middle (i.e. around
the mean) than in the tails.
Mean, median and mode
coincide
They differ in how spread out
they are.
 The area under each curve is 1.
The height of a normal
distribution can be specified
mathematically in terms of two
parameters: the mean (µ) and the
standard deviation (σ).
Occur when one tail of the distribution is longer than the other.
Positive Skew Distributions
 have a long tail in the positive direction.
 sometimes called "skewed to the right"
 more common than distributions with negative skews
E.g. distribution of income. Most people make under $80,000 a
year, but some make quite a bit more with a small number making
many millions of dollars per year
 The positive tail therefore extends out quite a long way
Negative Skew Distributions
have a long tail in the negative direction.
called "skewed to the left."
negative tail stops at zero
E.g. GPA
Skewed Distributions
• Kurtosis: how peaked a distribution is. A
value of zero (for excess kurtosis) indicates a normal
distribution, positive numbers indicate a more peaked
distribution, and negative numbers indicate a flatter distribution.
Peaked
distribution
Flat distribution
Thanks, Scott!
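Skewness and kurtosis can be computed from the central moments of the data. The slides give no formulas, so the moment-based (population) definitions below are an assumption, and the income figures are made up purely for illustration.

```python
def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean)^k."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# Symmetric example: the mouthwash responses used elsewhere in the deck.
mouthwash = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
             5, 5, 5, 5, 5, 6, 6, 6, 7, 7]

# Hypothetical incomes (in $1000s) with a long right tail.
incomes = [20, 25, 30, 30, 35, 40, 45, 60, 120, 400]

# A flat, two-peaked set of ratings.
polarised = [1, 5, 1, 5, 1, 5]

print("skewness, mouthwash:", skewness(mouthwash))
print("skewness, incomes:", skewness(incomes))
print("excess kurtosis, polarised:", excess_kurtosis(polarised))
```

A positive skewness corresponds to a long tail in the positive direction, and a negative excess kurtosis to a flatter-than-normal distribution.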
–central tendency
–Dispersion or variability
A quantitative measure of the degree to
which scores in a distribution are spread
out or are clustered together
Summary statistics
Measures of Central Tendency
• Mode: the value that occurs most often
in a data set (nominal data)
• Median: half of the responses fall above
this point, half fall below this point
(ordinal data)
• Mean: the average (interval/ratio data)
Mode
 the most frequent category
users 25%
non-users 75%
Advantages:
• meaning is obvious
• the only measure of central tendency that can be used
with nominal data.
Disadvantages
• many distributions have more than one mode, i.e. are
“multimodal”
• greatly subject to sample fluctuations
• therefore not recommended to be used as the only
measure of central tendency.
Median
the middle observation of the data
number of times per week consumers use mouthwash
1 1 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 6 6 6 7 7
[Figure: frequency distribution of mouthwash use per week, running from light users to heavy users, with the mode, median and mean marked]
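For the mouthwash data, the three measures of central tendency can be computed with Python's standard `statistics` module. This is an illustrative sketch; the variable names are assumptions.

```python
import statistics

# Times per week each of the 27 respondents uses mouthwash (from the slide).
responses = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
             5, 5, 5, 5, 5, 6, 6, 6, 7, 7]

mode = statistics.mode(responses)      # most frequent value
median = statistics.median(responses)  # middle (14th) observation
mean = statistics.mean(responses)      # sum of scores / number of scores

print("mode:", mode, "median:", median, "mean:", mean)
```

Because this distribution is symmetric, the mode, median and mean coincide; in a skewed distribution they would separate.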
The Mean (average value)
sum of all the scores divided by the number of scores.
 a good measure of central tendency for roughly
symmetric distributions
 can be misleading in skewed distributions since it can be
greatly influenced by extreme scores in which case other
statistics such as the median may be more informative
 formula $\mu = \sum X / N$ (population)
$\bar{X} = \sum x_i / n$ (sample)
where $\mu$ and $\bar{X}$ are the population and sample means
and $N$ and $n$ are the number of scores.
Normal Distributions with
different Means
[Figure: two normal curves with different means µ1 and µ2, on an axis from −∞ to ∞]
• Minimum, Maximum, and Range (Highest
value minus the lowest value)
• Variance
• Standard Deviation (the typical distance of the
observations from the mean)
Measures of Dispersion or
Variability
Distribution of Final Course Grades in MGMT 3220Y
Grade        F   D   C   B   A
Frequency    3  10  20  23  12
[Figure: bar chart of the frequencies above, vertical axis 0 to 25, annotated with the range and with −1 SD and +1 SD around the mean]
Variance
• The difference between an observed value and
the mean is called the deviation from the mean
• The variance is the mean squared deviation
from the mean
• i.e. you subtract the mean from each value,
square each result and then take the average.
• Because it is squared it can never be negative
$\sigma^2 = \sum (x_i - \bar{x})^2 / n$
• The standard deviation is the square root of
the variance
• Thus the standard deviation is expressed in
the same units as the variables
• Helps us to understand how clustered or
spread the distribution is around the mean
value.
Standard Deviation
$S = \sqrt{\sum (x_i - \bar{x})^2 / n}$
Measures of Dispersion
Suppose we are testing the new flavor of a fruit punch
Scale: Dislike 1 2 3 4 5 Like
Data: 3, 5, 3, 5, 3, 5
$\bar{X} = 4$, $\sigma^2 = 1$, $S = 1$
$\sigma^2 = \sum (x_i - \bar{x})^2 / n$, $S = \sqrt{\sum (x_i - \bar{x})^2 / n}$
Measures of Dispersion
Scale: Dislike 1 2 3 4 5 Like
Data: 5, 4, 5, 5, 5, 4
$\bar{X} = 4.67$, $\sigma^2 = 0.22$, $S = 0.47$
$\sigma^2 = \sum (x_i - \bar{x})^2 / n$, $S = \sqrt{\sum (x_i - \bar{x})^2 / n}$
Measures of Dispersion
Scale: Dislike 1 2 3 4 5 Like
Data: 1, 5, 1, 5, 1, 5
$\bar{X} = 3$, $\sigma^2 = 4$, $S = 2$
$\sigma^2 = \sum (x_i - \bar{x})^2 / n$, $S = \sqrt{\sum (x_i - \bar{x})^2 / n}$
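The three fruit-punch examples can be recomputed directly from the variance and standard deviation formulas, dividing by n as the slides do. The sketch below is illustrative; the function names and the labels given to the three data sets are assumptions.

```python
import math


def population_variance(scores):
    """Mean squared deviation from the mean (divides by n, as on the slides)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


def population_sd(scores):
    """Square root of the variance, in the same units as the scores."""
    return math.sqrt(population_variance(scores))


# The three sets of six ratings from the slides (1 = dislike, 5 = like).
ratings = {
    "mixed (3s and 5s)": [3, 5, 3, 5, 3, 5],
    "mostly liked": [5, 4, 5, 5, 5, 4],
    "polarised (1s and 5s)": [1, 5, 1, 5, 1, 5],
}

for label, scores in ratings.items():
    mean = sum(scores) / len(scores)
    print(f"{label:<22} mean={mean:.2f} "
          f"variance={population_variance(scores):.2f} "
          f"sd={population_sd(scores):.2f}")
```

The more the ratings cluster around the mean, the smaller the standard deviation: the second set is the most tightly clustered and the third the most spread out. Dividing by n − 1 instead of n would give the sample variance, which the slides do not use here.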
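Normal Distributions
with different SD
[Figure: three normal curves with the same mean µ and different standard deviations σ1, σ2 and σ3, on an axis from −∞ to ∞]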
Cross Tabulation
• A statistical technique that involves tabulating the
results of two or more variables simultaneously
• informs you how often each response was given
• Shows relationships among and between variables
• frequency distribution for each subgroup
compared to the frequency distribution for the total
sample
• the variables must be nominally scaled
Cross-tabulation
• Helps answer questions about whether two
or more variables of interest are linked:
– Is the type of mouthwash user (heavy or
light) related to gender?
– Is the preference for a certain flavor (cherry
or lemon) related to the geographic region
(north, south, east, west)?
– Is income level associated with gender?
• Cross-tabulation determines association not
causality.
• The variable being studied is called the
dependent variable or response variable.
• A variable that influences the dependent
variable is called independent variable.
Dependent and Independent Variables
Cross-tabulation
• Cross-tabulation of two or more variables is
possible if the variables are discrete:
– The frequency of one variable is subdivided by the
other variable categories.
• Generally a cross-tabulation table has:
– Row percentages
– Column percentages
– Total percentages
• Which one is better?
DEPENDS on which variable is considered as
independent.
Contingency Table
• A contingency table shows the joint
distribution of two discrete variables
• This distribution represents the probability
of observing a case in each cell
– Probability is calculated as:
$P = \text{observed cases} / \text{total cases}$
Cross tabulation
GROUPINC * Gender Crosstabulation
GROUPINC            Statistic            Female     Male    Total
income <= 5         Count                    10        9       19
                    % within GROUPINC     52.6%    47.4%   100.0%
                    % within Gender       55.6%    18.8%    28.8%
                    % of Total            15.2%    13.6%    28.8%
5 < income <= 10    Count                     5       25       30
                    % within GROUPINC     16.7%    83.3%   100.0%
                    % within Gender       27.8%    52.1%    45.5%
                    % of Total             7.6%    37.9%    45.5%
income > 10         Count                     3       14       17
                    % within GROUPINC     17.6%    82.4%   100.0%
                    % within Gender       16.7%    29.2%    25.8%
                    % of Total             4.5%    21.2%    25.8%
Total               Count                    18       48       66
                    % within GROUPINC     27.3%    72.7%   100.0%
                    % within Gender      100.0%   100.0%   100.0%
                    % of Total            27.3%    72.7%   100.0%
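The row, column and total percentages in the GROUPINC by Gender crosstabulation all derive from the cell counts. The following sketch recomputes them; the data structure and the helper `pct` are illustrative assumptions rather than the SPSS procedure that produced the original table.

```python
# Cell counts from the GROUPINC * Gender crosstabulation.
counts = {
    "income <= 5": {"Female": 10, "Male": 9},
    "5 < income <= 10": {"Female": 5, "Male": 25},
    "income > 10": {"Female": 3, "Male": 14},
}

# Marginal totals.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for gender, n in cells.items():
        col_totals[gender] = col_totals.get(gender, 0) + n
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


for row, cells in counts.items():
    for gender, n in cells.items():
        print(f"{row:<18}{gender:<8}count={n:<4}"
              f"row%={pct(n, row_totals[row]):<6}"
              f"col%={pct(n, col_totals[gender]):<6}"
              f"total%={pct(n, grand_total)}")
```

Row percentages answer "within this income group, what share is female or male?", while column percentages answer "within this gender, what share falls in each income group?". Which one to report depends on which variable is treated as independent.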
General Procedure for
Hypothesis Test
1. Formulate H0 (null hypothesis) and H1
(alternative hypothesis)
2. Select appropriate test
3. Choose level of significance
4. Calculate the test statistic (SPSS)
5. Determine the probability associated with
the statistic.
• Determine the critical value of the test
statistic.
General Procedure for
Hypothesis Test
6 a) Compare the probability with the level of significance, α
b) Determine if the test statistic falls in the
rejection region. (check tables)
7 Reject or do not reject H0
8 Draw a conclusion
• The hypothesis the researcher wants to test is called
the alternative hypothesis H1.
• The opposite of the alternative hypothesis is the null
hypothesis H0(the status quo)(no difference between
the sample and the population, or between samples).
• The objective is to DISPROVE the null hypothesis.
• The Significance Level is the Critical probability of
choosing between the null hypothesis and the
alternative hypothesis
1. Formulate H1 and H0
• The selection of a proper Test depends on:
– Scale of the data
• nominal
• interval
– the statistic you seek to compare
• Proportions (percentages)
• means
– the sampling distribution of such statistic
• Normal Distribution
• T Distribution
• χ² Distribution
– Number of variables
• Univariate
• Bivariate
• Multivariate
– Type of question to be answered
2. Select Appropriate Test
Example
A tire manufacturer believes that men are more aware of their
brand than women. To find out, a survey is conducted of 100
customers, 65 of whom are men and 35 of whom are women.
The question they are asked is: Are you aware of our brand: Yes
or No. 50 of the men were aware and 15 were not, whereas 10 of
the women were aware and 25 were not.
Are these differences significant?
            Men   Women   Total
Aware        50      10      60
Unaware      15      25      40
Total        65      35     100
We want to know whether brand awareness is
associated with gender. What are the hypotheses?
1. Formulate H1 and H0
H0: There is no difference in brand awareness based on gender
H1: There is a difference in brand awareness based on gender
Chi-square test results are unstable if an expected cell count is lower than 5
• Used to discover whether 2 or more groups of one variable
(dependent variable) vary significantly from each other with
respect to some other variable (independent variable).
• Are the two variables of interest associated:
– Do men and women differ with respect to product usage
(heavy, medium, or light)
– Is the preference for a certain flavor (cherry or lemon) related
to the geographic region (north, south, east, west)?
H0: Two variables are independent (not associated)
H1: Two variables are not independent (associated)
• Must be nominal level, or, if interval or ratio must be divided into
categories
χ² (Chi-Square)
2. Select Appropriate Test
Awareness of Tire Manufacturer’s Brand (observed/expected)
            Men     Women   Total
Aware      50/39   10/21      60
Unaware    15/26   25/14      40
Total        65      35      100
Estimated cell frequency:
$E_{ij} = \frac{R_i C_j}{n}$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
$n$ = sample size
$E_{ij}$ = estimated cell frequency
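The expected (estimated) cell frequencies follow from the row totals, column totals and sample size. A minimal sketch, with an illustrative function name that is not part of the original material:

```python
# Observed counts: rows are Aware / Unaware, columns are Men / Women.
observed = [
    [50, 10],
    [15, 25],
]


def expected_frequencies(table):
    """E_ij = R_i * C_j / n for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

For example, the expected number of aware men is 60 × 65 / 100 = 39, which is what would be observed if awareness were unrelated to gender.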
3. Choose Level of Significance
Whenever we draw inferences about a population, there is a risk
that an incorrect conclusion will be reached
The real question is how strong the evidence in favor of the
alternative hypothesis must be to reject the null hypothesis.
The significance level states the probability of rejecting H0 when
in fact it is true.
In this example an error would be committed if we said that there
is a difference between men and women with respect to brand
awareness when in fact there was no difference i.e. we have rejected
the null hypothesis when it is in fact true
This error is commonly known as a Type I error. The probability of committing
it, α, is called the significance level of the test.
• Significance Level selected is typically .05 or .01
• i.e 5% or 1%
• In other words we are willing to accept the risk
that 5% (or 1%) of the time the results we get
indicate that we should reject the null hypothesis
when it is in fact true.
• 5% (or 1%) of the time we are willing to commit
a Type I error
• stating there is a difference between men and
women with respect to brand awareness when in
fact there is no difference
3. Choose Level of Significance
• We commit a Type II error when we
incorrectly accept a null hypothesis when it
is false. The probability of committing a Type
II error is denoted by β.
• In our example we commit a Type II error
when we say that
there is NO difference between men and women
with respect to brand awareness (we accept the
null hypothesis) when in fact there is.
Type I and Type II Errors
                 Accept null          Reject null
Null is true     Correct, no error    Type I error
Null is false    Type II error        Correct, no error
Which is worse?
• Both are serious, but traditionally Type I error has
been considered more serious, that’s why the
objective of hypothesis testing is to reject H0 only
when there is enough evidence against it.
• Therefore, we choose α to be as small as possible
without compromising β (the probability of accepting H0 when it is false).
• Increasing the sample size for a given α will
decrease β.
Chi-Square Test
Chi-square statistic:
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
Estimated cell frequency:
$E_{ij} = \frac{R_i C_j}{n}$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
$n$ = sample size
$E_{ij}$ = estimated cell frequency
Degrees of freedom:
d.f. = (R - 1)(C - 1)
Degrees of Freedom
 the number of values in the final calculation of a statistic that
are free to vary
For example, to calculate the standard deviation of a random
sample, we must first calculate the mean of that sample and
then compute the sum of the squared deviations from that mean.
While there will be n such squared deviations, only (n - 1) of them
are free to assume any value whatsoever.
This is because the final squared deviation from the mean must
include the one value of X such that the sum of all the Xs divided by
n will equal the obtained mean of the sample.
All of the other (n - 1) squared deviations from the mean can,
theoretically, have any values whatsoever.
4. Calculate the Test Statistic
$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$
$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$
d.f. = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1
Chi-Square Test: Differences Among Groups
Chi-square test results are unstable if an expected cell count is lower than 5
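The test statistic and degrees of freedom for the tire-brand example can be recomputed from the observed table. This is a hedged sketch using only the formulas shown on the slides; the function name `chi_square` is an illustrative assumption.

```python
# Observed counts: rows are Aware / Unaware, columns are Men / Women.
observed = [
    [50, 10],
    [15, 25],
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            statistic += (o - e) ** 2 / e          # (O - E)^2 / E

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (R - 1)(C - 1)
    return statistic, df


chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
```

Every cell deviates from its expected count by 11, so the four terms differ only in their denominators; the cell with the smallest expected count (unaware women, 14) contributes the most.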
5. Determine the Probability-value (Critical Value)
• The p-value is the probability of seeing a random
sample at least as extreme as the sample observed given
that the null hypothesis is true.
• Given the value of alpha, α, we use statistical theory to
determine the rejection region.
• If the test statistic falls into this region we reject the null
hypothesis; otherwise, we do not reject it
• Sample evidence that falls into the rejection region is
called statistically significant at the alpha level.
COMBINATIONS
A combination is the selection of a certain number of objects taken from
a group of objects without regard to order. We use the symbol $\binom{5}{3}$ (5
choose 3) to indicate that we have five objects taken three at a time,
without regard to order.
To calculate the possible number of combinations the formula used is
$\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times (2 \times 1)} = \frac{120}{12} = 10$
If we choose a sample of 5 from a total of 20 there are 15,504 possible
combinations.
If we took the means of some measurement for each of the possible
combinations, those means would form an approximately normal distribution.
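Both counts can be checked with Python's standard library. The sketch below uses `math.comb` (available from Python 3.8) and `itertools.combinations`; the object labels A to E are made up for illustration.

```python
import math
from itertools import combinations

# Number of ways to choose 3 objects out of 5, ignoring order.
five_choose_three = math.comb(5, 3)

# The same count from the factorial formula n! / (k! (n - k)!).
by_formula = math.factorial(5) // (math.factorial(3) * math.factorial(2))

# Number of possible samples of 5 drawn from a total of 20.
twenty_choose_five = math.comb(20, 5)

# Enumerate the actual selections for five hypothetical objects.
selections = list(combinations("ABCDE", 3))

print(five_choose_three, by_formula, twenty_choose_five, len(selections))
```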
[Figure: t distribution centred on 0 with two-tailed rejection regions of area α/2 beyond the critical values −2.023 and 2.023 (significance level .05); the test statistic, 2.816, falls in the upper rejection region]
A critical value is the value that a test statistic must exceed in
order for the null hypothesis to be rejected.
For example, the critical value of t (with 12 degrees of freedom
using the .05 significance level) is 2.18.
This means that for the probability value to be less than or equal
to .05, the absolute value of the t statistic must be 2.18 or greater.
Significance from p-values --
continued
• How small is a “small” p-value? This is largely a matter of
semantics but if the
– p-value is less than 0.01, it provides “convincing”
evidence that the alternative hypothesis is true;
– p-value is between 0.01 and 0.05, there is “strong”
evidence in favor of the alternative hypothesis;
– p-value is between 0.05 and 0.10, it is in a “gray area”;
– p-values greater than 0.10 are interpreted as weak or no
evidence in support of the alternative.
Chi-square Test for Independence
Under H0, the test statistic approximately follows
the Chi-square distribution (χ²).
[Figure: chi-square distribution with the rejection region of area α to the right of the critical value 3.84 ("Reject H0"); the test statistic, 22.16, lies in that region]
χ² with 1 d.f. at .05 critical value = 3.84
5. Determine the Probability-value (Critical Value)
6 a) Compare with the level of significance, α
b) Determine if the test statistic falls in the rejection
region. (check tables)
22.16 is greater than 3.84 and falls in the rejection area
In fact it is significant at the .001 level, which means that if
our variables were independent, the chance of picking a sample
at least this extreme would be less than 1/1000
Or, in other words, the risk of a Type I error at this level is less
than .1%
i.e. there is less than a .1% chance of rejecting the null
hypothesis (that there is no difference between men and women
with respect to brand awareness) when it is in fact true.
7 Reject or do not reject H0
Since 22.16 is greater than 3.84 we reject the null
hypothesis
8 Draw a conclusion
Men and women differ with respect to brand
awareness, specifically, men are more brand
aware than women
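The comparison against the critical value can also be expressed as a p-value. For exactly 1 degree of freedom, the upper-tail probability of the chi-square distribution reduces to a complementary error function, which is what the sketch below uses; this shortcut is an assumption that holds only for 1 d.f., and the function name is illustrative. For other degrees of freedom a statistics library or printed table would be needed.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail probability P(chi-square >= statistic), valid for 1 d.f. only.

    With 1 d.f. the statistic is the square of a standard normal variable,
    so the tail area equals erfc(sqrt(statistic / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


ALPHA = 0.05            # chosen level of significance
CRITICAL_VALUE = 3.84   # chi-square critical value at .05 with 1 d.f.
TEST_STATISTIC = 22.16  # value computed for the tire-brand example

p_at_critical = chi_square_p_value_1df(CRITICAL_VALUE)
p_value = chi_square_p_value_1df(TEST_STATISTIC)
reject_h0 = p_value < ALPHA

print(f"p at critical value: {p_at_critical:.4f}")
print(f"p-value of test statistic: {p_value:.2e}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Comparing the p-value with α and comparing the test statistic with the critical value are two views of the same decision rule, so both lead to rejecting H0 here.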

More Related Content

PPT
Descriptive Statistics and Data Visualization
PPT
Testing Hypothesis
PPTX
data analysis techniques and statistical softwares
PPTX
Normal distribution
PPTX
Descriptive statistics
PPSX
Inferential statistics.ppt
PPTX
Descriptive statistics
PPT
Ppt for 1.1 introduction to statistical inference
Descriptive Statistics and Data Visualization
Testing Hypothesis
data analysis techniques and statistical softwares
Normal distribution
Descriptive statistics
Inferential statistics.ppt
Descriptive statistics
Ppt for 1.1 introduction to statistical inference

What's hot (20)

PDF
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
PPTX
STATISTIC ESTIMATION
PPTX
Business statistics
PDF
Statistical Distributions
PPTX
Inferential statistics
PPTX
Introduction to Descriptive Statistics
PDF
Excel and research
PPT
Bivariate analysis
PPTX
Estimation and hypothesis
PPTX
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
PPTX
Cluster analysis
PPTX
Estimation and confidence interval
PDF
Measures of central tendency
PPTX
Univariate & bivariate analysis
PPTX
Regression analysis
PPTX
Basics on statistical data analysis
PPTX
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
PPTX
Descriptive statistics
PPTX
Basic Statistics & Data Analysis
PPTX
Introduction to statistics
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
STATISTIC ESTIMATION
Business statistics
Statistical Distributions
Inferential statistics
Introduction to Descriptive Statistics
Excel and research
Bivariate analysis
Estimation and hypothesis
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
Cluster analysis
Estimation and confidence interval
Measures of central tendency
Univariate & bivariate analysis
Regression analysis
Basics on statistical data analysis
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Descriptive statistics
Basic Statistics & Data Analysis
Introduction to statistics
Ad

Viewers also liked (20)

PPTX
Descriptive statistics
PPT
Day 3 descriptive statistics
PPT
Descriptive Statistics
PDF
02 descriptive statistics
PDF
Descriptive statistics
PPT
Basic Descriptive Statistics
PPTX
Data Analysis: Descriptive Statistics
PPTX
Common statistical tools used in research and their uses
PPT
Torturing numbers - Descriptive Statistics for Growers (2013)
PPTX
Das20502 chapter 1 descriptive statistics
PPTX
Descriptive Statistics
 
PPT
Descriptive statistics
PPTX
Statistics for the Health Scientist: Basic Statistics II
PPTX
1a difference between inferential and descriptive statistics (explanation)
PPT
Malimu sources of errors
PPT
Malimu research protocol
PPT
Malimu principles of outbreak investigation
PPT
Chapter 1 descriptive_stats_2_rev_2009
PPT
Malimu statistical significance testing.
PPTX
Maddaloni, daniela, descriptive statistics
Descriptive statistics
Day 3 descriptive statistics
Descriptive Statistics
02 descriptive statistics
Descriptive statistics
Basic Descriptive Statistics
Data Analysis: Descriptive Statistics
Common statistical tools used in research and their uses
Torturing numbers - Descriptive Statistics for Growers (2013)
Das20502 chapter 1 descriptive statistics
Descriptive Statistics
 
Descriptive statistics
Statistics for the Health Scientist: Basic Statistics II
1a difference between inferential and descriptive statistics (explanation)
Malimu sources of errors
Malimu research protocol
Malimu principles of outbreak investigation
Chapter 1 descriptive_stats_2_rev_2009
Malimu statistical significance testing.
Maddaloni, daniela, descriptive statistics
Ad

Similar to Malimu descriptive statistics. (20)

PPTX
2. chapter ii(analyz)
PDF
Res701 research methodology lecture 7 8-devaprakasam
PDF
Statistical Methods in Research
PPT
Business statistics
PPTX
Statr sessions 4 to 6
PPTX
3. Statistical Analysis.pptx
PDF
MEASURE-OF-VARIABILITY- for students. Ppt
PPTX
Standard deviation
 
PPTX
Statistics ppt.ppt
PPTX
Ch5-quantitative-data analysis.pptx
PPTX
ststs nw.pptx
PPT
Chapter34
PPT
Stats-Review-Maie-St-John-5-20-2009.ppt
PPTX
Central Tendency and Dispersion.pptx
PPT
4.15.04a.ppt
PPT
4.15.04a.ppt
PPT
Introduction to Statistics2312.ppt
PPT
Introduction to Statistics23122223.ppt
PPTX
INTRODUCTION OF STATISTICS FINAL YEAR VIII SEM
PPTX
Introduction to Educational statistics and measurement
2. chapter ii(analyz)
Res701 research methodology lecture 7 8-devaprakasam
Statistical Methods in Research
Business statistics
Statr sessions 4 to 6
3. Statistical Analysis.pptx
MEASURE-OF-VARIABILITY- for students. Ppt
Standard deviation
 
Statistics ppt.ppt
Ch5-quantitative-data analysis.pptx
ststs nw.pptx
Chapter34
Stats-Review-Maie-St-John-5-20-2009.ppt
Central Tendency and Dispersion.pptx
4.15.04a.ppt
4.15.04a.ppt
Introduction to Statistics2312.ppt
Introduction to Statistics23122223.ppt
INTRODUCTION OF STATISTICS FINAL YEAR VIII SEM
Introduction to Educational statistics and measurement

More from Miharbi Ignasm (20)

PPT
Malimu variance and standard deviation
PPT
Malimu primary health care.
PPT
Malimu organization of health services
PPT
Malimu nutritional disorders of public health importance
PPT
Malimu nutrition related non communicable diseases
PPT
Malimu non communicable disease
PPT
Malimu measures of disease frequency
PPT
Malimu investigation of an outbreak of communicable diseases pnco-2
PPT
Malimu introduction to study designs
PPT
Malimu intro to surveys
PPT
Malimu intro to epidemiology
PPT
Malimu intro to community health
PPT
Malimu environmental management and sanitation md3 17 4-07
PPT
Malimu demography
PPT
Malimu data collection methods
PPT
Malimu cross sectional studies.
PPT
Malimu cohort studies
PPTX
Malimu case control studies
PPT
nutrition &amp; infection
PPTX
Factors affecting community health
Malimu variance and standard deviation
Malimu primary health care.
Malimu organization of health services
Malimu nutritional disorders of public health importance
Malimu nutrition related non communicable diseases
Malimu non communicable disease
Malimu measures of disease frequency
Malimu investigation of an outbreak of communicable diseases pnco-2
Malimu introduction to study designs
Malimu intro to surveys
Malimu intro to epidemiology
Malimu intro to community health
Malimu environmental management and sanitation md3 17 4-07
Malimu demography
Malimu data collection methods
Malimu cross sectional studies.
Malimu cohort studies
Malimu case control studies
nutrition &amp; infection
Factors affecting community health

Recently uploaded (20)

PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
GDM (1) (1).pptx small presentation for students
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
master seminar digital applications in india
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Insiders guide to clinical Medicine.pdf
Microbial diseases, their pathogenesis and prophylaxis
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPH.pptx obstetrics and gynecology in nursing
GDM (1) (1).pptx small presentation for students
O7-L3 Supply Chain Operations - ICLT Program
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
102 student loan defaulters named and shamed – Is someone you know on the list?
O5-L3 Freight Transport Ops (International) V1.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
master seminar digital applications in india
Supply Chain Operations Speaking Notes -ICLT Program
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
human mycosis Human fungal infections are called human mycosis..pptx
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Insiders guide to clinical Medicine.pdf

Malimu descriptive statistics.

  • 2. Five types of statistical analysis Descriptive Inferential Differences Associative Predictive What are the characteristics of the respondents? What are the characteristics of the population? Are two or more groups the same or different? Are two or more variables related in a systematic way? Can we predict one variable if we know one or more other variables?
  • 3.  Summarization of a collection of data in a clear and understandable way  the most basic form of statistics  lays the foundation for all statistical knowledge Descriptive Statistics Measures of central tendency • mean, median, mode Measures of dispersion • range, standard deviation, and coefficient of variation Measures of shape • skewness and kurtosis •If you use fewer statistics to describe the distribution of a variable, you lose information but gain clarity.
  • 4. Type of Measurement Nominal Two categories More than two categories Frequency table Proportion (percentage) Frequency table Category proportions (percentages) Mode Type of descriptive analysis
  • 5. Ratio means Type of Measurement Type of descriptive analysis Ordinal Rank order Median Interval Arithmetic mean
  • 6. Data Tabulation • Tabulation: The organized arrangement of data in a table format that is easy to read and understand. – A count of the number of responses to each question. • Simple Tabulation: tabulating of results of only one variable informs you how often each response was given. • Frequency Distribution: A distribution of data that summarizes the number of times a certain value of a variable occurs expressed in terms of percentages.
  • 7. The arrangement of statistical data in a row-and- column format that exhibits the count of responses or observations for each category assigned to a variable • How many of certain brand users can be called loyal? • What percentage of the market are heavy users and light users? • How many consumers are aware of a new product? • What brand is the “Top of Mind” of the market? Frequency Tables
  • 8. More on relative frequency distributions • Rules for relative frequency distributions: – Make sure each observation is in one and only one category. – Use categories of equal width. – Choose an appealing number of categories. – Provide labels – Double-check your graph. A histogram is a relative frequency distribution of a quantitative variable 643 Networking 213 print ad 179 Online recruitment site 112 Placement firm 18 Temporary agency How did you find your last job? 7006005004003002001000 Networking print ad Online recruitment site Placement firm Temporary agency 55.2 % 18.3 % 15.4 % 9.6 % 1.5 % A bar graph is a relative frequency distribution of a qualitative variable
  • 10. How many times per week do you use mouthwash ? 1__ 2__ 3__ 4__ 5__ 6__ 7__ 1 1 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 6 6 6 7 7 1 2 2 3 3 5 4 7 5 5 6 3 7 2 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7
  • 12. IQ The total area under the curve is equal to 1, i.e. It takes in all observations The area of a region under the normal distribution between any two values equals the probability of observing a value in that range when an observation is randomly selected from the distribution For example, on a single draw there is a 34% chance of selecting from the distribution a person with an IQ between 100 and 115
  • 13. Normal Distributions • Curve is basically bell shaped from -∞ to ∞ • symmetric, with scores more concentrated in the middle (i.e. around the mean) than in the tails. Mean, median and mode coincide. • They differ in how spread out they are. • The area under each curve is 1. • The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (µ) and the standard deviation (σ).
  • 14. Skewed Distributions: Occur when one tail of the distribution is longer than the other. Positive Skew Distributions • have a long tail in the positive direction. • sometimes called "skewed to the right" • more common than distributions with negative skews • E.g. distribution of income. Most people make under $80,000 a year, but some make quite a bit more, with a small number making many millions of dollars per year • The positive tail therefore extends out quite a long way Negative Skew Distributions • have a long tail in the negative direction. • called "skewed to the left." • negative tail stops at zero • E.g. GPA
  • 15. • Kurtosis: how peaked a distribution is. (A value of zero indicates a normal distribution, positive numbers indicate a more peaked distribution, negative numbers indicate a flatter distribution.) (Figure: a peaked distribution and a flat distribution.) Thanks, Scott!
  • 16. –central tendency –Dispersion or variability A quantitative measure of the degree to which scores in a distribution are spread out or are clustered together Summary statistics
  • 17. Measures of Central Tendency • Mode: the number that occurs most often in a string (nominal data) • Median: half of the responses fall above this point, half fall below this point (ordinal data) • Mean: the average (interval/ratio data)
  • 18. Mode • the most frequent category. Example: users 25%, non-users 75%. Advantages: • meaning is obvious • the only measure of central tendency that can be used with nominal data. Disadvantages: • many distributions have more than one mode, i.e. are “multimodal” • greatly subject to sample fluctuations • therefore not recommended to be used as the only measure of central tendency.
  • 19. Median: the middle observation of the data. Number of times per week consumers use mouthwash: 1 1 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5 5 5 6 6 6 7 7 (Figure: frequency distribution of mouthwash use per week, from light user to heavy user, with the mode, median and mean marked.)
  • 20. The Mean (average value): sum of all the scores divided by the number of scores. • a good measure of central tendency for roughly symmetric distributions • can be misleading in skewed distributions since it can be greatly influenced by extreme scores, in which case other statistics such as the median may be more informative • formula: $\mu = \sum X / N$ (population), $\bar{X} = \sum x_i / n$ (sample), where $\mu$ and $\bar{X}$ are the population and sample means and $N$ and $n$ are the number of scores.
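The three measures of central tendency can be computed for the mouthwash responses listed earlier in the deck. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Times per week each of the 27 respondents uses mouthwash (from the deck).
uses = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
        5, 5, 5, 5, 5, 6, 6, 6, 7, 7]

sample_mean = mean(uses)      # sum of scores / number of scores
sample_median = median(uses)  # middle (14th) observation of the sorted data
sample_mode = mode(uses)      # most frequent value

print(sample_mean, sample_median, sample_mode)
```

Because this distribution is symmetric around 4, all three measures coincide; in a skewed distribution they would separate.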
  • 21. Normal Distributions with different Means (Figure: two normal curves with means µ1 and µ2 on an axis running from -∞ through 0 to ∞.)
  • 22. • Minimum, Maximum, and Range (Highest value minus the lowest value) • Variance • Standard Deviation (A measure’s distance from the mean) Measures of Dispersion or Variability
  • 23. Distribution of Final Course Grades in MGMT 3220Y. Frequency by grade: F: 3, D: 10, C: 20, B: 23, A: 12. (Figure: bar chart of frequency by grade, with the range and -1 SD / +1 SD marked.)
  • 24. Variance • The difference between an observed value and the mean is called the deviation from the mean • The variance is the mean squared deviation from the mean • i.e. you subtract each value from the mean, square each result and then take the average. • Because it is squared it can never be negative. $\sigma^2 = \sum (\bar{x} - x_i)^2 / n$
  • 25. Standard Deviation • The standard deviation is the square root of the variance • Thus the standard deviation is expressed in the same units as the variables • Helps us to understand how clustered or spread the distribution is around the mean value. $S = \sqrt{\sum (\bar{x} - x_i)^2 / n}$
  • 26. Measures of Dispersion: Suppose we are testing the new flavor of a fruit punch. Scale: Dislike 1 2 3 4 5 Like. Data (six respondents): 3, 5, 3, 5, 3, 5. $\bar{x} = 4$, $\sigma^2 = 1$, $S = 1$, using $\sigma^2 = \sum (\bar{x} - x_i)^2 / n$ and $S = \sqrt{\sum (\bar{x} - x_i)^2 / n}$
  • 27. Measures of Dispersion. Scale: Dislike 1 2 3 4 5 Like. Data (six respondents): 5, 4, 5, 5, 5, 4. $\bar{x} = 4.67$, $\sigma^2 = 0.22$, $S = 0.47$, using $\sigma^2 = \sum (\bar{x} - x_i)^2 / n$ and $S = \sqrt{\sum (\bar{x} - x_i)^2 / n}$
  • 28. Measures of Dispersion. Scale: Dislike 1 2 3 4 5 Like. Data (six respondents): 1, 5, 1, 5, 1, 5. $\bar{x} = 3$, $\sigma^2 = 4$, $S = 2$, using $\sigma^2 = \sum (\bar{x} - x_i)^2 / n$ and $S = \sqrt{\sum (\bar{x} - x_i)^2 / n}$
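The three fruit-punch examples can be reproduced with the standard library. The slides divide by n rather than n - 1, so this sketch uses the population functions `pvariance` and `pstdev`; the dictionary keys are illustrative labels, not from the deck.

```python
from statistics import mean, pstdev, pvariance

# Ratings on the 1 (dislike) to 5 (like) scale from the three slides.
ratings = {
    "moderate spread": [3, 5, 3, 5, 3, 5],
    "tight cluster": [5, 4, 5, 5, 5, 4],
    "wide spread": [1, 5, 1, 5, 1, 5],
}

summary = {}
for label, data in ratings.items():
    # pvariance/pstdev divide by n, matching the formulas on the slides.
    summary[label] = (
        round(mean(data), 2),
        round(pvariance(data), 2),
        round(pstdev(data), 2),
    )
    print(label, summary[label])
```

Using `variance` and `stdev` instead would divide by n - 1 and give slightly larger values than those on the slides.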
  • 29. Normal Distributions with different SD (Figure: three normal curves with the same mean µ and standard deviations σ1, σ2 and σ3 on an axis from -∞ to ∞.)
  • 30. • A statistical technique that involves tabulating the results of two or more variables simultaneously • informs you how often each response was given • Shows relationships among and between variables • frequency distribution for each subgroup compared to the frequency distribution for the total sample • must be nominally scaled Cross Tabulation
  • 31. Cross-tabulation • Helps answer questions about whether two or more variables of interest are linked: – Is the type of mouthwash user (heavy or light) related to gender? – Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)? – Is income level associated with gender? • Cross-tabulation determines association not causality.
  • 32. • The variable being studied is called the dependent variable or response variable. • A variable that influences the dependent variable is called independent variable. Dependent and Independent Variables
  • 33. Cross-tabulation • Cross-tabulation of two or more variables is possible if the variables are discrete: – The frequency of one variable is subdivided by the other variable categories. • Generally a cross-tabulation table has: – Row percentages – Column percentages – Total percentages • Which one is better? DEPENDS on which variable is considered as independent.
  • 34. Contingency Table • A contingency table shows the conjoint distribution of two discrete variables • This distribution represents the probability of observing a case in each cell – Probability is calculated as: P = Observed cases / Total cases
  • 35. Cross tabulation: GROUPINC * Gender Crosstabulation
      GROUPINC            Statistic            Female    Male      Total
      income <= 5         Count                10        9         19
                          % within GROUPINC    52.6%     47.4%     100.0%
                          % within Gender      55.6%     18.8%     28.8%
                          % of Total           15.2%     13.6%     28.8%
      5 < income <= 10    Count                5         25        30
                          % within GROUPINC    16.7%     83.3%     100.0%
                          % within Gender      27.8%     52.1%     45.5%
                          % of Total           7.6%      37.9%     45.5%
      income > 10         Count                3         14        17
                          % within GROUPINC    17.6%     82.4%     100.0%
                          % within Gender      16.7%     29.2%     25.8%
                          % of Total           4.5%      21.2%     25.8%
      Total               Count                18        48        66
                          % within GROUPINC    27.3%     72.7%     100.0%
                          % within Gender      100.0%    100.0%    100.0%
                          % of Total           27.3%     72.7%     100.0%
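The row, column and total percentages in the GROUPINC by Gender table all derive from the six cell counts. A sketch of that arithmetic in plain Python follows; the labels and variable names are illustrative, and the middle income label is an assumed reading of the slide's "5>Income<= 10".

```python
# Cell counts from the crosstab (rows: income group, columns: gender).
counts = {
    "income <= 5": {"Female": 10, "Male": 9},
    "5 < income <= 10": {"Female": 5, "Male": 25},  # assumed label reading
    "income > 10": {"Female": 3, "Male": 14},
}

genders = ("Female", "Male")
col_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
grand_total = sum(col_totals.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal."""
    return round(100 * part / whole, 1)


percentages = {}
for group, row in counts.items():
    row_total = sum(row.values())
    for g in genders:
        percentages[(group, g)] = {
            "row": pct(row[g], row_total),       # % within GROUPINC
            "column": pct(row[g], col_totals[g]),  # % within Gender
            "total": pct(row[g], grand_total),   # % of Total
        }

print(percentages[("income <= 5", "Female")])
```

Which percentage to read depends on which variable is treated as independent: row percentages condition on income group, column percentages on gender.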
  • 36. General Procedure for Hypothesis Test 1. Formulate H0 (null hypothesis) and H1 (alternative hypothesis) 2. Select appropriate test 3. Choose level of significance 4. Calculate the test statistic (SPSS) 5. Determine the probability associated with the statistic. • Determine the critical value of the test statistic.
  • 37. General Procedure for Hypothesis Test 6 a) Compare the probability with the level of significance, α b) Determine if the test statistic falls in the rejection region. (check tables) 7 Reject or do not reject H0 8 Draw a conclusion
  • 38. 1. Formulate H1 and H0 • The hypothesis the researcher wants to test is called the alternative hypothesis H1. • The opposite of the alternative hypothesis is the null hypothesis H0 (the status quo: no difference between the sample and the population, or between samples). • The objective is to DISPROVE the null hypothesis. • The significance level is the critical probability used to choose between the null hypothesis and the alternative hypothesis.
  • 39. 2. Select Appropriate Test • The selection of a proper test depends on: – Scale of the data • nominal • interval – the statistic you seek to compare • Proportions (percentages) • means – the sampling distribution of such statistic • Normal Distribution • T Distribution • χ² Distribution – Number of variables • Univariate • Bivariate • Multivariate – Type of question to be answered
  • 40. Example: A tire manufacturer believes that men are more aware of their brand than women. To find out, a survey is conducted of 100 customers, 65 of whom are men and 35 of whom are women. The question they are asked is: Are you aware of our brand: Yes or No. 50 of the men were aware and 15 were not, whereas 10 of the women were aware and 25 were not. Are these differences significant?
                  Men    Women    Total
      Aware       50     10       60
      Unaware     15     25       40
      Total       65     35       100
  • 41. 1. Formulate H1 and H0. We want to know whether brand awareness is associated with gender. What are the hypotheses? H0: There is no difference in brand awareness based on gender. H1: There is a difference in brand awareness based on gender. Chi-square test results are unstable if cell count is lower than 5.
  • 42. • Used to discover whether 2 or more groups of one variable (dependent variable) vary significantly from each other with respect to some other variable (independent variable). • Are the two variables of interest associated: – Do men and women differ with respect to product usage (heavy, medium, or light) – Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)? H0: Two variables are independent (not associated) H1: Two variables are not independent (associated) • Must be nominal level, or, if interval or ratio must be divided into categories X2 (Chi Square) 2. Select Appropriate Test
  • 43. Awareness of Tire Manufacturer’s Brand (observed / expected frequency)
                  Men      Women    Total
      Aware       50/39    10/21    60
      Unaware     15/26    25/14    40
      Total       65       35       100
      Estimated cell frequency: $E_{ij} = R_i C_j / n$, where $R_i$ = total observed frequency in the ith row, $C_j$ = total observed frequency in the jth column, $n$ = sample size, $E_{ij}$ = estimated cell frequency
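The expected frequencies in the table (39, 21, 26, 14) come from applying the estimated cell frequency formula, row total times column total divided by sample size, to each cell. A minimal sketch, with illustrative variable names:

```python
# Observed counts: rows = Aware, Unaware; columns = Men, Women.
observed = [
    [50, 10],
    [15, 25],
]

row_totals = [sum(row) for row in observed]        # R_i
col_totals = [sum(col) for col in zip(*observed)]  # C_j
n = sum(row_totals)                                # sample size

# E_ij = R_i * C_j / n for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, exp_row)
```

These are the counts we would expect in each cell if awareness and gender were unrelated, given the same row and column totals.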
  • 44. 3. Choose Level of Significance: Type I error. Whenever we draw inferences about a population, there is a risk that an incorrect conclusion will be reached. The real question is how strong the evidence in favor of the alternative hypothesis must be to reject the null hypothesis. The significance level states the probability of rejecting H0 when in fact it is true. In this example an error would be committed if we said that there is a difference between men and women with respect to brand awareness when in fact there was no difference, i.e. we have rejected the null hypothesis when it is in fact true. This error is commonly known as Type I error. The value of α is called the significance level of the test.
  • 45. • Significance Level selected is typically .05 or .01 • i.e 5% or 1% •In other words we are willing to accept the risk that 5% (or 1%) of the time the results we get indicate that we should reject the null hypothesis when it is in fact true. • 5% (or 1%) of the time we are willing to commit a Type 1 error • stating there is a difference between men and women with respect to brand awareness when in fact there is no difference
  • 46. 3. Choose Level of Significance • We commit a Type II error when we incorrectly accept a null hypothesis when it is false. The probability of committing a Type II error is denoted by β. • In our example we commit a Type II error when we say that there is NO difference between men and women with respect to brand awareness (we accept the null hypothesis) when in fact there is.
  • 47. Type I and Type II Errors
                        Accept null            Reject null
      Null is true      Correct - no error     Type I error
      Null is false     Type II error          Correct - no error
  • 48. Which is worse? • Both are serious, but traditionally Type I error has been considered more serious; that is why the objective of hypothesis testing is to reject H0 only when there is enough evidence to support rejecting it. • Therefore, we choose α to be as small as possible without compromising β (accepting when false). • Increasing the sample size for a given α will decrease β (i.e. the probability of accepting the null hypothesis when it is in fact false).
  • 49. Awareness of Tire Manufacturer’s Brand (observed / expected frequency)
                  Men      Women    Total
      Aware       50/39    10/21    60
      Unaware     15/26    25/14    40
      Total       65       35       100
      Estimated cell frequency: $E_{ij} = R_i C_j / n$, where $R_i$ = total observed frequency in the ith row, $C_j$ = total observed frequency in the jth column, $n$ = sample size, $E_{ij}$ = estimated cell frequency
  • 50. Chi-Square Test. Chi-square statistic: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $\chi^2$ = chi-square statistic, $O_i$ = observed frequency in the ith cell, $E_i$ = expected frequency in the ith cell. Estimated cell frequency: $E_{ij} = R_i C_j / n$, where $R_i$ = total observed frequency in the ith row, $C_j$ = total observed frequency in the jth column, $n$ = sample size, $E_{ij}$ = estimated cell frequency. Degrees of freedom: d.f. = (R-1)(C-1)
  • 51. Degrees of Freedom • the number of values in the final calculation of a statistic that are free to vary. For example: to calculate the standard deviation of a random sample, we must first calculate the mean of that sample and then compute the sum of the squared deviations from that mean. While there will be n such squared deviations, only (n - 1) of them are free to assume any value whatsoever. This is because the final squared deviation from the mean must include the one value of X such that the sum of all the Xs divided by n will equal the obtained mean of the sample. All of the other (n - 1) squared deviations from the mean can, theoretically, have any values whatsoever.
  • 52. 4. Calculate the Test Statistic. Chi-Square Test: Differences Among Groups. $\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$, so $\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$. Degrees of freedom: d.f. = (R-1)(C-1) = (2-1)(2-1) = 1. Chi-square test results are unstable if cell count is lower than 5.
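The test statistic on this slide can be recomputed cell by cell. The sketch below takes the observed and expected frequencies from the brand-awareness table as given and sums the four contributions; the names are illustrative.

```python
# Observed and expected frequencies (rows: Aware, Unaware; columns: Men, Women).
observed = [[50, 10], [15, 25]]
expected = [[39, 21], [26, 14]]

# Each cell contributes (O - E)^2 / E to the chi-square statistic.
contributions = [
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]
chi_square = sum(contributions)

# Degrees of freedom for an R x C table: (R - 1)(C - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([round(c, 3) for c in contributions], round(chi_square, 3), df)
```

The largest contribution comes from the Unaware/Women cell, where the observed count (25) is furthest from its expected value (14) relative to that expectation.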
  • 53. 5. Determine the Probability- value (Critical Value) •The p-value is the probability of seeing a random sample at least as extreme as the sample observed given that the null hypothesis is true. • given the value of alpha, α we use statistical theory to determine the rejection region. • If the sample falls into this region we reject the null hypothesis; otherwise, we accept it • Sample evidence that falls into the rejection region is called statistically significant at the alpha level.
  • 54. COMBINATIONS: A combination is the selection of a certain number of objects taken from a group of objects without regard to order. We use the symbol (5 choose 3) to indicate that we have five objects taken three at a time, without regard to order. To calculate the possible number of combinations the formula used is $\binom{5}{3} = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times (2 \times 1)} = \frac{120}{12} = 10$. If we choose a sample of 5 from a total of 20 there are 15,504 possible combinations. If we took the means of some measurement for each of the possible combinations, those means would form an approximately normal distribution.
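Both counts on this slide can be verified with the standard library. This sketch computes the 5-choose-3 case from factorials, as on the slide, and checks it against `math.comb` (available in Python 3.8 and later).

```python
from math import comb, factorial

# 5 choose 3 from the factorial formula: n! / (k! * (n - k)!).
five_choose_three = factorial(5) // (factorial(3) * factorial(2))

# The same value, and the 20-choose-5 case, using the built-in helper.
same_via_comb = comb(5, 3)
samples_of_5_from_20 = comb(20, 5)

print(five_choose_three, same_via_comb, samples_of_5_from_20)
```

The rapid growth from 10 to 15,504 shows why sampling distributions are described theoretically rather than by listing every possible sample.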
  • 55. Critical value: A critical value is the value that a test statistic must exceed in order for the null hypothesis to be rejected. For example, the critical value of t (with 12 degrees of freedom using the .05 significance level) is 2.18. This means that for the probability value to be less than or equal to .05, the absolute value of the t statistic must be 2.18 or greater. (Figure: distribution centered at 0 with critical values at -2.023 and 2.023, an area of α/2 in each tail for the .05 significance level, and a test statistic at 2.816.)
  • 56. Significance from p-values -- continued • How small is a “small” p-value? This is largely a matter of semantics but if the – p-value is less than 0.01, it provides “convincing” evidence that the alternative hypothesis is true; – p-value is between 0.01 and 0.05, there is “strong” evidence in favor of the alternative hypothesis; – p-value is between 0.05 and 0.10, it is in a “gray area”; – p-values greater than 0.10 are interpreted as weak or no evidence in support of the alternative.
  • 57. 5. Determine the Probability-value (Critical Value). Chi-square Test for Independence: Under H0, the test statistic is approximately distributed as a chi-square distribution (χ²). For χ² with 1 d.f. at the .05 level, the critical value is 3.84. (Figure: chi-square curve with the rejection region of area α to the right of 3.84; the observed statistic, 22.16, falls in the "Reject H0" region.)
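For the special case of 1 degree of freedom, the tail probability of the chi-square distribution can be computed without statistical tables, because a chi-square variable with 1 d.f. is the square of a standard normal variable. The sketch below relies on that identity; the helper name `chi2_tail_1df` is hypothetical, and the formula does not apply to other degrees of freedom.

```python
from math import erfc, sqrt


def chi2_tail_1df(x):
    """P(chi-square with 1 d.f. > x), via P(|Z| > sqrt(x)) for standard normal Z."""
    return erfc(sqrt(x / 2))


# Tail area beyond the tabled critical value should be about .05.
p_at_critical = chi2_tail_1df(3.84)

# Tail area beyond the observed statistic from the brand-awareness example.
p_observed = chi2_tail_1df(22.16)

print(f"{p_at_critical:.4f} {p_observed:.2e}")
```

For tables larger than 2 x 2 a general chi-square routine from a statistics package would be needed instead.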
  • 58. 6 a) Compare with the level of significance, α b) Determine if the test statistic falls in the rejection region. (check tables) 22.16 is greater than 3.84 and falls in the rejection area. In fact it is significant at the .001 level, which means that if our variables were independent, the chance of picking a sample this extreme would be less than 1/1000. In other words, the chance of a Type I error is less than .1%: there is less than a .1% chance of saying that men and women differ with respect to brand awareness when in fact the null hypothesis is true and there is no difference.
  • 59. 7 Reject or do not reject H0: Since 22.16 is greater than 3.84 we reject the null hypothesis. 8 Draw a conclusion: Men and women differ with respect to brand awareness; specifically, men are more brand aware than women.

Editor's Notes