Survey data & sampling

How Can We Define “Data”?
Terminologies
Types of Data
Data Collection
How to Analyze and Represent Data
What Are Sample and Sampling?
Terminologies in Sampling
Types of Sampling
How to Calculate Sample Size

The word data is the plural of datum, which
literally means “to give” or “something given”.
“Data is a collection of facts, such as values
or measurements.”
“Data are measurements or observations
that are collected as a source of
information.”
It can be numbers, words, measurements,
observations or even just descriptions of
things.

Data Unit
A data unit is one entity (such as a
person or business) in the population being
studied, about which data are collected. A
data unit is also referred to as a unit record or
record.
Data Item
A data item is a characteristic of a
data unit which is measured or counted,
such as height, country of birth, or income.
A data item is also referred to as a variable
because the characteristic may vary
between data units, and may vary over time.

Observation
An observation is an occurrence of a
specific data item that is recorded about
a data unit. It may also be referred to as a
datum, which is the singular form of data.
An observation may be numeric or
non-numeric.
Dataset
A dataset is a complete collection of
all observations.

There are two main types of data with
respect to their characteristics:
Qualitative Data
Quantitative Data

Qualitative Data
• “Data that is not given numerically.”
• It deals with descriptions.
• It can be observed but not measured.
• Qualitative → Quality
Example: Favorite Color, Place of Birth,
Favorite Food, Type of Car

Quantitative Data
• It is given in numerical form.
• It deals with numbers.
• It can be measured.
• Quantitative → Quantity
Example: Length, Height, Area, Volume,
Weight, Speed, Time, Temperature,
Humidity, Sound Levels, Cost, Ages, etc.

Quantitative data can be divided into:
Discrete Data
Continuous Data
• Discrete data is counted; continuous
data is measured.

Discrete Data
Discrete data can only take certain
values (like whole numbers).
Example: The number of students in a class
(you can't have half a student).
Continuous Data
Continuous Data is data that can take
any value (within a range).
Example: A person's height could be any
value (within the range of human heights),
not just certain fixed heights.

Univariate Data
It means "one variable" (one type of data).
Example: Travel Time (minutes): 15, 29, 8,
42, 35, 21, 18, 42, 26
The variable is Travel Time.
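
As a rough illustration of what can be done with a single variable, the sketch below summarizes the travel times from the example using Python's standard `statistics` module. The choice of summary measures (mean, median, mode, range) is an assumption made for illustration, not something prescribed here.

```python
import statistics

# Travel times in minutes, taken from the univariate example above.
travel_time = [15, 29, 8, 42, 35, 21, 18, 42, 26]

# A few common one-variable summaries (illustrative choice of measures).
summary = {
    "count": len(travel_time),
    "mean": statistics.mean(travel_time),
    "median": statistics.median(travel_time),
    "mode": statistics.mode(travel_time),
    "range": max(travel_time) - min(travel_time),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each summary describes the one variable on its own; none of them says anything about a relationship with another variable.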

Bivariate or Multivariate Data
It means “two or more variables”.
With bivariate or multivariate data you have
two or more sets of related data that
you want to compare.
Example: The two variables are Ice Cream
Sales and Temperature.

Univariate Data                | Bivariate or Multivariate Data
Involving a single variable    | Involving two or more variables
Does not deal with causes or   | Deals with causes or
relationships                  | relationships

There are two main types of data with
respect to data collection techniques:
Primary Data
Secondary Data

Primary data means original data that
has been collected specially for the
purpose in mind. It means someone
collected the data from the original source
first hand. Data collected this way is called
Primary Data.
Example: Questionnaire, Surveys,
Experiments, Interviews.

Secondary data is data that was originally
collected for another purpose. When we apply
statistical methods to primary data that was
collected for a different purpose than our own,
we refer to it as Secondary Data.
Example: Books, Journals, Magazines,
Newspapers, E-journals, General Websites,
Web-blogs.

Data
  Primary Data
    Quantitative Data
      Univariate Data
      Bivariate Data
    Qualitative Data
      Univariate Data
      Bivariate Data
  Secondary Data
    Quantitative Data
      Univariate Data
      Bivariate Data
    Qualitative Data
      Univariate Data
      Bivariate Data

“Data Collection is a process of obtaining
useful information for a defined purpose
from various sources.”
The issue is not: How do we collect data?
The issue is: How do we collect useful data?

The purpose of data collection is:
• To obtain information to keep on record
• To make decisions about important issues
• To pass information on to others

Data Collection Plan
“A document that defines all the details
concerning data collection, including how
much and what type of data is required and
when and how it should be collected.”
A data collection plan answers questions such as:
• Why do we want the data?
• What purpose will they serve?
• Where will we collect the data?
• What type of data will we collect?
• Who will collect the data?
• How do we collect the right data?

Tools used to collect data are:
• Mail
• Telephone
• In-person and Web-based Surveys
• Direct or Participatory Observation
• Interviews
• Focus Groups
• Expert Opinion
• Case Studies
• Literature Search
• Content Analysis of Internal and External Records
The data collection tools must be strong
enough to support the findings of the evaluation.

“Analysis of data is a process of
inspecting, cleaning, transforming, and
modeling data with the goal of
highlighting useful information,
suggesting conclusions, and supporting
decision making.”

Bar Graphs
Pie Charts
Line Graphs
Scatter (x,y) Plots
Pictographs
Histograms
Frequency Distribution
Stem and Leaf Plots
Cumulative Tables and Graphs
Relative Frequency
Check Sheet

A Bar Graph (also called Bar Chart) is a
graphical display of data using bars of
different heights.

A Histogram is a graphical display of data
using bars of different heights.
It is similar to a Bar Chart, but a histogram
groups numbers into ranges.

A Pie Chart is a special chart that uses
"pie slices" to show relative sizes of data.

A Line Graph is a graph that shows
information that is connected in some way
(such as change over time).

A Scatter (x,y) Plot is a graph of plotted
points that shows the relationship between
two sets of data.

A Pictograph is a way of showing data
using images.

Frequency:
Frequency is how often something
occurs.
By counting frequencies we can make
a Frequency Distribution table.
Example: Sam's team has
scored the following
numbers of goals in recent
football games:

A Stem and Leaf Plot is a special table
where each data value is split into a "leaf"
(usually the last digit) and a "stem" (the
other digits).
Like in this example:
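
As a rough illustration, the sketch below builds a stem and leaf plot in Python. The helper name `stem_and_leaf` is hypothetical, the values are borrowed from the frequency distribution example in this section, and the code assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its remaining digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Illustrative values (assumed non-negative integers).
values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
plot = stem_and_leaf(values)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```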

Suppose you have the following list of values: 12, 13,
21, 27, 33, 34, 35, 37, 40, 40, 41. You could make a
frequency distribution table showing how many tens,
twenties, thirties, and forties you have:
Class      Frequency
10 - 19    2
20 - 29    2
30 - 39    4
40 - 49    3
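
The same table can be produced programmatically. The sketch below is one possible way to do it, assuming classes of width 10 that start at multiples of ten; the variable names are illustrative.

```python
from collections import Counter

values = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]
class_width = 10  # each class covers ten consecutive values, e.g. 10 - 19

# Map every value to the lower bound of its class, then count per class.
frequency = Counter((value // class_width) * class_width for value in values)

for lower in sorted(frequency):
    upper = lower + class_width - 1
    print(f"{lower} - {upper}: {frequency[lower]}")
```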

Cumulative means "how much so far". To
have cumulative totals, just add up the
values as you go.
Example: Jamie has earned
this much in the last 6
months:

Relative Frequency:
“How often something happens divided
by all outcomes.”
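
Cumulative and relative frequencies can both be derived from a plain frequency table. The sketch below reuses the class frequencies from the frequency distribution table earlier in this section (2, 2, 4, 3); the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the frequency distribution table in this section.
classes = ["10 - 19", "20 - 29", "30 - 39", "40 - 49"]
frequencies = [2, 2, 4, 3]

total = sum(frequencies)

# Relative frequency: how often each class occurs divided by all observations.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total, "how much so far".
cumulative = list(accumulate(frequencies))

for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label}: frequency={f}, relative={r:.2f}, cumulative={c}")
```

The relative frequencies always add up to 1, and the last cumulative value equals the total number of observations.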

“A generic tool that can be adapted for
a wide variety of purposes, the check
sheet is a structured, prepared form for
collecting and analyzing data.”

Census
A Census is when we collect data for every
member of the group (the whole "population").
Sample
“A Sample is when we collect data just for
selected members of the group.”
Example: There are 120 people in your local
football club.
We can ask everyone (all 120) what their age
is. That is a census.
Or you could just choose the people that are
there this afternoon. That is a sample.

Sampling is the process of selecting
units from a population of interest so that
by studying the sample we may fairly
generalize our results back to the
population from which they were chosen.

Sampling reduces expenses and time by
allowing researchers to estimate information
about a whole population without having to survey
each member of the population.
Sampling is like taking out and testing a few grains
of rice from the cooking vessel to know if the dish
is done or not.

Sampling Universe
Population from which we are sampling.
Sampling Unit
The unit selected during the process of
sampling.
Example: If we select households from a list of all
units in the population, the sampling unit is in this
case the household.

Basic Sampling Unit or Elementary Unit
The sampling unit selected at the last
stage of sampling.
In a multi-stage survey if we first select
villages and then select household within those
selected villages, the basic sampling unit would
be the household.
Respondent
The person who responds to our
questionnaires in the field.

Survey Subject
Entity or person from whom we are
collecting data.
Sampling Frame
Description of the sampling universe,
usually in the form of the list of sampling
units.
Example: Villages, Households or Individuals.

There are two main types of sampling technique:
Probability Sampling
Non-Probability Sampling

A probability sample is one in which
every unit in the population has a chance
(greater than zero) of being selected in
the sample.
Probability sampling can be further
sub-classified into:
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling

Simple Random Sampling (SRS)
In a simple random sample (SRS) of a
given size, all subsets of the frame of that
size are given an equal probability of
selection. Each element of the frame thus
has an equal probability of selection: the
frame is not subdivided or partitioned.
Simple random sampling is always an EPS
design (equal probability of selection), but not all
EPS designs are simple random sampling.

SRS may be cumbersome and tedious when
sampling from an unusually large target
population.
Example: N college students want to get a ticket for
a basketball game, but there are only X tickets,
which is not enough for all of them, so they decide
on a fair way to see who gets to go.
Everybody is given a number (1 to N), and
X random numbers are generated, either
electronically or from a table of random numbers.
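
The ticket example can be sketched with Python's standard `random` module. The function name `simple_random_sample` and the values chosen for N and X are hypothetical; the seed is only there to make the illustration repeatable.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct numbers from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every subset of the
    # requested size is equally likely to be chosen.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

N = 500  # hypothetical number of students
X = 20   # hypothetical number of tickets

winners = simple_random_sample(N, X, seed=42)
print(winners)
```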

Systematic Sampling
A method of selecting sample members
from a larger population according to a
random starting point and a fixed, periodic
interval called the sampling interval.
The sampling interval (sometimes known as
the skip) is calculated as:
k = N / n
where n is the sample size, and N is the
population size.

Example: Suppose you want to sample 8 houses
from a street of 120 houses.
Skip = k = 120/8 =15
So, every 15th house is chosen after a random
starting point between 1 and 15.
If the random starting point is 11, then the
houses selected are 11, 26, 41, 56, 71, 86, 101, and
116.
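
The house example can be reproduced with a short sketch. The function name `systematic_sample` is hypothetical, and the code assumes the population size is a whole multiple of the sample size (as with 120 and 8); otherwise the integer division below rounds the skip down.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = population_size // sample_size  # sampling interval (skip)
    if start is None:
        start = random.Random(seed).randint(1, k)  # inclusive on both ends
    return [start + i * k for i in range(sample_size)]

# Street of 120 houses, sample of 8, random start fixed at 11 as in the example.
houses = systematic_sample(120, 8, start=11)
print(houses)
```

Leaving `start` unset lets the function pick the random starting point itself.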

Stratified Sampling
Where the population embraces a
number of distinct categories, the frame can
be organized by these categories into
separate "strata." Each stratum is then
sampled as an independent sub-population,
out of which individual elements can be
randomly selected.

Example: Suppose that a company has the
following staff (total: 180):
Male (Full-time): 90
Male (Part-time): 18
Female (Full-time): 9
Female (Part-time): 63
We are asked to take a sample of 40 staff,
stratified according to the above categories.
Male (Full-time) = 90 × (40 / 180) = 20
Male (Part-time) = 18 × (40 / 180) = 4
Female (Full-time) = 9 × (40 / 180) = 2
Female (Part-time) = 63 × (40 / 180) = 14
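
The allocation above follows one rule: each stratum contributes in proportion to its share of the population. A minimal sketch of that rule is shown below; the function name `proportional_allocation` is hypothetical, and simple rounding is an assumption that happens to work for these numbers but may not always add up to the requested sample size.

```python
def proportional_allocation(strata, sample_size):
    """Allocate the sample to each stratum in proportion to its size."""
    total = sum(strata.values())
    # Rounding is a simplification; with other numbers the rounded
    # allocations may need adjusting to sum to sample_size.
    return {
        name: round(count * sample_size / total)
        for name, count in strata.items()
    }

staff = {
    "Male (Full-time)": 90,
    "Male (Part-time)": 18,
    "Female (Full-time)": 9,
    "Female (Part-time)": 63,
}

allocation = proportional_allocation(staff, 40)
for name, n in allocation.items():
    print(f"{name}: {n}")
```

Within each stratum, the allocated number of staff would then be drawn at random.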

Cluster Sampling
Cluster sampling is exactly what its title
implies. You randomly select clusters or
groups in a population instead of
individuals.
The objective of this method is to choose a
limited number of smaller geographic areas in
which simple or systematic random sampling
can be conducted.

It is completed in two stages:
1st Stage: Random Selection of Clusters. The
entire population of interest is divided into
small distinct geographic areas, such as
villages, camps, etc. We then need to find an
approximate size of the population for each
“village”.
2nd Stage: Random Selection of Households
within Clusters. Households are chosen
randomly within each cluster using simple
or systematic random sampling.
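
The two stages can be sketched as follows. Everything here is illustrative: the function name `two_stage_cluster_sample`, the village and household labels, and the sizes are all hypothetical. The sketch also assumes clusters are chosen with equal probability, which ignores the approximate population sizes mentioned in the first stage.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_households, seed=None):
    """Pick clusters at random, then pick households within each chosen cluster."""
    rng = random.Random(seed)
    # Stage 1: random selection of clusters (equal probability, an assumption).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of households within each chosen cluster.
    return {
        name: rng.sample(clusters[name], min(n_households, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: 10 villages with 30 households each.
villages = {
    f"village_{v}": [f"v{v}_hh{h}" for h in range(1, 31)]
    for v in range(1, 11)
}

sample = two_stage_cluster_sample(villages, n_clusters=3, n_households=5, seed=1)
for village, households in sample.items():
    print(village, households)
```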

Advantages and Disadvantages

Simple Random Sampling (SRS)
Advantages:
• Estimates are easy to calculate.
• Simple random sampling is always an EPS design, but not all EPS
designs are simple random sampling.
Disadvantages:
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be
present in the sample in sufficient numbers for study.

Systematic Sampling
Advantages:
• The sample is easy to select.
• A suitable sampling frame can be identified easily.
• The sample is evenly spread over the entire reference population.
Disadvantages:
• The sample may be biased if hidden periodicity in the population
coincides with that of the selection.
• It is difficult to assess the precision of the estimate from one survey.

Stratified Sampling
Advantages:
• Low cost
• Greater accuracy
• Better coverage
Disadvantages:
• A sampling frame of the entire population has to be prepared
separately for each stratum.
• When examining multiple criteria, stratifying variables may be
related to some, but not to others, further complicating the design
and potentially reducing the utility of the strata.
• In some cases, stratified sampling can potentially require a
larger sample than other methods would.

Cluster Sampling
Advantages:
• Cuts down on the cost of preparing a sampling frame.
• Can reduce travel and other administrative costs.
Disadvantages:
• Sampling error is higher than for a simple random sample of the
same size.
Note: Often used to evaluate vaccination coverage in EPI.

Non-probability sampling is any
sampling method where some elements
of the population have no chance of
selection or where the probability of
selection can't be accurately determined.
Non-probability sampling can be further
sub-classified into:
Quota Sampling
Accidental Sampling

Quota Sampling
In quota sampling, the population is first
segmented into mutually exclusive sub-
groups, just as in stratified sampling. Then
judgment is used to select the subjects or
units from each segment based on a
specified proportion.
Example: An interviewer may be told to
sample 200 females and 300 males between
the ages of 45 and 60.

In quota sampling the selection of the
sample is non-random.
Interviewers might be tempted to
interview those who look most helpful.
The problem is that these samples may
be biased because not everyone gets a
chance of selection.

Accidental Sampling
Accidental sampling (sometimes known
as Grab, Convenience or Opportunity
sampling) is a type of non-probability
sampling which involves the sample being
drawn from that part of the population
which is close to hand.

Example: If the interviewer were to
conduct such a survey at a shopping center
early in the morning on a given day, the
people that he/she could interview would
be limited to those present there at that
time. Their views would not represent those
of other members of society in the area, who
might be reached if the survey were conducted
at different times of day and several times per
week. This type of sampling is most useful
for pilot testing.

Sample size depends upon:
• Population size
• Confidence Interval
• Confidence Level
As the sample size increases, accuracy
increases and the margin of error decreases.
Confidence Level
The confidence level tells you how
sure you can be.
It is expressed as a percentage and
represents how often the true percentage of
the population who would pick an answer
lies within the confidence interval.
The 95% confidence level means you can
be 95% certain; the 99% confidence level
means you can be 99% certain. Most
researchers use the 95% confidence level.

Confidence Interval
It expresses the degree of uncertainty
associated with a sample statistic. A
confidence interval is an interval estimate
combined with a probability statement.
Interval Estimate
An interval estimate is defined by
two numbers, between which a
population parameter is said to lie.
For example, a < μ < b is an interval
estimate for the population mean μ. It
indicates that the population mean is greater
than a but less than b.
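
An interval estimate of the form a < μ < b can be computed from a sample. The sketch below is a rough illustration that reuses the travel times from the univariate example and relies on a normal approximation, which is an assumption; for a sample this small a t-distribution is usually preferred. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence_level=0.95):
    """Return (a, b) such that a < mean < b, using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, from the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return mean - z * standard_error, mean + z * standard_error

# Travel times in minutes from the univariate data example.
travel_time = [15, 29, 8, 42, 35, 21, 18, 42, 26]

a, b = mean_confidence_interval(travel_time)
print(f"{a:.1f} < mu < {b:.1f}")
```

Raising the confidence level widens the interval, which is the trade-off between certainty and precision.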

3/18/2015
