Data Presentation
Main purpose of data presentation or
displaying data
 To make the findings easy and clear to understand
 To provide comprehensive information in a succinct and
efficient way
Four ways of displaying data
 Text
 Tables
 Graph
 Statistical measures (see the descriptive statistics lecture)
Text
 The most common method of communication in both quantitative
and qualitative research studies
 Writing should be thematic, i.e. written around the various themes of
the report
 Text should place the most important and significant findings in the
context of short- and longer-term trends.
 It should explore relationships, causes and effects, to the extent that
they can be supported by evidence.
 It should show readers the significance of the most current
information.
Good example of text
Net profits of non-financial companies in the Netherlands
amounted to 19 billion euros in the second quarter of 2008. This
is the lowest level for three years. Profits were 11 percent lower
than in the second quarter of 2007. The drop in net profits is the
result of two main factors: higher interest costs - the companies
paid more net interest - and lower profits of foreign subsidiaries.
Source: Statistics Netherlands
 Findings should be integrated with the literature, citing references
using an accepted system of citation
Tables
 A table is the simplest means of summarizing a set of
observations.
 Tables are more informative when they are not overly
complex.
 As a general rule, tables and the columns within them should
always be clearly labeled.
 If units of measurement are involved, they should be
specified.
 Uses: It can be used for all types of numerical data.
Table
Structure of tables
1. Title: indicates the table number and describes the type of
data the table contains
2. Row stub: the subcategories of a variable, listed along the y-
axis
3. Column headings: the subcategories of a variable, listed along
the x-axis
4. Body: the cells housing the analysed data
* Supplementary notes or footnotes
Structure of tables
Table (x) Attitudes towards uranium by age                          [Title]

Attitudes towards        Age of respondent                  Total   [Column headings]
uranium mining           <25   25-34   35-44   45-54   55+
Strongly favourable      cell                                       [Body]
Favourable
Uncertain
Unfavourable
Strongly unfavourable
Total
[Row stub: the left-hand column; each attitude level or age group is a subcategory]

Source: …………………………Hypothetical data                          [Supplementary notes]

The title:
 should give a clear and accurate description
of the data.
 should answer the three questions “what”,
“where” and “when”.
 should be short and concise, and avoid using verbs.
Tables
Types of tables
1. Univariate (also known as frequency tables) –
containing information about one variable
2. Bivariate (also known as cross-tabulations) –
containing information about two variables
3. Polyvariate or multivariate – containing information
about more than two variables
Tables
Frequency distributions (Absolute frequency)
 One type of table that is commonly used to evaluate data
 For nominal and ordinal data, a frequency distribution
consists of a set of classes or categories together with the
number of observations in each
Table (1) Cases of Kaposi’s sarcoma for the first 2560 AIDS patients
reported to the Centers for Disease Control in Atlanta, Georgia
Kaposi’s sarcoma     Number of individuals
Yes                    246
No                    2314
Tables (Frequency table/Univariate)
Table (2) Cigarette consumption per person aged 18 or older, United
States, 1900-1960
Year     Number of cigarettes
1900       54
1910      151
1920      665
1930     1485
1940     1976
1950     3522
1960     4171
Tables
Frequency distributions
 For discrete or continuous data, the range of values of the
observations must be broken down into a series of distinct,
non-overlapping intervals of equal width
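The grouping step can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, its parameters and the readings are hypothetical, and the interval labels assume integer-valued data.

```python
from collections import Counter

def frequency_table(values, start, width, n_classes):
    """Tally values into n_classes non-overlapping intervals of equal width.

    Interval k runs from start + k*width up to (but not including)
    start + (k+1)*width, so each value falls into at most one class.
    """
    counts = Counter()
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1  # label for integer data: 80-119, 120-159, ...
        table.append((f"{lower}-{upper}", counts[k]))
    return table

# Hypothetical cholesterol readings (mg/100ml), for illustration only.
readings = [95, 130, 150, 162, 175, 188, 199, 204, 221, 245]
for label, count in frequency_table(readings, start=80, width=40, n_classes=5):
    print(f"{label:>8} {count:>3}")
```

Values outside the chosen range are silently skipped in this sketch; in practice the intervals should be chosen to cover every observation.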
Tables (Frequency table/Univariate)
Table (3) Absolute frequencies of serum cholesterol levels for 1067 US
males age 25 to 34 years, 1976-1980
Cholesterol level (mg/100ml)     Number of men
80-119                              13
120-159                            150
160-199                            442
200-239                            299
240-279                            115
280-319                             34
320-359                              9
360-399                              5
Total                             1067
Tables
Relative Frequency
 The proportion of the total number of observations that
appears in that interval
 Computed by dividing the number of values within an interval
by the total number of values in the table
 Relative frequencies are useful for comparing sets of data
that contain unequal number of observations
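As a rough sketch of the computation, the counts from Table (3) can be turned into relative frequencies by dividing each by the total. The function name `relative_frequencies` is illustrative, not part of the lecture.

```python
# Counts from Table (3): serum cholesterol, US males aged 25-34.
counts = {
    "80-119": 13, "120-159": 150, "160-199": 442, "200-239": 299,
    "240-279": 115, "280-319": 34, "320-359": 9, "360-399": 5,
}

def relative_frequencies(counts):
    """Return each interval's share of the total, as a percentage."""
    total = sum(counts.values())
    return {interval: 100 * n / total for interval, n in counts.items()}

rel = relative_frequencies(counts)
for interval, pct in rel.items():
    print(f"{interval:>8} {pct:5.1f}")
```

Rounded to one decimal place, these values agree with the Age 25-34 column of Table (4).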
Tables (Bivariate/cross-tabulations)
Table (4) Absolute and relative frequencies of serum cholesterol levels
for 2294 US males, 1976-1980
                      Age 25-34                     Age 55-64
Cholesterol level     Number     Relative           Number     Relative
(mg/100ml)            of men     frequency (%)      of men     frequency (%)
80-119                    13          1.2                5          0.4
120-159                  150         14.1               48          3.9
160-199                  442         41.4              265         21.6
200-239                  299         28.0              458         37.3
240-279                  115         10.8              281         22.9
280-319                   34          3.2              128         10.4
320-359                    9          0.8               35          2.9
360-399                    5          0.5                7          0.6
Total                   1067        100.0             1227        100.0
Tables
Cumulative relative Frequency
 The percentage of the total number of observations that
have a value less than or equal to the upper limit of the
interval
 Calculated by summing the relative frequencies for the
specified interval and all the previous ones
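A minimal sketch of the running sum, again using the Age 25-34 counts from Table (3). Here the counts are accumulated first and then divided by the total, which gives the same result as summing relative frequencies while avoiding a build-up of rounding error; `cumulative_relative` is an illustrative name.

```python
from itertools import accumulate

# Counts for US males aged 25-34, taken from Table (3).
intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
counts = [13, 150, 442, 299, 115, 34, 9, 5]

def cumulative_relative(counts):
    """Percentage of observations at or below each interval's upper limit."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]

cum = cumulative_relative(counts)
for interval, pct in zip(intervals, cum):
    print(f"{interval:>8} {pct:6.1f}")
```

The last entry is always 100, since every observation lies at or below the upper limit of the final interval.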
Tables (Bivariate/cross-tabulations)
Table (5) Relative and cumulative frequencies of serum cholesterol
levels for 2294 US males, 1976-1980
                      Age 25-34                          Age 55-64
Cholesterol level     Relative          Cumulative       Relative          Cumulative
(mg/100ml)            frequency (%)     relative         frequency (%)     relative
                                        frequency (%)                      frequency (%)
80-119                     1.2               1.2              0.4               0.4
120-159                   14.1              15.3              3.9               4.3
160-199                   41.4              56.7             21.6              25.9
200-239                   28.0              84.7             37.3              63.2
240-279                   10.8              95.5             22.9              86.1
280-319                    3.2              98.7             10.4              96.5
320-359                    0.8              99.5              2.9              99.4
360-399                    0.5             100.0              0.6             100.0
Total                    100.0                              100.0
Tables
Types of percentages
 The use of percentages is a common procedure in the
interpretation of data
 Three types of percentage
 Row percentage
 Column percentage
 Total percentage
Use of rounding and decimals: Numeric values should be right
justified
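The three types of percentage differ only in the denominator: the row total, the column total, or the grand total. The sketch below uses a hypothetical two-way table; the counts, labels and the function name `percentages` are invented for illustration.

```python
# Hypothetical cross-tabulation (counts), for illustration only.
table = {
    "Male":   {"Favourable": 30, "Unfavourable": 20},
    "Female": {"Favourable": 10, "Unfavourable": 40},
}

def percentages(table):
    """Return row, column and total percentages for a two-way table."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())

    row_pct = {r: {c: 100 * n / row_totals[r] for c, n in cells.items()}
               for r, cells in table.items()}
    col_pct = {r: {c: 100 * n / col_totals[c] for c, n in cells.items()}
               for r, cells in table.items()}
    total_pct = {r: {c: 100 * n / grand_total for c, n in cells.items()}
                 for r, cells in table.items()}
    return row_pct, col_pct, total_pct

row_pct, col_pct, total_pct = percentages(table)
print(row_pct["Male"]["Favourable"])    # share of males who are favourable
print(col_pct["Male"]["Favourable"])    # share of favourable answers given by males
print(total_pct["Male"]["Favourable"])  # share of all respondents
```

The same cell (30 favourable males) reads as 60%, 75% or 30% depending on the denominator, which is why a table should state which type of percentage it reports.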
What do you find out in this table?
Bad example
What do you find out in this table?
Bad example
Good example
GRAPHS
Graphs
 Graph or pictorial representation of numerical data → to
summarize and display data
 Should be designed to convey the general patterns in a set of
observations at a single glance
 Most informative graphs are relatively simple and self-
explanatory.
 They should be clearly labeled and units of measurement
should be indicated.
Bar Charts
 A popular type of graph used to display a frequency
distribution for nominal or ordinal data.
 Horizontal axis: various categories into which the
observations fall
 Vertical axis (height of bar): the frequency or the relative
frequency of observations within the class
 Uses: It is used to compare frequencies or values for different
categories or groups.
Bar Charts
 The bars can be either vertically or horizontally oriented.
 In the horizontal orientation, the text is easier to read.
 It is also easier to compare the different values when the bars
are ordered by size from smallest to largest, rather than
displayed arbitrarily.
 The bars should be much wider than the gaps between them.
 The gaps should not exceed 40% of the bar width.
Simple Bar chart (vertical)
Figure (1) Cigarette consumption per person 18 years of age or older,
United States, 1900-1990 (vertical bar chart; x-axis: Year, y-axis: Number of cigarettes)
Simple Bar chart (horizontal)
Clustered Bar chart
Source: Fulfilling the Health Agenda for Women and Children: The 2014 Report
Clustered Bar chart
Figure 2. Use of yoga among adults in the past 12 months, by age group: United States,
2002, 2007, and 2012
Stacked bar
 Uses: A stacked bar chart can be used to show and compare
segments of totals.
 Caution should be exercised when using this type of chart.
 It can be difficult to analyze and compare, if there are too
many items in each stack or if many items are fairly close in
size.
Pie chart
 A pie chart can be used to show the percentage distribution
of one variable, but only a small number of categories can be
displayed, usually not more than six.
 There are 360 degrees in a circle and so the full circle can be
used to represent 100% or the total population.
 The circle or pie is divided into sections in accordance with
the magnitude of each category
 Each slice is proportionate to the size of each subcategory of a
frequency distribution
 Uses: Pies can be drawn for both qualitative data and
variables measured on a continuous scale but grouped into
categories
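Since the full circle is 360 degrees, the angle of each slice is its share of the total multiplied by 360. A small sketch using the counts from Table (1); the function name `slice_angles` is illustrative.

```python
# Counts from Table (1): Kaposi's sarcoma among 2560 AIDS patients.
counts = {"Yes": 246, "No": 2314}

def slice_angles(counts):
    """Angle in degrees for each category: its share of the total times 360."""
    total = sum(counts.values())
    return {category: 360 * n / total for category, n in counts.items()}

angles = slice_angles(counts)
for category, angle in angles.items():
    print(f"{category:>3} {angle:6.1f} degrees")
```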
Pie chart
Figure (1) Gender of study population (pie chart with two slices, 53.0% and 47.0%; legend: Male, Female)
Pie chart
Figure (2) Government health expenditure by providers (2010-2011) (pie chart; slice labels: 69.80%, 14.63%, 3.86%, 1.65%, 3.14%, 2.23%, 4.69%; legend entries recovered: Hospitals, Ambulatory health care, Retail sale and medical goods)
Histogram
 A special type of bar graph showing frequency distribution for
continuous data
 When we construct a histogram the values of the variable
under consideration are represented by the horizontal axis,
while the vertical axis has as its scale the frequency (or
relative frequency if desired) of occurrence.
 True class limits are used so that the intervals are continuous, with no gaps between adjacent bars
Histogram
Figure(1) Histogram for leadership aptitude scores for n= 30 football coaches.
Frequency polygon
 Frequency distribution can be portrayed graphically by means
of a frequency polygon, which is a special kind of line graph.
 To draw a frequency polygon we first place a dot above the
midpoint of each class interval represented on the horizontal
axis of a graph.
 The height of a given dot above the horizontal axis corresponds
to the frequency of the relevant class interval.
 Connecting the dots by straight lines produces the frequency
polygon.
Frequency polygon
 Note that the polygon is brought down to the horizontal axis
at the ends at points that would be the midpoints if there
were an additional cell at each end of the corresponding
histogram.
 This allows for the total area to be enclosed.
 The total area under the frequency polygon equals the area under
the histogram.
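The construction and the equal-area property can be checked numerically. The sketch below uses the Age 25-34 classes from Table (3) and assumes equal-width classes; the function names are illustrative.

```python
# Class intervals and counts from Table (3), US males aged 25-34.
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]
counts = [13, 150, 442, 299, 115, 34, 9, 5]
width = 40  # distance between consecutive lower limits

def polygon_points(classes, counts, width):
    """Midpoint/frequency pairs, closed with a zero-frequency class at each end."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))

def area_under_polygon(points):
    """Trapezoid rule over consecutive points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

points = polygon_points(classes, counts, width)
histogram_area = width * sum(counts)  # each bar is `width` wide
print(points[0], points[1], points[-1])
print(area_under_polygon(points), histogram_area)
```

The two areas agree because each frequency appears in two adjacent trapezoids with weight one half, and the added end points contribute zero.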
Frequency polygon
Figure(3) Histogram and frequency
polygon of ages of 189 subjects
Figure (2) Frequency polygon of ages
of 189 students
Ogive curve
The graph of a cumulative frequency distribution is called an ‘ogive’.
Figure (3) Cumulative percentage frequency polygon for leadership aptitude
scores for n= 30 football coaches
Stem-and-leaf
 A properly constructed stem-and-leaf display, like a histogram, provides
information regarding the range of the data set, shows the location of the
highest concentration of measurements, and reveals the presence or
absence of symmetry.
 An advantage of the stem-and-leaf display over the histogram is the fact
that it preserves the information contained in the individual
measurements.
 Such information is lost when measurements are assigned to the class
intervals of a histogram.
 Another advantage of stem-and-leaf displays is that they can be
constructed during the tallying process, so the intermediate step of
preparing an ordered array is eliminated.
Construction of Stem-and-leaf
 Each measurement is partitioned into two parts: the first part is called the stem, and the second part is called the leaf.
 The stem consists of one or more of the initial digits of the measurement,
and the leaf is composed of one or more of the remaining digits.
 All partitioned numbers are shown together in a single display; the stems
form an ordered column with the smallest stem at the top and the largest
at the bottom.
 In the stem column, all stems within the range of the data are listed, even when a
measurement with that stem is not in the data set.
 The rows of the display contain the leaves, ordered and listed to the right
of their respective stems.
 When leaves consist of more than one digit, all digits after the first may be
deleted. Decimals when present in the original data are omitted in the
stem-and-leaf display.
 The stems are separated from their leaves by a vertical line.
 Thus, a stem-and-leaf display is also an ordered array of the data.
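The construction rules above can be sketched as follows for whole numbers, taking the last digit as the leaf and the remaining digits as the stem. The function name `stem_and_leaf` and the ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers.

    Stem = all digits but the last, leaf = last digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # List every stem in the range, even if no measurement has it.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# Hypothetical ages, for illustration only.
ages = [23, 25, 31, 31, 38, 52, 57, 59, 60]
print("\n".join(stem_and_leaf(ages)))
```

Stem 4 appears with no leaves, which makes the gap in the data visible, and reading the rows from top to bottom recovers the ordered array.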
Stem-and-leaf
 Stem-and-leaf displays are most effective with relatively
small data sets.
 As a rule, they are not suitable for use in annual reports or
other communications aimed at the general public.
 They are primarily of value in helping researchers and
decision makers understand the nature of their data.
 Histograms are more appropriate for externally circulated
publications
Stem and leaf display
Box-and-Whisker Plots
 Useful ways to display data (Exploratory data analysis)
 At the centre of the plot is the median, which is surrounded by a
box the top and bottom of which are the limits within which the
middle 50% of observations fall.
 Sticking out of the top and bottom of the box are two whiskers
which extend to the most and least extreme scores respectively.
 The horizontal lines are called fences. The upper fence is at (Q3 +
1.5(IQR)) or the largest X, whichever is lower.
 The lower fence is at (Q1 - 1.5(IQR)) or the smallest X, whichever is
higher.
 Values that are outside the fences are considered possible extreme
values, or outliers.
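A minimal sketch of the fence calculation as defined above. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, and the scores and the function name `fences` are hypothetical.

```python
import statistics

def fences(data):
    """Quartiles, IQR, fences as defined above, and values outside the fences."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    upper = min(q3 + 1.5 * iqr, max(data))  # whichever is lower
    lower = max(q1 - 1.5 * iqr, min(data))  # whichever is higher
    outliers = [x for x in data if x < lower or x > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower": lower, "upper": upper, "outliers": outliers}

# Hypothetical scores, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
print(fences(scores))
```

With these scores the quartiles are 3, 6 and 9, so the upper fence sits at 9 + 1.5(6) = 18 and the value 30 is flagged as a possible outlier.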
Box-and-Whisker Plots
 In fairly symmetric data sets, the adjacent values should
contain approximately 99% of the measurements.
 All points outside this range are represented by circles: these
observations are considered to be outliers or data points that
are not typical of the rest of values
Box-and-Whisker Plots
Figure () Boxplot of hygiene scores on day 1 of the Download Festival split by gender
One way scatter plot
 Uses: One-way scatter plots are the simplest type of graph
that can be used to summarize a set of continuous
observations.
 A one-way scatter plot uses a single horizontal axis to display
the relative position of each data point.
 An advantage of a one-way scatter plot is that since each
observation is represented individually, no information is lost
 A disadvantage is that it may be difficult to read (and to
construct) if values are close to each other.
Figure 2.1 Crude death rates for the United States, 1988.
Scatter plot (Two way)
 Both variables must be measured on either an interval or a
ratio scale
 The data on both variables need to be available in
absolute values for each observation
 Data for both variables are taken in pairs and displayed as dots in
relation to their values on both axes
Scatter plot (Two way)
Scatter plot (Two way)
Figure (3) Scatter diagram reveals pattern of strong positive correlation
Line diagram or trend curve
 Most appropriate type of chart for time series
 A trend line can be drawn for data pertaining to either a specific
time (e.g. 1995, 1996, 1997) or a period (e.g. 1985-1989, 1990-1994,
1995-)
 A line diagram is a useful way of conveying changes when long-term
trends in a phenomenon or situation need to be studied
 For example, a line diagram would be useful for illustrating trends
in births or death rates and changes in population size
Line diagram or trend curve
 Uses: A set of data measured on a continuous interval or a ratio
scale can be displayed using a line diagram or trend curve
Area Chart
 For variables measured on an interval or a ratio scale,
information about the subcategories of a variable can be
presented in the form of an area chart.
 This is plotted in the same way as a line diagram but with the
area under each line shaded to highlight the total magnitude
of the subcategory in relation to other subcategories.
Area Chart
Figure (1) Attitudes towards uranium mining (area chart; x-axis: Age group (<25, 25-34, 35-44, 45-54, 55+), y-axis: Number of respondents; series: Female, Male)
Exploratory data analysis (EDA)
 Exploratory Data Analysis (EDA) was heavily promoted by John
Tukey, whose book on the topic is widely regarded as a
statistical classic.
 Exploring data, by summarising and plotting variables and the
relationships between them, is an important step in
subsequent modelling and analysis.
 Exploring the data gives insight into the nature of the data set and
helps to reveal errors and anomalies.
Exploratory data analysis (EDA)
An approach to data analysis that emphasizes the use of informal
graphical procedures not based on prior assumptions about the
structure of the data or on formal models for the data.
The Cambridge Dictionary of Statistics, 4th edition
 The essence of this approach is that, broadly speaking, data are assumed
to possess the following structure
Data = Smooth + Rough
where the ‘Smooth’ is the underlying regularity or pattern in the data.
 The objective of the exploratory approach is to separate the smooth from
the ‘Rough’ with minimal use of formal mathematics or statistical
methods.
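One informal way to illustrate the Data = Smooth + Rough decomposition is a running median: the smoothed series stands in for the "Smooth" and the residuals for the "Rough". This is a sketch under that assumption; the lecture does not prescribe a particular smoother, and the series and the function name `running_median` are hypothetical.

```python
import statistics

def running_median(data, window=3):
    """Running median of an odd-sized window; end points are copied unchanged."""
    half = window // 2
    smooth = list(data)
    for i in range(half, len(data) - half):
        smooth[i] = statistics.median(data[i - half:i + half + 1])
    return smooth

# Hypothetical series with one anomalous value (40), for illustration only.
data = [10, 12, 11, 40, 13, 14, 15]
smooth = running_median(data)
rough = [d - s for d, s in zip(data, smooth)]
print(smooth)
print(rough)
```

The smooth follows the steady upward pattern, while the anomalous observation stands out as a single large residual in the rough.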
Exploratory data analysis (EDA)
 Two forms of EDA : Numerical summaries and plots
 Numerical summaries: Measures of central tendency, Measures
of spread, Measures of correlation, confidence intervals
 Plots: Histogram, Stem and leaf display, Box plot, scatter plot,
Bar plot,……..
Maps
 A graph used to plot variables by geographic locations
 Geographic information is an integral part of all statistical
data.
 Geographic areas have boundaries, names and other
information that make it possible to locate them on the
ground and relate statistical information to them.
 This spatial relationship is particularly important for census
data.
 Maps are the most efficient tools to visualize spatial patterns
Choropleth maps
 The most common type of map is the choropleth map, in which areas
are shaded in proportion to the value of the variable being displayed.
 This kind of map provides an easy way to visualize patterns across
space.
 Only ratios (i.e. proportions, rates or densities) can be mapped with
this technique
Proportional symbol map
SUGGESTIONS
Suggestions to improve graph
Adjusting the chart parameters
Type of data vs. commonly used graphical presentation
Scales of measurement        Graphical presentation
Nominal or ordinal scale     Bar graph
                             Pie diagram
                             Trend diagram
                             Box plot
Interval or ratio scale      Histogram
                             Frequency polygon
                             Ogive curve
                             Scatter plot
Take home message
OLIVE JEAN DUNN, VIRGINIA A. CLARK, “Basic statistics: A Primer for the Biomedical
Sciences”, Fourth Edition
References
1. Wayne W. Daniel, Chad L. Cross, “Biostatistics: A Foundation for
Analysis in the Health Sciences”, 10th edition
2. Marcello Pagano, Kimberlee Gauvreau, “Principles of
Biostatistics”, 2nd edition
3. Michael J. Campbell, “Statistics at Square One”, 2nd edition
4. Ranjit Kumar, “Research Methodology”, 3rd edition
5. Olive Jean Dunn, Virginia A. Clark, “Basic Statistics: A Primer for
the Biomedical Sciences”, 4th edition
6. United Nations, Geneva, 2009, “Making Data Meaningful Part 2: A
guide to presenting statistics”

More Related Content

PPTX
statistic
PPTX
Statistics
PPTX
Descriptive statistics
PPTX
Stat 3203 -pps sampling
PPTX
Uses of SPSS and Excel to analyze data
PPTX
Role of theory in research by priyadarshinee pradhan
PPT
Human fertility and it's determinant
PPTX
Session 1 introduction of demography (as of 3-1-2017)
statistic
Statistics
Descriptive statistics
Stat 3203 -pps sampling
Uses of SPSS and Excel to analyze data
Role of theory in research by priyadarshinee pradhan
Human fertility and it's determinant
Session 1 introduction of demography (as of 3-1-2017)

What's hot (20)

PPTX
What is difference between search and research
PPTX
presentation of data
PPTX
SPSS How to use Spss software
PPSX
Types of Statistics
PDF
Malthus theory and population growth through human history
PPTX
Ozz(morbidity and mortality)
PDF
Introduction to Statistics
PPTX
Importance of statistics
PPTX
Descriptive statistics
PPTX
Population projection (30 1-2017) by dr min ko ko
PPTX
Toward a theory of social practices
PPT
Time series slideshare
PPTX
PDF
Population Studies / Demography Introduction
PPTX
ppt on data collection , processing , analysis of data & report writing
PDF
Basic Concepts of Statistics - Lecture Notes
PPT
Demography
PPT
Histogram
PPTX
History of statistics #1
PPTX
Scales of measurment
What is difference between search and research
presentation of data
SPSS How to use Spss software
Types of Statistics
Malthus theory and population growth through human history
Ozz(morbidity and mortality)
Introduction to Statistics
Importance of statistics
Descriptive statistics
Population projection (30 1-2017) by dr min ko ko
Toward a theory of social practices
Time series slideshare
Population Studies / Demography Introduction
ppt on data collection , processing , analysis of data & report writing
Basic Concepts of Statistics - Lecture Notes
Demography
Histogram
History of statistics #1
Scales of measurment
Ad

Similar to 03.data presentation(2015) 2 (20)

PDF
2. Descriptive Statistics.pdf
PPTX
Data Organizarion and presentation (1).pptx
PPTX
2 Lecture 2 organizing and displaying of data.pptx
PPTX
Data Presentation biostatistics, school of public health
PPTX
STATISTICS.pptx
PPTX
Types of data and graphical representation
PPTX
Hanan's presentation.pptx
PPT
Data presentation 2
PPTX
3. data graphics.pptx biostatistics reasearch methodology
PPTX
Chapter-2-Frequency-Distribution-and-Graphical-Presentation.pptx
PPTX
day two.pptx
PPT
Data Types and Descriptive Statistics.ppt
PPTX
Intro to statistics
PPTX
2. AAdata presentation edited edited tutor srudents(1).pptx
PPTX
Data Presentation Methods.pptx
PPTX
Health statics chapter three.pptx for students
PPT
Data presentation
PPTX
lupes presentation epsf mansursadjhhjgfhf.pptx
PPT
descriptive _statis_CT_17feb2016-1-1.ppt
PPTX
Data presentation.pptx
2. Descriptive Statistics.pdf
Data Organizarion and presentation (1).pptx
2 Lecture 2 organizing and displaying of data.pptx
Data Presentation biostatistics, school of public health
STATISTICS.pptx
Types of data and graphical representation
Hanan's presentation.pptx
Data presentation 2
3. data graphics.pptx biostatistics reasearch methodology
Chapter-2-Frequency-Distribution-and-Graphical-Presentation.pptx
day two.pptx
Data Types and Descriptive Statistics.ppt
Intro to statistics
2. AAdata presentation edited edited tutor srudents(1).pptx
Data Presentation Methods.pptx
Health statics chapter three.pptx for students
Data presentation
lupes presentation epsf mansursadjhhjgfhf.pptx
descriptive _statis_CT_17feb2016-1-1.ppt
Data presentation.pptx
Ad

More from Mmedsc Hahm (20)

PPSX
Solid waste-management-2858710
PPTX
Situation analysis
PPT
Quantification of medicines need
PPTX
Quality in hospital
PPT
Patient satisfaction &amp; quality in health care (16.3.2016) dr.nyunt nyunt wai
PPTX
Organising
PPT
Nscbl slide
PPTX
Introduction to hahm 2017
PPT
Hss lecture 2016 jan
PPTX
Hospital management17
PPTX
Hopital stat
PPT
Health planning approaches hahm 17
PPTX
Ephs and nhp
PPTX
Directing and leading 2017
PPT
Concepts of em
PPT
Access to medicines p pt 17 10-2015
PPTX
The dynamics of disease transmission
PPTX
Study designs dr.wah
PPTX
Standardization dr.wah
DOCX
Solid waste-management-2858710
Situation analysis
Quantification of medicines need
Quality in hospital
Patient satisfaction &amp; quality in health care (16.3.2016) dr.nyunt nyunt wai
Organising
Nscbl slide
Introduction to hahm 2017
Hss lecture 2016 jan
Hospital management17
Hopital stat
Health planning approaches hahm 17
Ephs and nhp
Directing and leading 2017
Concepts of em
Access to medicines p pt 17 10-2015
The dynamics of disease transmission
Study designs dr.wah
Standardization dr.wah

Recently uploaded (20)

PPTX
AI_in_Pharmaceutical_Technology_Presentation.pptx
PPTX
Infection prevention and control for medical students
PPTX
Galactosemia pathophysiology, clinical features, investigation and treatment ...
PPT
Parental-Carer-mental-illness-and-Potential-impact-on-Dependant-Children.ppt
PPTX
1. Drug Distribution System.pptt b pharmacy
PPTX
HEMODYNAMICS - I DERANGEMENTS OF BODY FLUIDS.pptx
PPTX
First aid in common emergency conditions.pptx
PDF
Structure Composition and Mechanical Properties of Australian O.pdf
PPTX
First Aid and Basic Life Support Training.pptx
PDF
2E-Learning-Together...PICS-PCISF con.pdf
PPTX
CBT FOR OCD TREATMENT WITHOUT MEDICATION
DOCX
Copies if quanti.docxsegdfhfkhjhlkjlj,klkj
PDF
Myers’ Psychology for AP, 1st Edition David G. Myers Test Bank.pdf
PPTX
Rheumatic heart diseases with Type 2 Diabetes Mellitus
PPTX
PE and Health 7 Quarter 3 Lesson 1 Day 3,4 and 5.pptx
PPTX
BLS, BCLS Module-A life saving procedure
PDF
Megan Miller Colona Illinois - Passionate About CrossFit
PDF
Khaled Sary- Trailblazers of Transformation Middle East's 5 Most Inspiring Le...
PPTX
ABG advance Arterial Blood Gases Analysis
PPTX
COMMUNICATION SKILSS IN NURSING PRACTICE
AI_in_Pharmaceutical_Technology_Presentation.pptx
Infection prevention and control for medical students
Galactosemia pathophysiology, clinical features, investigation and treatment ...
Parental-Carer-mental-illness-and-Potential-impact-on-Dependant-Children.ppt
1. Drug Distribution System.pptt b pharmacy
HEMODYNAMICS - I DERANGEMENTS OF BODY FLUIDS.pptx
First aid in common emergency conditions.pptx
Structure Composition and Mechanical Properties of Australian O.pdf
First Aid and Basic Life Support Training.pptx
2E-Learning-Together...PICS-PCISF con.pdf
CBT FOR OCD TREATMENT WITHOUT MEDICATION
Copies if quanti.docxsegdfhfkhjhlkjlj,klkj
Myers’ Psychology for AP, 1st Edition David G. Myers Test Bank.pdf
Rheumatic heart diseases with Type 2 Diabetes Mellitus
PE and Health 7 Quarter 3 Lesson 1 Day 3,4 and 5.pptx
BLS, BCLS Module-A life saving procedure
Megan Miller Colona Illinois - Passionate About CrossFit
Khaled Sary- Trailblazers of Transformation Middle East's 5 Most Inspiring Le...
ABG advance Arterial Blood Gases Analysis
COMMUNICATION SKILSS IN NURSING PRACTICE

03.data presentation(2015) 2

  • 2. Main purpose of data presentation or displaying data  To make the findings easy and clear to understand  To provide comprehensive information in a succinct and efficient way
  • 3. Four ways of displaying data  Text  Tables  Graph  Statistical measures ( see in descriptive statistics lecture)
  • 4. Text  The most common method of communication in both quantitative and qualitative research studies  Writing should be thematic; i.e written around various themes of report  Text should place the most important and significant findings in the context of short- and longer-term trends.  It should explore relationships, causes and effects, to the extent that they can be supported by evidence.  It should show readers the significance of the most current information.
  • 5. Good example of text Net profits of non-financial companies in the Netherlands amounted to 19 billion euros in the second quarter of 2008. This is the lowest level for three years. Profits were 11 percent lower than in the second quarter of 2007. The drop in net profits is the result of two main factors: higher interest costs - the companies paid more net interest - and lower profits of foreign subsidiaries. Source: Statistics Netherlands  Findings should be integrated into the literature citing references using acceptable system of citation
  • 6. Tables  A table is the simplest means of summarizing a set of observations .  Tables are more informative when they are not overly complex.  As a general rule, tables and columns within them should always clearly labeled.  If units of measurement are involved, they should be specified.  Uses: It can be used for all types of numerical data.
  • 7. Table Structure of tables 1. Title: indicates the table number and describes the type of data the table contains 2. Row stub: the subcategories of a variable, listed along the y- axis 3. Column headings: the subcategories of a variable, listed along the x-axis 4. Body: the cells housing the analysed data * Supplementary notes or footnotes
  • 8. Structure of tables Table (x) Attitudes towards uranium by age Attitudes towards uranium mining Age of respondent Total <25 25-34 35-44 45-54 55+ Strongly favourable cell Favourable Uncertain Unfavaourable Strongly unfavourable Total Title Column headings Row stub Body Source: …………………………Hypothetical data Supplementary notes subcategory  should give a clear and accurate description of the data.  should answer the three questions “what”, “where” and “when”.  Be short and concise, and avoid using verbs.
  • 9. Tables Types of tables 1. Univariate (also known as frequency tables) – containing information about one variable 2. Bivariate (also known as cross-tabulations) – containing information about two variables 3. Polyvariate or multivariate – containing information about more than two variables
  • 10. Tables Frequency distributions (Absolute frequency)  One type of data that is commonly used to evaluate data  For nominal and ordinal data, a frequency distribution consists of a set of class or categories Kaposi ‘s sarcoma Number of individuals Yes 246 No 2314 Table (1) Cases of Kaposi’s sarcoma for the first 2560 AIDS patients reported to the Centers for Disease Control in Atlanta, Georgia
  • 11. Tables (Frequency table/Univariate) Table (2) Cigarette consumption per person aged 18 or older, United states, 1900-1960 Year Number of Cigarettes 1990 54 1910 151 1920 665 1930 1485 1940 1976 1950 3522 1960 4171
  • 12. Tables Frequency distributions  For discrete or continuous data, the range of values of the observations must be broken down into a series of distinct, non overlapping intervals with equal width intervals
  • 13. Tables (Frequency table/Univariate) Table (3) Absolute frequencies of serum cholesterol levels for 1067 US males age 25 to 34 years, 1976-1980 Cholesterol Level (mg/100ml) Number of Men 80-119 13 120-159 150 160-199 442 200-239 299 240-279 115 280-319 34 320-359 9 360-399 5 Total 1067
  • 14. Tables Relative Frequency  The proportion of the total number of observations that appears in that interval  Computed by dividing the number of values within an interval by the total number of values in the table  Relative frequencies are useful for comparing sets of data that contain unequal number of observations
  • 15. Tables (Bivariate/cross-tabulations) Table (4) Absolute and relative frequencies of serum cholesterol levels for 2294 US males, 1976-1980 Cholesterol Level (mg/100ml) Age 25-34 Age 55-64 Number of Men Relative frequency (%) Number of Men Relative frequency (%) 80-119 13 1.2 5 0.4 120-159 150 14.1 48 3.9 160-199 442 41.4 265 21.6 200-239 299 28.0 458 37.3 240-279 115 10.8 281 22.9 280-319 34 3.2 128 10.4 320-359 9 0.8 35 2.9 360-399 5 0.5 7 0.6 Total 1067 100.0 1227 100.0
  • 16. Tables Cumulative relative Frequency  The percentage of the total number of observations that have a value less than or equal to the upper limit of the interval  Calculated by summing the relative frequencies for the specified interval and all the previous ones
  • 17. Tables (Bivariate/cross-tabulations) Table (5) Relative and cumulative frequencies of serum cholesterol levels for 2294 US males, 1976-1980 Cholesterol Level (mg/100ml) Age 25-34 Age 55-64 Relative frequency (%) Cumulative relative frequency (%) Relative frequency (%) Cumulative relative frequency (%) 80-119 1.2 1.2 0.4 0.4 120-159 14.1 15.3 3.9 4.3 160-199 41.4 56.7 21.6 25.9 200-239 28.0 84.7 37.3 63.2 240-279 10.8 95.5 22.9 86.1 280-319 3.2 98.7 10.4 96.5 320-359 0.8 99.5 2.9 99.4 360-399 0.5 100.0 0.6 100.0 Total 100.0 100.0
  • 18. Tables Types of percentages  The use of percentages is a common procedure in the interpretation of data  Three types of percentage  Row percentage  Column percentage  Total percentage Use of rounding and decimals: Numeric values should be right justified
  • 19. What do you find out in this table? Bad example
  • 20. What do you find out in this table? Bad example Good example
  • 22. Graphs  Graph or pictorial representation of numerical data → to summarize and display data  Should be designed to convey the general patterns in a set of observations at a single glance  Most informative graphs are relatively simple and self- explanatory.  They should be clearly labeled and units of measurement should be indicated.
  • 23. Bar Charts  A popular type of graph used to display a frequency distribution for nominal or ordinal data.  Horizontal axis: various categories into which the observations fall  Vertical axis (height of bar): the frequency or the relative frequency of observations within the class  Uses: It is used to compare frequencies or values for different categories or groups.
  • 24. Bar Charts  The bars can be either vertically or horizontally oriented.  In the horizontal orientation, the text is easier to read.  It is also easier to compare the different values when the bars are ordered by size from smallest to largest, rather than displayed arbitrarily.  The bars should be much wider than the gaps between them.  The gaps should not exceed 40% of the bar width.
  • 25. Simple Bar chart (vertical) Figure (1) Cigarette consumption per person 18 years of age or older, United States, 1900-1990 (x-axis: Year; y-axis: Number of cigarettes)
  • 26. Simple Bar chart (horizontal)
  • 27. Clustered Bar chart Source: Fulfilling the Health Agenda for Women and Children: The 2014 Report
  • 28. Clustered Bar chart Figure 2. Use of yoga among adults in the past 12 months, by age group: United States, 2002, 2007, and 2012
  • 29. Stacked bar  Uses: A stacked bar chart can be used to show and compare segments of totals.  Caution should be exercised when using this type of chart.  It can be difficult to analyze and compare, if there are too many items in each stack or if many items are fairly close in size.
  • 32. Pie chart  A pie chart can be used to show the percentage distribution of one variable, but only a small number of categories can be displayed, usually not more than six.  There are 360 degrees in a circle and so the full circle can be used to represent 100% or the total population.  The circle or pie is divided into sections in accordance with the magnitude of each category  Each slice is proportionate to the size of each subcategory of a frequency distribution  Uses: Pies can be drawn for both qualitative data and variables measured on a continuous scale but grouped into categories
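
The conversion from percentages to slice angles follows directly from 100% = 360 degrees. A minimal Python sketch, assuming the two shares (53.0% and 47.0%) shown in Figure (1) of the pie chart slides; the function name `slice_angles` is illustrative.

```python
def slice_angles(percentages):
    """Convert percentage shares to central angles in degrees (100% = 360)."""
    return [p * 360 / 100 for p in percentages]

# Shares from Figure (1), Gender of study population.
shares = [53.0, 47.0]
angles = slice_angles(shares)

for share, angle in zip(shares, angles):
    print(f"{share:.1f}% -> {angle:.1f} degrees")
```

The angles always add up to 360 degrees when the shares add up to 100%.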
  • 33. Pie chart Figure (1) Gender of study population. Slice shares: 53.0%, 47.0%. Legend: Male, Female.
  • 34. Pie chart Figure (2) Government health expenditure by providers (2010-2011). Slice shares: 69.80%, 14.63%, 3.86%, 1.65%, 3.14%, 2.23%, 4.69%. Legend entries recovered: Hospitals; Ambulatory health care; Retail sale and medical goods.
  • 36. Histogram  A special type of bar graph showing the frequency distribution for continuous data  When we construct a histogram, the values of the variable under consideration are represented by the horizontal axis, while the vertical axis has as its scale the frequency (or relative frequency if desired) of occurrence.  True class limits are used so that adjacent classes join without gaps, preserving the continuity of the values or observations
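
As an illustration of true class limits, the sketch below extends each stated limit by half a measurement unit. This half-unit convention is a common textbook one and is an assumption here, since the slides do not spell out the rule; the intervals are the first three from Table (4), assumed to be recorded to the nearest whole unit.

```python
def true_class_limits(lower, upper, unit=1):
    """Extend stated class limits by half a measurement unit on each side.

    `unit` is the precision of the recorded values (assumed to be 1 here).
    """
    half = unit / 2
    return lower - half, upper + half

stated = [(80, 119), (120, 159), (160, 199)]
true_limits = [true_class_limits(lo, hi) for lo, hi in stated]

for (lo, hi), (tlo, thi) in zip(stated, true_limits):
    print(f"{lo}-{hi}  ->  {tlo}-{thi}")
```

With true limits, the upper boundary of one class equals the lower boundary of the next, so the histogram bars touch.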
  • 37. Histogram Figure(1) Histogram for leadership aptitude scores for n= 30 football coaches.
  • 38. Frequency polygon  Frequency distribution can be portrayed graphically by means of a frequency polygon, which is a special kind of line graph.  To draw a frequency polygon we first place a dot above the midpoint of each class interval represented on the horizontal axis of a graph.  The height of a given dot above the horizontal axis corresponds to the frequency of the relevant class interval.  Connecting the dots by straight lines produces the frequency polygon.
  • 39. Frequency polygon  Note that the polygon is brought down to the horizontal axis at the ends at points that would be the midpoints if there were an additional cell at each end of the corresponding histogram.  This allows for the total area to be enclosed.  The total area under the frequency polygon= the area under the histogram.
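
The construction steps and the equal-area property can be checked numerically. The following Python sketch assumes the class intervals and counts for ages 25-34 from Table (4) and equal class widths; it computes the polygon's vertices rather than drawing them.

```python
# Class intervals and frequencies for ages 25-34, from Table (4).
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

# Class width, taken as the distance between consecutive lower limits.
width = classes[1][0] - classes[0][0]

# One dot above the midpoint of each class interval.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Bring the polygon down to the axis at the midpoints of an
# imaginary extra class on each end.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + freqs + [0]

# Area under the polygon (trapezoid rule) and area of the histogram bars.
polygon_area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
                   for i in range(len(xs) - 1))
histogram_area = width * sum(freqs)

print(list(zip(xs, ys)))
print(polygon_area, histogram_area)
```

Because the two zero-frequency end points are included, the trapezoids under the polygon add up to exactly the area of the histogram bars.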
  • 40. Frequency polygon Figure(3) Histogram and frequency polygon of ages of 189 subjects Figure (2) Frequency polygon of ages of 189 students
  • 41. Ogive curve The graph of a cumulative frequency distribution is called an ‘ogive’. Figure (3) Cumulative percentage frequency polygon for leadership aptitude scores for n= 30 football coaches
  • 42. Stem-and-leaf  A properly constructed stem-and-leaf display, like a histogram, provides information regarding the range of the data set, shows the location of the highest concentration of measurements, and reveals the presence or absence of symmetry.  An advantage of the stem-and-leaf display over the histogram is the fact that it preserves the information contained in the individual measurements.  Such information is lost when measurements are assigned to the class intervals of a histogram.  Another advantage of stem-and-leaf displays is that they can be constructed during the tallying process, so the intermediate step of preparing an ordered array is eliminated.
  • 43. Construction of Stem-and-leaf  Each measurement is partitioned into two parts: the first part is called the stem, and the second part is called the leaf.  The stem consists of one or more of the initial digits of the measurement, and the leaf is composed of one or more of the remaining digits.  All partitioned numbers are shown together in a single display; the stems form an ordered column with the smallest stem at the top and the largest at the bottom.  The stem column includes all stems within the range of the data, even when a measurement with that stem is not in the data set.  The rows of the display contain the leaves, ordered and listed to the right of their respective stems.  When leaves consist of more than one digit, all digits after the first may be deleted. Decimals, when present in the original data, are omitted in the stem-and-leaf display.  The stems are separated from their leaves by a vertical line.  Thus, a stem-and-leaf display is also an ordered array of the data.
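
The construction rules above can be sketched in a few lines of Python. This is a simplified version, assuming non-negative whole numbers, a one-digit leaf (the last digit) and a made-up data set; the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for non-negative integers.

    The stem is every digit except the last; the leaf is the last digit.
    """
    ordered = sorted(values)
    leaves_by_stem = {}
    for v in ordered:
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)

    rows = []
    # List every stem in the range of the data, even one with no leaves.
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows

scores = [41, 58, 23, 47, 29, 62, 45, 60, 44, 27]  # hypothetical data
rows = stem_and_leaf(scores)
for row in rows:
    print(row)
```

Stem 3 appears with no leaves because no score falls in the thirties, which is what the rule about listing all stems in the range requires.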
  • 44. Stem-and-leaf  Stem-and-leaf displays are most effective with relatively small data sets.  As a rule, they are not suitable for use in annual reports or other communications aimed at the general public.  They are primarily of value in helping researchers and decision makers understand the nature of their data.  Histograms are more appropriate for externally circulated publications
  • 45. Stem and leaf display
  • 46. Box-and-Whisker Plots  Useful ways to display data (Exploratory data analysis)  At the centre of the plot is the median, which is surrounded by a box the top and bottom of which are the limits within which the middle 50% of observations fall.  Sticking out of the top and bottom of the box are two whiskers which extend to the most and least extreme scores respectively.  The horizontal lines are called fences. The upper fence is at ( Q3 + 1.5(IQR)) or the largest X , whichever is lower.  The lower fence is at (Q1 - 1.5(IQR)) or the smallest X , whichever is higher.  Values that are outside the fences are considered possible extreme values, or outliers.
  • 47. Box-and-Whisker Plots  In fairly symmetric data sets, the adjacent values should contain approximately 99% of the measurements.  All points outside this range are represented by circles: these observations are considered to be outliers or data points that are not typical of the rest of values
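
The fence rules can be worked through on a small example. The Python sketch below assumes a made-up data set and the "inclusive" quartile method of the standard library; quartile conventions differ between textbooks and software, so other tools may give slightly different values.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, fences and possible outliers, following the fence rules above."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Upper fence: Q3 + 1.5(IQR) or the largest value, whichever is lower.
    upper_fence = min(q3 + 1.5 * iqr, max(values))
    # Lower fence: Q1 - 1.5(IQR) or the smallest value, whichever is higher.
    lower_fence = max(q1 - 1.5 * iqr, min(values))
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers}

data = [2, 4, 5, 6, 7, 8, 9, 11, 25]  # hypothetical data
summary = box_plot_summary(data)
print(summary)
```

Here the value 25 lies above the upper fence and would be drawn as a separate circle on the plot.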
  • 48. Box-and-Whisker Plots Figure () Boxplot of hygiene scores on day 1 of the Download Festival split by gender
  • 49. Box-and-Whisker Plots Figure () Boxplot of hygiene scores on day 1 of the Download Festival split by gender
  • 51. One way scatter plot  Uses: One-way scatter plots are the simplest type of graph that can be used to summarize a set of continuous observations.  A one-way scatter plot uses a single horizontal axis to display the relative position of each data point.  An advantage of a one-way scatter plot is that since each observation is represented individually, no information is lost  A disadvantage is that it may be difficult to read (and to construct) if values are close to each other. Figure 2.1 Crude death rates for the United States, 1988.
  • 52. Scatter plot (Two way)  Both variables must be measured on either interval or ratio scales  The data on both variables need to be available in absolute values for each observation  Data for both variables are taken in pairs and displayed as dots in relation to their values on both axes
  • 54. Scatter plot (Two way) Figure (3) Scatter diagram reveals pattern of strong positive correlation
  • 55. Line diagram or trend curve  Most appropriate type of chart for time series  A trend line can be drawn for data pertaining to either a specific time (e.g. 1995, 1996, 1997) or a period (e.g. 1985-1989, 1990-1994, 1995-)  A line diagram is a useful way of conveying changes when long-term trends in a phenomenon or situation need to be studied  For example, a line diagram would be useful for illustrating trends in birth or death rates and changes in population size
  • 56. Line diagram or trend curve  Uses: A set of data measured on a continuous interval or a ratio scale can be displayed using a line diagram or trend curve
  • 57. Area Chart  For variables measured on an interval or a ratio scale, information about the subcategories of a variable can be presented in the form of an area chart.  This is plotted in the same way as a line diagram but with the area under each line shaded to highlight the total magnitude of the subcategory in relation to other subcategories.
  • 58. Area Chart Figure (1) Attitudes towards uranium mining (x-axis: Age group, from < 25 to 55+; y-axis: Number of respondents; series: Female, Male)
  • 59. Exploratory data analysis (EDA)  Exploratory Data Analysis (EDA) was heavily promoted by John Tukey, whose book on the topic is widely regarded as a statistical classic.  Exploring data, by summarising and plotting variables and the relationships between them, is an important step before subsequent modelling and analysis.  By exploring the data, the analyst gains insight into the nature of the data set and can look for errors and anomalies.
  • 60. Exploratory data analysis (EDA) “An approach to data analysis that emphasizes the use of informal graphical procedures not based on prior assumptions about the structure of the data or on formal models for the data.” (The Cambridge Dictionary of Statistics, 4th edition)  The essence of this approach is that, broadly speaking, data are assumed to possess the following structure: Data = Smooth + Rough, where the ‘Smooth’ is the underlying regularity or pattern in the data.  The objective of the exploratory approach is to separate the ‘Smooth’ from the ‘Rough’ with minimal use of formal mathematics or statistical methods.
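
One way to picture Data = Smooth + Rough is to smooth a series and keep the residuals. The Python sketch below uses a running median of three as the smoother; this choice is an assumption made for illustration, since the slides do not name a particular smoother, and the series is made up.

```python
from statistics import median

def running_median3(values):
    """Smooth a sequence with running medians of three; end values are kept."""
    smooth = list(values)
    for i in range(1, len(values) - 1):
        smooth[i] = median(values[i - 1:i + 2])
    return smooth

data = [4, 5, 15, 6, 7, 8, 2, 9]  # hypothetical series
smooth = running_median3(data)

# The rough is whatever the smooth leaves behind.
rough = [d - s for d, s in zip(data, smooth)]

print("smooth:", smooth)
print("rough: ", rough)
```

The large entries in the rough mark the two observations (15 and 2) that depart from the underlying pattern, which is the kind of anomaly EDA is meant to expose.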
  • 61. Exploratory data analysis (EDA)  Two forms of EDA: numerical summaries and plots  Numerical summaries: measures of central tendency, measures of spread, measures of correlation, confidence intervals  Plots: histogram, stem-and-leaf display, box plot, scatter plot, bar plot, etc.
  • 62. Maps  A graph used to plot variables by geographic locations  Geographic information is an integral part of all statistical data.  Geographic areas have boundaries, names and other information that make it possible to locate them on the ground and relate statistical information to them.  This spatial relationship is particularly important for census data.  Maps are the most efficient tools to visualize spatial patterns
  • 63. Choropleth maps  The most common type of map is the choropleth map, in which areas are shaded in proportion to the value of the variable being displayed.  This kind of map provides an easy way to visualize patterns across space.  Only ratios (i.e. proportions, rates or densities) can be mapped with this technique
  • 72. Adjusting the chart parameters
  • 73. Type of data Vs Commonly used graphical presentation Scales of measurement Graphical presentation Nominal or ordinal scale Bar graph Pie diagram Trend diagram Box plot Interval or ratio scale Histogram Frequency polygon Ogive curve Scatter plot
  • 74. Take home message OLIVE JEAN DUNN, VIRGINIA A. CLARK, “Basic statistics: A Primer for the Biomedical Sciences”, Fourth Edition
  • 75. References 1. Wayne W. Daniel, Chad L. Cross, “Biostatistics: A Foundation for Analysis in the Health Sciences”, 10th edition 2. Marcello Pagano, Kimberlee Gauvreau, “Principles of Biostatistics”, 2nd edition 3. Michael J. Campbell, “Statistics at Square One”, 2nd edition 4. Ranjit Kumar, “Research Methodology”, 3rd edition 5. Olive Jean Dunn, Virginia A. Clark, “Basic Statistics: A Primer for the Biomedical Sciences”, 4th edition 6. United Nations, Geneva, 2009, “Making Data Meaningful Part 2: A Guide to Presenting Statistics”