SlideShare a Scribd company logo
Data Management and Analysis
Data Management
Learning Objectives
By the end of the session, participants will be able to:
1.Understand the general rules of appropriate data
management
2.Understand how to define roles and
responsibilities regarding data management
3.Utilize the information featured in the session to
implement a system for good data management
Introduction
• Data management is an important component in M&E
and deserves extra attention and diligence
• M&E teams should invest a significant part of their
time and effort in data management
• M&E teams should understand the basic concepts of
data management
• Data management policies and procedures should be
clearly defined
Data Capture
Forms
Questionnaires
Data Entry Database
Paper based data
Personal Digital
Assistant (PDA)
Database
Paperless data
Data Capture, cont.
• Plan data capture carefully
• Decide on which software you will be using
• Define your database structure (tables or data files)
• Develop data entry screen (should be user-friendly and
include check for plausible values)
• Make provision for double- entry
Aspect Critical level
Consistency/validation 99%
Error (range check) 100%
Double entry 100%
Set Quality Target
Form/Questionnaire Flow
Field Supervisor
Filing Officer
Data Entry Supervisor
Interviewer
Data Entry
Database Manager
Questionnaire
completed
Questionnaire
fully entered
Questionnaire
with problems
Data Cleaning
• Check completeness of the data
• Check consistency- compare variables
• Check plausibility (value with acceptable range)
• Check for duplicates
• Check for outliers (run basic freq, mean)
Data Cleaning, cont.
Data Cleaning Trade-off curve
Data Security
• Access to data should be restricted (Password)
• Final analytical data should be anonymous
• Make sure to do a regular data backup-daily-
weekly-monthly…
• If possible store a copy of your back up off-site
Other Aspects to Consider
Data Ownership: Who has the legal rights to the data
and who retains the data
Data Retention: Length of time one needs to keep the
project data
Data Sharing: How project data and results are
disseminated, and when data should not be shared
Data Analysis and Interpretation
Session Objectives
1. Strengthen knowledge of terminology used in data
analysis and interpretation
2. Strengthen skills in data analysis and interpretation
3. Improve capacity to summarize data
4. Strengthen effective communication methods
What is Data Analysis?
• The process of understanding and explaining
what findings actually mean. Turning raw data
into useful information
• Provide answers to questions being asked at a
program site or research questions being
studied
• The greatest amount and best quality data
mean nothing if not properly analyzed, or, if not
analyzed at all
How would you analyze data to determine, “Is my
program meeting it’s objectives?”
Question Data Analysis Answers
Analysis is looking at the data in light of the questions
you need to answer
What is Data Analysis?, cont.
Is Our Program on Track?
• Analysis: Compare program targets and actual
program performance to learn how far you
are from target
• Interpretation: Why you have or have not
achieved the target and what this means for
your program
• May require more information
Examples of Analysis
Compare actual performance against targets
Indicator Progress (6/12/13) Target (1/30/14)
Number of persons trained on case
management
15 100
Comparing current performance to prior year
Indicator 2011 2012
No. of LLIN distributed 50,000 167,000
Compare performance between sites or groups
Indicator District A District B
Number of fever cases tested for
malaria by clinics
3,500 8,000
Statistical Measures
• Measure of central tendency
– Mean
– Median
– Mode
• Measure of variation
– Range
– Variance and standard deviation
– Interquartile range
– Proportion, Percentage
• Ratio, Rate
Mean
Sum of the values
divided by the
number of cases.
Also called average
̄y=
∑ yi
n
Month Cases 2008
Jan 30
Feb 45
Mar 38
April 41
May 37
Jun 40
Jul 70
Aug 270
Sep 280
Oct 200
Nov 100
Dec 29
∑ yi=1,180
n= 12
̄y=
1,180
12
=98.2
Average number of confirmed malaria cases
per month
Total number of cases
Number of observations
Mean number of cases
Very sensitive to variation
Number of observations
Mean number of cases
Total number of cases
Median
• Represents the middle
of the ordered sample
data
• For odd sample size, the
median is the middle
value
• For even, the median is
the midpoint/mean of
the two middle values
Month Cases
2008
Cases
2009
Dec 29 24
Jan 30 29
May 37 32
Mar 38 35
Jun 40 39
April 41 39
Feb 45 42
Jul 70 65
Nov 100 80
Oct 200 150
Aug 270 200
Sep 280 -
Median number of confirmed malaria cases
Not sensitive to variation
median=
41+45
2
=43
Median for 2008
Median for 2009
median= 39
Mode
• Value that occurs most
frequently
• It is the least useful (and
least used) of the three
measures of central
tendency
Month Cases
2008
Cases
2009
Dec 29 24
Jan 30 29
May 37 32
Mar 38 35
Jun 40 39
April 41 39
Feb 45 42
Jul 70 65
Nov 100 80
Oct 200 150
Aug 270 200
Sep 280 -
Mode number of confirmed malaria cases
mod e=none
Mode for 2008
Mode for 2009
39=mode
Practice Calculations
• What is the mode, mean
and median parasitemia
for the following set of
observations?
1.5, 1.8, 2.5, 4.1, 8.3, 1.2,
1.9, 0.6
• Answers:
– Mean = 2.74
– Median = 1.85
– Mode=none
– Would you use Mean or
Median?
– Answer: Median
– Use Median when you
have a large variation
between high and low
numbers
– Use Mean when there is
not a huge variation
between the values
Ratio
• Comparison of two numbers
• Expressed as:
– a to b, a per b, a:b
– 2 household members per (one) mosquito net, a
ratio of 3:1
• All individuals included in the numerator are not
necessarily included in the denominator
Proportion
• A ratio in which all individuals in the numerator are
also in the denominator
• Example: If a clinic has 12 female clients and 8 males
clients, then the proportion of male clients is 8/20 or
2/5
F F F F
F F F F
F F F F
M M M M
M M M M
Percentage
• A way to express a proportion
• Proportion multiplied by 100
• Example: Males comprise 2/5 of the
clients or, 40% of the clients are male
(0.40 x 100)
Important to know: What is the whole? An
orange? An apple? All clients? All clients on
with a fever?
Why do we want to know the
percentage?
• Helps us standardize so that we are able to
compare data across facilities, regions,
countries
• Better conceptualize what needs to be done
– Percentage helps us to track progress on our
targets
Rate
• A quantity measured with
respect to another
measured quantity
• Number of cases that occur
over a given time period
divided by population at risk
in the same time period
(Under five mortality rate)
Source: UNICEF: Statistics and Monitoring by Country
Nation Under five mortality
rate per 1,000 live
births in 2008
France 4
Ghana 76
Sierra Leone 194
Afghanistan 257
Probability of Dying Under Age Five per
1,000 Live Births
Annual Parasite Incidence (API)
Number of microscopically confirmed malaria cases
detected during one year per unit population
Confirmed malaria cases during 1 year
Population under surveillance
API X 1000
Most Common Software
• Microsoft Access
• Microsoft Excel
• Epi-Info
• SPSS
• Stata
• SAS
Data Analysis: Exercise
Learning objectives
1. Learn to calculate descriptive statistics and
run cross tabs in Excel and EpiInfo
2. Identify situations in which more
complicated analysis is necessary
MEASURE Evaluation is a MEASURE program project funded by the
U.S. Agency for International Development (USAID) through
Cooperative Agreement GHA-A-00-08-00003-00 and is implemented
by the Carolina Population Center at the University of North Carolina
at Chapel Hill, in partnership with Futures Group International, John
Snow, Inc., ICF Macro, Management Sciences for Health, and Tulane
University.
Visit us online at http://www.cpc.unc.edu/measure

More Related Content

PPT
Data management and analysis
PPTX
Data management
PDF
Risk Assessment PowerPoint Presentation Slides
PPT
Customer Relationship Management
PDF
Sample size determination
PPSX
Aggregate planning
PPT
Genetic basis of inheritance
PPTX
Variables in research
Data management and analysis
Data management
Risk Assessment PowerPoint Presentation Slides
Customer Relationship Management
Sample size determination
Aggregate planning
Genetic basis of inheritance
Variables in research

What's hot (20)

PPTX
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
PPTX
Analysis and Interpretation of Data
PPT
Data Quality
PPTX
Data Quality Presentation
PPTX
Data analysis
PPTX
ANALYSIS OF DATA.pptx
PPTX
Data Analysis, Presentation and Interpretation of Data
PPTX
Introduction to Statistics - Basic concepts
PPTX
Data collection methods
PPTX
statistical analysis
PPTX
Data analysis
PPSX
An overview of sampling
PPT
Data Quality Control
PPTX
Data Quality & Data Governance
PDF
Analysing qualitative data from information organizations
PPTX
Data Analysis & Visualization using MS. Excel
PPTX
Introduction to Business Data Analytics
PPTX
Unit 8 data analysis and interpretation
PPT
Chapter 10-DATA ANALYSIS & PRESENTATION
PDF
Data analysis
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Analysis and Interpretation of Data
Data Quality
Data Quality Presentation
Data analysis
ANALYSIS OF DATA.pptx
Data Analysis, Presentation and Interpretation of Data
Introduction to Statistics - Basic concepts
Data collection methods
statistical analysis
Data analysis
An overview of sampling
Data Quality Control
Data Quality & Data Governance
Analysing qualitative data from information organizations
Data Analysis & Visualization using MS. Excel
Introduction to Business Data Analytics
Unit 8 data analysis and interpretation
Chapter 10-DATA ANALYSIS & PRESENTATION
Data analysis
Ad

Viewers also liked (7)

PPTX
Morphology of cockroach
PPTX
The cockroach
PPTX
Cockroach
PDF
The cockroach presentation
PPTX
The Cockroach
PPTX
The cockroach
PPTX
Life cycle of cockroach
Morphology of cockroach
The cockroach
Cockroach
The cockroach presentation
The Cockroach
The cockroach
Life cycle of cockroach
Ad

Similar to data management and analysis (20)

PPTX
Business Statistics for Managers with SPSS[1].pptx
PDF
WEEK-1-IS-20022023-094301am.pdf
PPTX
Finding the answers to the research questions.pptx
PPTX
designing the methodology.pptx
PPTX
designing the methodology.pptx
PPTX
Seven steps for Use Routine Information to Improve HIV/AIDS Program_Snyder_5....
PPTX
1.2 types of data
PPTX
Data Analytics powerpoint presentation 1
PPTX
chapter 2 data collection and presentation business statistics
PPT
sources of data.ppt
DOCX
Assignment 2 RA Annotated BibliographyIn your final paper for .docx
PPT
1statistics (2).ppt and probabilty for grade 11
PDF
Final spss hands on training (descriptive analysis) may 24th 2013
PPTX
Statistics 000000000000000000000000.pptx
PPTX
first-batch-me-training.pptx
PPTX
Data quality: total survey error
PDF
This document presents an invaluable class notes for Quantitative Methods Top...
PPTX
Practical applications and analysis in Research Methodology
PPTX
Introduction to Data Analysis for Nurse Researchers
Business Statistics for Managers with SPSS[1].pptx
WEEK-1-IS-20022023-094301am.pdf
Finding the answers to the research questions.pptx
designing the methodology.pptx
designing the methodology.pptx
Seven steps for Use Routine Information to Improve HIV/AIDS Program_Snyder_5....
1.2 types of data
Data Analytics powerpoint presentation 1
chapter 2 data collection and presentation business statistics
sources of data.ppt
Assignment 2 RA Annotated BibliographyIn your final paper for .docx
1statistics (2).ppt and probabilty for grade 11
Final spss hands on training (descriptive analysis) may 24th 2013
Statistics 000000000000000000000000.pptx
first-batch-me-training.pptx
Data quality: total survey error
This document presents an invaluable class notes for Quantitative Methods Top...
Practical applications and analysis in Research Methodology
Introduction to Data Analysis for Nurse Researchers

Recently uploaded (20)

PPTX
anaemia in PGJKKKKKKKKKKKKKKKKHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...
PDF
شيت_عطا_0000000000000000000000000000.pdf
PPTX
CME 2 Acute Chest Pain preentation for education
PPTX
CHEM421 - Biochemistry (Chapter 1 - Introduction)
PPTX
JUVENILE NASOPHARYNGEAL ANGIOFIBROMA.pptx
PPTX
POLYCYSTIC OVARIAN SYNDROME.pptx by Dr( med) Charles Amoateng
PPTX
Neuropathic pain.ppt treatment managment
PPTX
Important Obstetric Emergency that must be recognised
PDF
Handout_ NURS 220 Topic 10-Abnormal Pregnancy.pdf
PPT
ASRH Presentation for students and teachers 2770633.ppt
PPTX
MANAGEMENT SNAKE BITE IN THE TROPICALS.pptx
PPT
STD NOTES INTRODUCTION TO COMMUNITY HEALT STRATEGY.ppt
PDF
focused on the development and application of glycoHILIC, pepHILIC, and comm...
PDF
Therapeutic Potential of Citrus Flavonoids in Metabolic Inflammation and Ins...
PDF
Human Health And Disease hggyutgghg .pdf
PPTX
post stroke aphasia rehabilitation physician
PPT
OPIOID ANALGESICS AND THEIR IMPLICATIONS
PDF
Medical Evidence in the Criminal Justice Delivery System in.pdf
PPT
1b - INTRODUCTION TO EPIDEMIOLOGY (comm med).ppt
PPTX
ACID BASE management, base deficit correction
anaemia in PGJKKKKKKKKKKKKKKKKHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...
شيت_عطا_0000000000000000000000000000.pdf
CME 2 Acute Chest Pain preentation for education
CHEM421 - Biochemistry (Chapter 1 - Introduction)
JUVENILE NASOPHARYNGEAL ANGIOFIBROMA.pptx
POLYCYSTIC OVARIAN SYNDROME.pptx by Dr( med) Charles Amoateng
Neuropathic pain.ppt treatment managment
Important Obstetric Emergency that must be recognised
Handout_ NURS 220 Topic 10-Abnormal Pregnancy.pdf
ASRH Presentation for students and teachers 2770633.ppt
MANAGEMENT SNAKE BITE IN THE TROPICALS.pptx
STD NOTES INTRODUCTION TO COMMUNITY HEALT STRATEGY.ppt
focused on the development and application of glycoHILIC, pepHILIC, and comm...
Therapeutic Potential of Citrus Flavonoids in Metabolic Inflammation and Ins...
Human Health And Disease hggyutgghg .pdf
post stroke aphasia rehabilitation physician
OPIOID ANALGESICS AND THEIR IMPLICATIONS
Medical Evidence in the Criminal Justice Delivery System in.pdf
1b - INTRODUCTION TO EPIDEMIOLOGY (comm med).ppt
ACID BASE management, base deficit correction

data management and analysis

  • 3. Learning Objectives By the end of the session, participants will be able to: 1.Understand the general rules of appropriate data management 2.Understand how to define roles and responsibilities regarding data management 3.Utilize the information featured in the session to implement a system for good data management
  • 4. Introduction • Data management is an important component in M&E and deserves extra attention and diligence • M&E teams should invest a significant part of their time and effort in data management • M&E teams should understand the basic concepts of data management • Data management policies and procedures should be clearly defined
  • 5. Data Capture Forms Questionnaires Data Entry Database Paper based data Personal Digital Assistant (PDA) Database Paperless data
  • 6. Data Capture, cont. • Plan data capture carefully • Decide on which software you will be using • Define your database structure (tables or data files) • Develop data entry screen (should be user-friendly and include check for plausible values) • Make provision for double- entry
  • 7. Aspect Critical level Consistency/validation 99% Error (range check) 100% Double entry 100% Set Quality Target
  • 8. Form/Questionnaire Flow Field Supervisor Filing Officer Data Entry Supervisor Interviewer Data Entry Database Manager Questionnaire completed Questionnaire fully entered Questionnaire with problems
  • 9. Data Cleaning • Check completeness of the data • Check consistency- compare variables • Check plausibility (value with acceptable range) • Check for duplicates • Check for outliers (run basic freq, mean)
  • 10. Data Cleaning, cont. Data Cleaning Trade-off curve
  • 11. Data Security • Access to data should be restricted (Password) • Final analytical data should be anonymous • Make sure to do a regular data backup-daily- weekly-monthly… • If possible store a copy of your back up off-site
  • 12. Other Aspects to Consider Data Ownership: Who has the legal rights to the data and who retains the data Data Retention: Length of time one needs to keep the project data Data Sharing: How project data and results are disseminated, and when data should not be shared
  • 13. Data Analysis and Interpretation
  • 14. Session Objectives 1. Strengthen knowledge of terminology used in data analysis and interpretation 2. Strengthen skills in data analysis and interpretation 3. Improve capacity to summarize data 4. Strengthen effective communication methods
  • 15. What is Data Analysis? • The process of understanding and explaining what findings actually mean. Turning raw data into useful information • Provide answers to questions being asked at a program site or research questions being studied • The greatest amount and best quality data mean nothing if not properly analyzed, or, if not analyzed at all
  • 16. How would you analyze data to determine, “Is my program meeting it’s objectives?” Question Data Analysis Answers Analysis is looking at the data in light of the questions you need to answer What is Data Analysis?, cont.
  • 17. Is Our Program on Track? • Analysis: Compare program targets and actual program performance to learn how far you are from target • Interpretation: Why you have or have not achieved the target and what this means for your program • May require more information
  • 18. Examples of Analysis Compare actual performance against targets Indicator Progress (6/12/13) Target (1/30/14) Number of persons trained on case management 15 100 Comparing current performance to prior year Indicator 2011 2012 No. of LLIN distributed 50,000 167,000 Compare performance between sites or groups Indicator District A District B Number of fever cases tested for malaria by clinics 3,500 8,000
  • 19. Statistical Measures • Measure of central tendency – Mean – Median – Mode • Measure of variation – Range – Variance and standard deviation – Interquartile range – Proportion, Percentage • Ratio, Rate
  • 20. Mean Sum of the values divided by the number of cases. Also called average ̄y= ∑ yi n Month Cases 2008 Jan 30 Feb 45 Mar 38 April 41 May 37 Jun 40 Jul 70 Aug 270 Sep 280 Oct 200 Nov 100 Dec 29 ∑ yi=1,180 n= 12 ̄y= 1,180 12 =98.2 Average number of confirmed malaria cases per month Total number of cases Number of observations Mean number of cases Very sensitive to variation Number of observations Mean number of cases Total number of cases
  • 21. Median • Represents the middle of the ordered sample data • For odd sample size, the median is the middle value • For even, the median is the midpoint/mean of the two middle values Month Cases 2008 Cases 2009 Dec 29 24 Jan 30 29 May 37 32 Mar 38 35 Jun 40 39 April 41 39 Feb 45 42 Jul 70 65 Nov 100 80 Oct 200 150 Aug 270 200 Sep 280 - Median number of confirmed malaria cases Not sensitive to variation median= 41+45 2 =43 Median for 2008 Median for 2009 median= 39
  • 22. Mode • Value that occurs most frequently • It is the least useful (and least used) of the three measures of central tendency Month Cases 2008 Cases 2009 Dec 29 24 Jan 30 29 May 37 32 Mar 38 35 Jun 40 39 April 41 39 Feb 45 42 Jul 70 65 Nov 100 80 Oct 200 150 Aug 270 200 Sep 280 - Mode number of confirmed malaria cases mod e=none Mode for 2008 Mode for 2009 39=mode
  • 23. Practice Calculations • What is the mode, mean and median parasitemia for the following set of observations? 1.5, 1.8, 2.5, 4.1, 8.3, 1.2, 1.9, 0.6 • Answers: – Mean = 2.74 – Median = 1.85 – Mode=none – Would you use Mean or Median? – Answer: Median – Use Median when you have a large variation between high and low numbers – Use Mean when there is not a huge variation between the values
  • 24. Ratio • Comparison of two numbers • Expressed as: – a to b, a per b, a:b – 2 household members per (one) mosquito net, a ratio of 3:1 • All individuals included in the numerator are not necessarily included in the denominator
  • 25. Proportion • A ratio in which all individuals in the numerator are also in the denominator • Example: If a clinic has 12 female clients and 8 males clients, then the proportion of male clients is 8/20 or 2/5 F F F F F F F F F F F F M M M M M M M M
  • 26. Percentage • A way to express a proportion • Proportion multiplied by 100 • Example: Males comprise 2/5 of the clients or, 40% of the clients are male (0.40 x 100) Important to know: What is the whole? An orange? An apple? All clients? All clients on with a fever?
  • 27. Why do we want to know the percentage? • Helps us standardize so that we are able to compare data across facilities, regions, countries • Better conceptualize what needs to be done – Percentage helps us to track progress on our targets
  • 28. Rate • A quantity measured with respect to another measured quantity • Number of cases that occur over a given time period divided by population at risk in the same time period (Under five mortality rate) Source: UNICEF: Statistics and Monitoring by Country Nation Under five mortality rate per 1,000 live births in 2008 France 4 Ghana 76 Sierra Leone 194 Afghanistan 257 Probability of Dying Under Age Five per 1,000 Live Births
  • 29. Annual Parasite Incidence (API) Number of microscopically confirmed malaria cases detected during one year per unit population Confirmed malaria cases during 1 year Population under surveillance API X 1000
  • 30. Most Common Software • Microsoft Access • Microsoft Excel • Epi-Info • SPSS • Stata • SAS
  • 32. Learning objectives 1. Learn to calculate descriptive statistics and run cross tabs in Excel and EpiInfo 2. Identify situations in which more complicated analysis is necessary
  • 33. MEASURE Evaluation is a MEASURE program project funded by the U.S. Agency for International Development (USAID) through Cooperative Agreement GHA-A-00-08-00003-00 and is implemented by the Carolina Population Center at the University of North Carolina at Chapel Hill, in partnership with Futures Group International, John Snow, Inc., ICF Macro, Management Sciences for Health, and Tulane University. Visit us online at http://www.cpc.unc.edu/measure

Editor's Notes

  • #2: Speaker Notes: Today we will learn about basic data management and analysis.
  • #3: Speaker Notes: We will start by discussing data management. We discussed this briefly in the data quality session and will go into further detail here.
  • #4: Speaker Notes: By the end of the session, participants will be able to: [READ BULLETS]
  • #5: Speaker Notes: Data management is an important component in M&E and deserves extra attention and diligence. M&E teams should invest a significant part of their time and effort in data management because a well-functioning data management system can improve the quality of the data collected and facilitate analysis. At a minimum, M&E teams should understand the basic concepts of data management. At the top or central level and at lower levels of a data system or study, data management policies and procedures should be clearly defined. If only the top-level individuals know the policies and procedures for data management, but the individuals imputing data or managing the system on a day-to-day basis do not, you are likely to find extensive problems in the data quality.
  • #6: Speaker Notes: There are two main methods of data capture, on for paper-based data and another for paperless electronic data. Paper-based data generally includes forms, registries and questionnaires. These data must be put into a database in one manner or another. This is usually done using a computer and keyboard; however, smart phones and SMS technology are sometimes used to enter data and transmit it to a database. Issues in your data can arise at any step in this process. Paperless data is collected electronically usually using a computer, PDA or smartphone. It is then put directly into a database using cables, internet, discs or USB drives. Since there are generally fewer steps to capturing paperless data, it is capable of being quicker and having less errors in the data capture process than paper-based data. Unfortunately, this technology is not always well understood. Individuals who are not tech savvy are likely to make errors. If an error is found, it may also be more difficult to find the original record as with paper-based data.
  • #7: Speaker Notes: To avoid errors and resulting data quality issues, it is important to plan data capture carefully. One of the first steps is to decide on which software, and sometimes hardware, you will be using. Some examples of software that can be used for data capture include CS-Pro, Microsoft Access or Excel, and EpiInfo. Next, you will need to define your database structure and create a data entry screen, which should be user-friendly and include check for plausible values. When possible, you should make provision for double-entry. This is most important for paper-based data, as errors are often introduced in the keying process.
  • #8: Speaker Notes: It is important right from the beginning to set data quality guidelines with clearly defined standards which will benchmark the quality of your data. By doing so, other users will trust your data. Three key steps should be considered.   Double entry refers to entering the same information twice by two independent persons. The two entries are then compared. You should ensure that the two entries match 100% for all the variables. For any mismatching information you have to refer to the original questionnaire/form to fix the problem.   Consistency/validation refers to conflicting information between two variables/questions. All related questions/variables must have at least 99% consistency. The data processing tools (data entry screen) should be configured to maximize consistency.   Error (Range check) refers to values out of acceptable ranges. Overall, 100% of all responses or values should be within the acceptable range. For example if your study population include women 15-49 years, then any age value outside this range should not be accepted. This check should be done at two levels: field supervision and data entry.
  • #9: Speaker Notes: Having a questionnaire or form aids the management of data and can enhance data quality. In this diagram you can see an ideal questionnaire or form flow. The data is collected by the interviewer or field personnel. The complete questionnaire is passed from the interviewer to the field supervisor to the filing officer, skipping the database manager to the data entry personnel. If a questionnaire with problems is discovered during this process it is passed back down the chain, with each person examining the problem until it gets to the person who can resolve it, the database manager is not skipped. Finally the fully entered questionnaire is sent from the data entry personnel to the filing officer.
  • #10: Speaker Notes: After data is in your data base it will need to be cleaned. This process involves checking: completeness of the data; consistency which can be checked by comparing variables; plausibility to determine whether data values fall within an acceptable range; and whether or not there are duplicate entries or outliers. Outliers can be found by running analysis on basic frequencies and means.
  • #11: Speaker Notes: This is a data cleaning trade off curve. You can see that as time and cost spent cleaning data increase, the data accuracy improves. However, there are clearly diminishing returns on investment. As more time is spent past a certain level, you see less improvements in data accuracy.
  • #12: Speaker Notes: Data security is important to maintain both confidentiality and the integrity of the data. Access to data should be restricted using a password and paper documents should be kept in a secured/locked location. Unique identifiers should be removed from the data set and the final analytical data should be anonymous. This is not to say that unique identifiers should be disposed of in all cases. In some studies, ethical review allows these to be kept for certain purposes. However, the person analyzing the data should not be able to identify individuals. Make sure to do a regular data backup; this can occur daily, weekly and/or monthly, depending on the nature of the data. If possible, you should store a copy of your back up off-site. This guards you against threats to integrity of the data as you will always be able to go back to the original data set even if the one you have on site is corrupted or destroyed.
  • #13: Speaker Notes: There are a few other aspects regarding data management that need to be considered when undertaking any data collection effort. Data ownership, or who has the legal rights to the data and who retains the data, should be decided before any data is collected. This will avoid confusion and conflict later on. A policy for data retention should specify the length of time one needs to keep the project data. For some projects, especially those on which a significant amount of publications are based, this may be a very considerable amount of time. This has implications on your data storage arrangements. Finally, all parties should agree on how data and results will be shared and disseminate. A data sharing policy should clearly stipulate when data should and should not be shared.
  • #14: Speaker Notes: In the previous module, we discussed the importance of using data to make informed decisions. One of the limitations of data use mentioned was the limited ability of program staff to analyze and interpret data. For data to be useful, they need to be processed and summarized to become meaningful as they relate to the program. The focus on this session is to present key concepts in data analysis. This session will review the most common data analysis terms and techniques used for descriptive data analysis. Then, in the next session, we’ll apply these techniques to the monitoring of health service delivery.
  • #15: Speaker Notes: The three objectives of this session are to : 1) Strengthen knowledge of terminology used in data analysis 2) Improve capacity to summarize data and present information 3) Strengthen skills in select data analysis and interpretation 4) Strengthen effective communication methods
  • #16: Speaker Notes: It is important to note that while the terms data and information are often used interchangeably; there is a distinction. Data refers to raw, unprocessed numbers, measurements, or text. Information refers to data that is processed, organized, structured or presented in a specific context. The process of transforming data into information is data analysis. The purpose of data analysis is to provide answers to questions being asked at a program site or to research questions being studied.
  • #17: Speaker Notes: While computer packages may aid in performing some analysis, analysis does not mean using a complicated computer analysis package. It means taking the data that you collect and looking at it in comparison to the questions that you need to answer. For example, if what you need to know is if your program is meeting its objectives – or if its on track - you would look at your program targets and compare them to the actual program performance. This is analysis. We will later take this one step further and talk about interpretation (e.g., through analysis you find that your program achieved only 10% of its target; now you have to figure out why.)
  • #18: Speaker Notes: Suppose you need to know if your program is on track, you would probably look at your program targets and compare them to the actual program performance. This is analysis. Interpretation is using the analysis to further understand your findings and the implications for your program. In many cases, this means using additional information, such as vital statistics, population-based surveys, and qualitative data, to supplement the routine service statistics. We will talk more about this later in the workshop.
  • #19: Speaker Notes: In this course, we will not be addressing all possible analyses that can be conducted at the program level – as there are many. We will be looking at some that MEASURE Evaluation staff have noted as common to the projects we work with. There are some common comparisons you can make when analyzing data:   Comparing actual performance (of specific indicators) against targets Examples: No. of persons trained as of 6/12/13: 15 persons Targeted no. of persons trained by 1/30/14: 100 persons     Comparing current performance to prior year   Example:  No. of LLIN distributed in 2011: 50,000 No. of LLIN distributed in 2012: 167,000   Compare performance between sites or groups   Example:  No. of fever cases tested for malaria by clinics in district A: 3,500 No. of fever cases tested for malaria by clinics in district B: 8,000  
  • #20: Speaker Notes: This slide lists the basic statistical terms used in data analysis that we will cover in this session.
  • #21: Speaker Notes: The most commonly investigated characteristic of a collection of data (or dataset) is its center, or the point around which the observations tend to cluster. The mean is the most frequently used measure to look at the central values of a dataset. The mean takes into consideration the magnitude of every value, which makes it sensitive to extreme values. If there are data in the dataset with extreme values – extremely low or high compared to most other values in the dataset – the mean may not be the most accurate method to use in assessing the point around which the observations tend to cluster. Use the mean when the data is normally distributed (symmetric).
  • #22: Speaker Notes: The middle value of a set of data when data points are arranged from least to greatest value The median is another measurement of central tendency but it is not as sensitive to extreme values as the mean because it takes into consideration the ordering and relative magnitude of the values. We therefore use the median when data are not symmetric or skewed. If a list of values is ranked from smallest to largest, then half of the values are greater than or equal to the median and the other half are less than or equal to it. When there is an even number of values, the median is the average of the two mid-point values. For example, for the first list of cases on the slide (2008), there are an even number of values and we have taken the average of the 6th and 7th highest of 12 values. You add 41+45 to get 86, and then divide that by 2 to get 43. When there is an odd number of values, the median is the middle value. For example, for the 2nd list (2009), the median is 39. Remember: with the median, you have to rank (or order) the figures before you can calculate it.
  • #23: Speaker Notes: The mode, which is used less often, is the value that occurs most frequently. If no values are repeated, there can be no mode. The mode is the least useful (and least used) of the three measures of central tendency
  • #24: Speaker Notes: Ask participants to use the example provided to calculate the mean and median. Wait a few minutes then ask participants to share their answers. Address any confusion. The mean is the sum of the values divided by the number of values The median is the average of the two middle values (1.8 and 1.9) because there are an even number of values otherwise the median would be the middle value.
  • #25: Speaker Notes: A ratio is a comparison of two numbers and is expressed as “a to b” or “a per b”. A proportion is a ratio in which all individuals included in the numerator are not necessarily included in the denominator. If we were to say there are 3 staff per clinic, the ratio is expressed numerically as 3:1. It is not the same as saying 1 to 3 or 1:3. The order of the numbers matters.
  • #26: Speaker Notes: A proportion is a ratio in which all individuals included in the numerator must also be included in the denominator. For example, if a clinic has 12 female clients and 8 male clients, the denominator is total clients, male and female or 20. The proportion of male clients is eight-twentieths or two-fifths.
  • #27: Speaker Notes: A percentage is a way to express a proportion multiplied by 100. By calculating a percentage, we can compare data across facilities, regions, and countries. Using the previous example, we saw that two-fifths of the clients are male. To make this a percentage, we convert the fraction to a decimal and multiply by 100 – 40%. Remember, in this example the denominator includes all of the clients both male and female. It is important to know and express the nature of the denominator. What is the whole? Are we talking about all clients? All pregnant clients? All clients with a fever?
  • #28: Speaker Notes: A percentage is used to express a quantity relative to another quantity. It allows us to compare different population groups, facilities, countries which may have different denominators. Percentages also allow us to understand what needs to be done by helping us track progress against targets and to look at our performance against quality of care indicators.
  • #29: Speaker Notes: In public health a rate is the number of cases that occur in a given time period divided by the population at risk during that time period. Since the number of occurrences of a specified outcome depends upon the size of the population being considered, dividing by their population sizes makes two groups more comparable. A rate is often expressed per 1,000, 10,000 or 100,000 population. A rate would be used most often in public health for expressing issues which occur infrequently, such as maternal mortality. It makes it easier to express 8 per 100,000 rather than .00008%. This is quite different from a ratio. In a ratio all individuals included in the numerator are not necessarily included in the denominator. The under-five mortality rate is the probability (expressed as a rate per 1,000 live births) of a child born in a specified year dying before reaching the age of five if subject to current age-specific mortality rates.
  • #30: Speaker Notes: For example: To calculate the API, you can divide confirmed malaria cases by the total number of people under surveillance. The people under surveillance may be your entire population at risk or the number of people in your project area
  • #31: Speaker Notes: Some of the most common data analysis software are: Microsoft Access, Microsoft Excel; Epi-Info; SPSS; Stata and SAS. Most of you probably have access to the Microsoft programs on your computers. Epi-Info is free. The other three software are not free and some have a considerable cost. Fortunately, much of the data analysis for programs can be done using Microsoft Excel.
  • #32: Speaker Notes: Now we will do an exercise to demonstrate some of these data analysis concepts.
  • #33: Speaker Notes: By the end of the session, participants will be able to: [READ BULLETS]